
Qwen 2.5 Coder 32B Instruct

Qwen 2.5 Coder 32B Instruct needs roughly 22.9 GB VRAM at Q4_K_M quantization (75.2 GB at FP16). 75 GPUs we track can run it fully in VRAM at 8k context.

75 GPUs run this natively · 19 with CPU offload

Alibaba · 32.5B params · 125k context · Apache 2.0 · Commercial use ok

Qwen 2.5 Coder 32B Instruct is a 32.5B-parameter dense model developed by Alibaba. It is the coding-specialized variant of Qwen 2.5, released in November 2024, and the best open-weight coding model at this size.

To run Qwen 2.5 Coder 32B Instruct locally, plan for the same VRAM as Qwen2.5-32B: about 18.3 GB for the weights at Q4_K_M, or 22.9 GB in total at 8k context. That makes it the top choice for developers with 24 GB GPUs.

Its HumanEval score of 92.7% rivals much larger models, and its MMLU-Pro score of 50.4% shows strong general capability as well.

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

Quant                  Weights     KV cache   Total
FP32                   130.0 GB    2.15 GB    148.0 GB
BF16                   65.0 GB     2.15 GB    75.2 GB
FP16                   65.0 GB     2.15 GB    75.2 GB
Q8_0                   32.5 GB     2.15 GB    38.8 GB
Q6_K                   26.6 GB     2.15 GB    32.3 GB
Q5_K_M                 20.9 GB     2.15 GB    25.9 GB
Q4_K_M (recommended)   18.3 GB     2.15 GB    22.9 GB
Q3_K_M                 14.0 GB     2.15 GB    18.1 GB
Q2_K                   10.7 GB     2.15 GB    14.4 GB
NVFP4 (CUDA only)      16.3 GB     2.15 GB    20.6 GB

KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
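The linear growth of the KV cache can be sketched with a few lines of Python. This is a rough estimate, not the calculator's actual method. The architecture values (64 layers, 8 key/value heads, head dimension 128) are not stated on this page; they are assumptions based on the model's published configuration, and they happen to reproduce the 2.15 GB figure at 8k (8192 tokens) with FP16 values.

```python
def kv_cache_bytes(context_tokens, layers=64, kv_heads=8, head_dim=128,
                   bytes_per_value=2):
    """Estimate KV cache size in bytes for a given context length.

    The defaults are assumed architecture values, not taken from this page.
    bytes_per_value=2 corresponds to FP16 cache entries.
    """
    # The leading factor of 2 covers the separate key and value tensors.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * context_tokens


def to_gb(num_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_bytes / 1e9


for ctx in (8192, 32768, 131072):
    print(f"{ctx:>7} tokens: {to_gb(kv_cache_bytes(ctx)):.2f} GB")
# Prints:
#    8192 tokens: 2.15 GB
#   32768 tokens: 8.59 GB
#  131072 tokens: 34.36 GB
```

Under these assumptions, each additional token costs the same number of bytes, so doubling the context doubles the cache. At long contexts the cache can exceed the size of the quantized weights.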


GPUs that run Qwen 2.5 Coder 32B Instruct natively (75)

Plus 19 GPUs that run it with CPU offload (slower)

Notes

Best open-weight coding model at this size.

Released 2024-11-12. Available on Hugging Face and Ollama.


Frequently asked questions

What are the VRAM requirements for Qwen 2.5 Coder 32B Instruct?
Qwen 2.5 Coder 32B Instruct requires approximately 22.9 GB of VRAM at Q4_K_M quantization, 38.8 GB at Q8_0, and 75.2 GB at FP16. These numbers assume an 8k context window; total VRAM grows linearly with context length because the KV cache does.
How many parameters does Qwen 2.5 Coder 32B Instruct have?
Qwen 2.5 Coder 32B Instruct has 32.5 billion parameters.
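The parameter count translates directly into weight size for the fixed-width formats. The sketch below is a back-of-the-envelope check, assuming decimal gigabytes and a flat number of bytes per parameter; the `BYTES_PER_PARAM` mapping is illustrative and does not cover the K-quants, which mix bit widths and have no single clean factor.

```python
PARAMS = 32.5e9  # parameter count stated on this page

# Assumed flat storage cost per parameter for fixed-width formats.
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "BF16": 2.0, "8-bit": 1.0}


def weights_gb(fmt):
    """Weight size in decimal GB for a fixed-width format."""
    return PARAMS * BYTES_PER_PARAM[fmt] / 1e9


for fmt in BYTES_PER_PARAM:
    print(f"{fmt}: {weights_gb(fmt):.1f} GB")
# Prints:
# FP32: 130.0 GB
# FP16: 65.0 GB
# BF16: 65.0 GB
# 8-bit: 32.5 GB
```

These values match the Weights column for FP32, FP16, BF16, and Q8_0. The Total column is higher because it also includes the KV cache and some runtime overhead.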
How capable is Qwen 2.5 Coder 32B Instruct?
Qwen 2.5 Coder 32B Instruct scores 92.7% on HumanEval, which rivals much larger models on coding tasks, and 50.4% on MMLU-Pro, which indicates solid general-purpose performance.
Can Qwen 2.5 Coder 32B Instruct run on a 16 GB GPU?
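Not at the recommended quantization. At Q4_K_M, Qwen 2.5 Coder 32B Instruct needs 22.9 GB of VRAM, more than 16 GB. Only Q2_K (14.4 GB at 8k context) fits entirely in 16 GB. Otherwise, use CPU offload, which is slower, or a 24 GB GPU such as the RTX 4090 or RTX 3090.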
Can Qwen 2.5 Coder 32B Instruct run on a 24 GB GPU?
Yes. Qwen 2.5 Coder 32B Instruct fits on a 24 GB GPU at Q4_K_M, requiring 22.9 GB VRAM at 8k context. GPUs with 24 GB include the RTX 4090, RTX 3090, and RTX 3090 Ti.
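What is the largest quantization for Qwen 2.5 Coder 32B Instruct that fits in 24 GB of VRAM?
Q4_K_M, at 22.9 GB with 8k context, is the largest quantization that fits in 24 GB of VRAM; the next step up, Q5_K_M, needs 25.9 GB. On CUDA GPUs, NVFP4 also fits at 20.6 GB and leaves more headroom for longer contexts.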
What GPU do I need to run Qwen 2.5 Coder 32B Instruct locally?
A 24 GB GPU is the practical minimum for the recommended Q4_K_M quantization, which needs 22.9 GB VRAM at 8k context. Good options: RTX 4090 (24 GB), RTX 3090 (24 GB).
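The answers above all follow from comparing a card's VRAM against the Total column of the quantization table. A minimal sketch of that lookup is shown below. The helper name `quants_that_fit` is hypothetical, and the comparison uses nominal VRAM, so it ignores memory already claimed by the display or other processes.

```python
# Total VRAM (GB) at 8k context, copied from the quantization table above.
TOTAL_GB = {
    "FP32": 148.0, "BF16": 75.2, "FP16": 75.2, "Q8_0": 38.8, "Q6_K": 32.3,
    "Q5_K_M": 25.9, "Q4_K_M": 22.9, "Q3_K_M": 18.1, "Q2_K": 14.4,
    "NVFP4": 20.6,
}


def quants_that_fit(vram_gb, cuda=True):
    """Return (name, total_gb) pairs that fit in vram_gb, largest first.

    NVFP4 is skipped when cuda is False, since it requires a CUDA GPU.
    """
    fits = [
        (name, gb)
        for name, gb in TOTAL_GB.items()
        if gb <= vram_gb and (cuda or name != "NVFP4")
    ]
    return sorted(fits, key=lambda item: item[1], reverse=True)


print(quants_that_fit(24))
# [('Q4_K_M', 22.9), ('NVFP4', 20.6), ('Q3_K_M', 18.1), ('Q2_K', 14.4)]
print(quants_that_fit(16))
# [('Q2_K', 14.4)]
```

The first entry of the result is the largest quantization that fits, which is usually the one to try first.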