CanItRun

Qwen3 32B

Qwen3 32B needs roughly 22.2 GB VRAM at Q4_K_M quantization (75.0 GB at FP16). 76 GPUs we track can run it fully in VRAM at 8k context.

76 GPUs run this natively · 19 with CPU offload

Alibaba · 32.8B params · 128k context · Apache 2.0 · Commercial use OK

Qwen3 32B is a 32.8B-parameter dense model developed by Alibaba, with support for thinking and non-thinking modes.

To run Qwen3 32B locally, plan for about 18.5 GB of weights at Q4_K_M (22.2 GB total at 8k context), the same tier as Qwen2.5-32B.

It offers chain-of-thought reasoning at the 32B scale under the Apache 2.0 license.

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

Quant | Weights | KV cache | Total
FP32 | 131.2 GB | 1.34 GB | 148.4 GB
BF16 | 65.6 GB | 1.34 GB | 75.0 GB
FP16 | 65.6 GB | 1.34 GB | 75.0 GB
Q8_0 | 32.8 GB | 1.34 GB | 38.2 GB
Q6_K | 26.9 GB | 1.34 GB | 31.6 GB
Q5_K_M | 21.1 GB | 1.34 GB | 25.2 GB
Q4_K_M (recommended) | 18.5 GB | 1.34 GB | 22.2 GB
Q3_K_M | 14.1 GB | 1.34 GB | 17.3 GB
Q2_K | 10.8 GB | 1.34 GB | 13.6 GB
NVFP4 (CUDA only) | 16.4 GB | 1.34 GB | 19.9 GB

KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
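
Because the KV cache grows linearly with context length, the 8k figures above can be rescaled to other context lengths. The sketch below is only an approximation. It assumes the totals include a fixed multiplicative overhead of about 12% on top of weights plus KV cache; that factor is inferred from the table rows, not documented by the calculator. The names `estimate_total_vram_gb` and `OVERHEAD_FACTOR` are illustrative.

```python
# Figures copied from the table above (GB, 8k context, FP16 KV cache).
WEIGHTS_GB = {
    "FP32": 131.2, "BF16": 65.6, "FP16": 65.6, "Q8_0": 32.8,
    "Q6_K": 26.9, "Q5_K_M": 21.1, "Q4_K_M": 18.5, "Q3_K_M": 14.1,
    "Q2_K": 10.8, "NVFP4": 16.4,
}
KV_CACHE_GB_AT_8K = 1.34
BASE_CONTEXT_K = 8

# Assumption: inferred by comparing (weights + KV cache) with the Total
# column; the calculator's actual formula is not stated on the page.
OVERHEAD_FACTOR = 1.12


def estimate_total_vram_gb(quant: str, context_k: float) -> float:
    """Estimate total VRAM in GB for a quantization at a given context (in k tokens)."""
    # The KV cache scales linearly with context; weights stay constant.
    kv_cache_gb = KV_CACHE_GB_AT_8K * (context_k / BASE_CONTEXT_K)
    return (WEIGHTS_GB[quant] + kv_cache_gb) * OVERHEAD_FACTOR


if __name__ == "__main__":
    for context_k in (8, 32, 128):
        total = estimate_total_vram_gb("Q4_K_M", context_k)
        print(f"Q4_K_M at {context_k}k context: ~{total:.1f} GB")
```

Under these assumptions, Q4_K_M at 32k context comes out at roughly 26.7 GB, which would no longer fit in 24 GB of VRAM. Treat this as a rough guide and confirm with the calculator.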

GPUs that run Qwen3 32B natively (76)

Plus 19 GPUs that run it with CPU offload (slower)

Notes

Supports thinking (chain-of-thought) and non-thinking modes.

Hugging Face · Ollama · Released 2025-04-29

Frequently asked questions

What are the VRAM requirements for Qwen3 32B?
Qwen3 32B requires approximately 22.2 GB of VRAM at Q4_K_M quantization, 38.2 GB at Q8_0, and 75.0 GB at FP16. These numbers assume an 8k context window; the KV cache portion grows linearly with context length, so longer contexts need more VRAM.
How many parameters does Qwen3 32B have?
Qwen3 32B has 32.8 billion parameters.
How capable is Qwen3 32B?
With an MMLU-Pro score of 65.54, Qwen3 32B delivers solid general-purpose performance suitable for most everyday tasks and professional use.
Can Qwen3 32B run on a 16 GB GPU?
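Not at the recommended quantization. At Q4_K_M, Qwen3 32B needs 22.2 GB of VRAM, more than 16 GB. Of the quantizations in the table above, only Q2_K (13.6 GB) is under 16 GB, and it comes with a noticeable quality loss. For Q4_K_M you will need a 24 GB GPU like the RTX 4090 or RTX 3090, or CPU offload (slower).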
Can Qwen3 32B run on a 24 GB GPU?
Yes. Qwen3 32B fits in a 24 GB GPU at Q4_K_M, requiring 22.2 GB VRAM. GPUs with 24 GB include the RTX 4090, RTX 3090, and RTX 3090 Ti.
What is the largest quantization of Qwen3 32B that fits in 24 GB of VRAM?
At 8k context, Q4_K_M (22.2 GB) is the largest quantization in the table that fits in 24 GB of VRAM; Q5_K_M (25.2 GB) does not. On CUDA GPUs, NVFP4 needs 19.9 GB and leaves more headroom.
What GPU do I need to run Qwen3 32B locally?
A 24 GB GPU is the minimum for running the recommended Q4_K_M quantization fully in VRAM, which needs 22.2 GB at 8k context. Good options: RTX 4090 (24 GB), RTX 3090 (24 GB).
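
The fit checks in the answers above amount to comparing the table's totals against a VRAM budget. A minimal sketch of that comparison follows, assuming the 8k-context totals from the table and ignoring VRAM already used by the desktop or other processes. The function name `quants_that_fit` and the `has_cuda` flag are illustrative, not part of any tool.

```python
# Total VRAM per quantization, copied from the table (GB, 8k context).
TOTAL_VRAM_GB = {
    "FP32": 148.4, "BF16": 75.0, "FP16": 75.0, "Q8_0": 38.2,
    "Q6_K": 31.6, "Q5_K_M": 25.2, "Q4_K_M": 22.2, "Q3_K_M": 17.3,
    "Q2_K": 13.6, "NVFP4": 19.9,
}


def quants_that_fit(vram_gb: float, has_cuda: bool = True) -> list[str]:
    """Return quantizations whose total fits in vram_gb, largest first."""
    fits = [
        (total, name)
        for name, total in TOTAL_VRAM_GB.items()
        # NVFP4 is listed as requiring a CUDA GPU, so skip it otherwise.
        if total <= vram_gb and (has_cuda or name != "NVFP4")
    ]
    return [name for _, name in sorted(fits, reverse=True)]


if __name__ == "__main__":
    for budget_gb in (16, 24, 48):
        print(f"{budget_gb} GB: {quants_that_fit(budget_gb)}")
```

Sorting by total size puts the largest quantization that fits first. Size is only a proxy for quality, so the ordering between formats such as NVFP4 and the K-quants should not be read as a quality ranking.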