DeepSeek R1 Distill Qwen 32B

DeepSeek R1 Distill Qwen 32B needs roughly 22.9 GB VRAM at Q4_K_M quantization (75.2 GB at FP16). 75 GPUs we track can run it fully in VRAM at 8k context.

75 GPUs run this natively · 19 with CPU offload

DeepSeek · 32.5B params · 125k context · MIT · Commercial use OK

DeepSeek R1 Distill Qwen 32B is a 32.5B parameter dense model developed by DeepSeek. It is a 32B distillation built on the Qwen architecture, and a strong local option for hard reasoning problems.

To run DeepSeek R1 Distill Qwen 32B locally, use Q4_K_M: the weights take about 18.3 GB and the total comes to about 22.9 GB at 8k context, so it fits on a 24 GB GPU. This is the sweet spot for reasoning-focused local deployment.

MMLU-Pro 65.0%, GPQA 62.1%, Math 94.3% — reasoning quality rivaling 70B+ models.

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

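| Quant | Weights | KV cache | Total |
|---|---|---|---|
| FP32 | 130.0 GB | 2.15 GB | 148.0 GB |
| BF16 | 65.0 GB | 2.15 GB | 75.2 GB |
| FP16 | 65.0 GB | 2.15 GB | 75.2 GB |
| Q8_0 | 32.5 GB | 2.15 GB | 38.8 GB |
| Q6_K | 26.6 GB | 2.15 GB | 32.3 GB |
| Q5_K_M | 20.9 GB | 2.15 GB | 25.9 GB |
| Q4_K_M (recommended) | 18.3 GB | 2.15 GB | 22.9 GB |
| Q3_K_M | 14.0 GB | 2.15 GB | 18.1 GB |
| Q2_K | 10.7 GB | 2.15 GB | 14.4 GB |
| NVFP4 (CUDA) | 16.3 GB | 2.15 GB | 20.6 GB |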
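
The weight sizes in the table can be approximated as parameter count times effective bits per weight. The sketch below is illustrative only: the bits-per-weight values are back-solved from the table's own numbers (an assumption, not figures published by the site or by any quantization tool), and `weights_gb` is a hypothetical helper name.

```python
PARAMS_B = 32.5  # billions of parameters, as stated on the page

# Effective bits per weight. ASSUMPTION: these values are back-solved from the
# weight sizes in the table above, not taken from a quantization spec.
BITS_PER_WEIGHT = {
    "FP32": 32.0,
    "FP16": 16.0,
    "Q8_0": 8.0,
    "Q6_K": 6.55,
    "Q5_K_M": 5.15,
    "Q4_K_M": 4.5,
    "Q3_K_M": 3.45,
    "Q2_K": 2.63,
    "NVFP4": 4.0,
}


def weights_gb(quant: str, params_b: float = PARAMS_B) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    # billions of params * bits / 8 = billions of bytes = GB
    return params_b * BITS_PER_WEIGHT[quant] / 8.0


if __name__ == "__main__":
    for name in BITS_PER_WEIGHT:
        print(f"{name:8s} {weights_gb(name):6.2f} GB")
```

Under these assumed values, each result lands within about 0.1 GB of the corresponding table entry.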

KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
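
To estimate VRAM at other context lengths, the 8k KV cache figure can be scaled linearly. The sketch below makes two assumptions that the page does not state: "8k" is taken to mean 8192 tokens, and the totals are modeled as (weights + KV cache) × 1.12, an overhead factor inferred from the table's numbers. The function names are hypothetical.

```python
KV_GB_AT_8K = 2.15    # FP16 KV cache at 8k context, from the table
BASE_CONTEXT = 8192   # ASSUMPTION: "8k" means 8192 tokens
OVERHEAD = 1.12       # ASSUMPTION: inferred from the table totals, not stated by the site


def kv_cache_gb(context_tokens: int) -> float:
    """KV cache size in GB, scaled linearly from the 8k figure."""
    return KV_GB_AT_8K * context_tokens / BASE_CONTEXT


def total_vram_gb(weights_gb: float, context_tokens: int = BASE_CONTEXT) -> float:
    """Estimated total VRAM: weights plus KV cache, with the assumed overhead."""
    return (weights_gb + kv_cache_gb(context_tokens)) * OVERHEAD


if __name__ == "__main__":
    for ctx in (8192, 16384, 32768):
        print(ctx, round(total_vram_gb(18.3, ctx), 1))  # Q4_K_M weights
```

With these assumptions, Q4_K_M reproduces the table's 22.9 GB at 8k, but at 32k context the estimate rises to roughly 30 GB, which no longer fits on a 24 GB GPU.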

GPUs that run DeepSeek R1 Distill Qwen 32B natively (75)

Plus 19 GPUs that run it with CPU offload (slower)

Notes

Reasoning model — great local option for hard problems.

Hugging Face ↗ · Ollama ↗ · Released 2025-01-20

Frequently asked questions

What are the VRAM requirements for DeepSeek R1 Distill Qwen 32B?
DeepSeek R1 Distill Qwen 32B requires approximately 22.9 GB of VRAM at Q4_K_M quantization, 38.8 GB at Q8_0, and 75.2 GB at FP16. These numbers assume an 8k context window; the KV cache portion grows linearly with context length, so total VRAM rises at longer contexts.
How many parameters does DeepSeek R1 Distill Qwen 32B have?
DeepSeek R1 Distill Qwen 32B has 32.5 billion parameters.
Is DeepSeek R1 Distill Qwen 32B good at reasoning and math?
Yes. With a MATH score of 94.3 and MMLU-Pro of 65, DeepSeek R1 Distill Qwen 32B handles complex multi-step reasoning, analytical tasks, and problem-solving well.
Can DeepSeek R1 Distill Qwen 32B run on a 16 GB GPU?
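Not at the recommended quantization. At Q4_K_M, DeepSeek R1 Distill Qwen 32B needs 22.9 GB of VRAM, which is more than 16 GB. Only Q2_K (14.4 GB at 8k context) fits in 16 GB, at a noticeable cost in quality. For Q4_K_M you will need a 24 GB GPU like the RTX 4090 or RTX 3090.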
Can DeepSeek R1 Distill Qwen 32B run on a 24 GB GPU?
Yes. DeepSeek R1 Distill Qwen 32B fits in a 24 GB GPU at Q4_K_M, requiring 22.9 GB VRAM. GPUs with 24 GB include the RTX 4090, RTX 3090, and RTX 3090 Ti.
What is the highest-quality quantization for DeepSeek R1 Distill Qwen 32B that fits in 24 GB of VRAM?
At 8k context, Q4_K_M (22.9 GB) is the largest quantization in the table that fits in 24 GB of VRAM; Q5_K_M needs 25.9 GB. NVFP4 also fits at 20.6 GB and leaves more headroom, but requires a CUDA GPU.
What GPU do I need to run DeepSeek R1 Distill Qwen 32B locally?
A 24 GB GPU is the practical minimum for the recommended Q4_K_M quantization, which needs 22.9 GB VRAM at 8k context. Good options: RTX 4090 (24 GB), RTX 3090 (24 GB).
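
The fit questions above can be checked mechanically against the table totals. This is a minimal sketch: `quants_that_fit` is a hypothetical helper, the totals are the 8k-context figures from the table, and the check ignores any extra headroom a particular runtime or desktop session may need.

```python
# Total VRAM at 8k context, copied from the table on this page (GB).
TOTAL_GB_AT_8K = {
    "FP32": 148.0, "BF16": 75.2, "FP16": 75.2, "Q8_0": 38.8, "Q6_K": 32.3,
    "Q5_K_M": 25.9, "Q4_K_M": 22.9, "Q3_K_M": 18.1, "Q2_K": 14.4, "NVFP4": 20.6,
}


def quants_that_fit(vram_gb: float, cuda: bool = True) -> list[str]:
    """Quantizations whose 8k-context total fits in vram_gb, largest first."""
    fits = [
        (total, name)
        for name, total in TOTAL_GB_AT_8K.items()
        # NVFP4 is only usable on CUDA GPUs, per the page.
        if total <= vram_gb and (cuda or name != "NVFP4")
    ]
    return [name for _, name in sorted(fits, reverse=True)]


if __name__ == "__main__":
    print(quants_that_fit(24))  # ['Q4_K_M', 'NVFP4', 'Q3_K_M', 'Q2_K']
    print(quants_that_fit(16))  # ['Q2_K']
```

Sorting by total size is only a rough proxy for quality; it treats the largest quantization that fits as the preferred one.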