DeepSeek R1 671B
DeepSeek R1 671B needs roughly 423.7 GB VRAM at Q4_K_M quantization (1503.6 GB at FP16). 4 GPUs we track can run it fully in VRAM at 8k context.
4 GPUs run this natively · 0 with CPU offload
DeepSeek R1 671B is a Mixture of Experts (MoE) model developed by DeepSeek, with 671B total parameters but only 37B active per token. Released in January 2025, it is a reasoning model trained largely through reinforcement learning (GRPO). The pure-RL variant trained without supervised fine-tuning is DeepSeek-R1-Zero; R1 itself adds a small cold-start supervised stage before RL.
To run DeepSeek R1 671B locally: the full model needs datacenter hardware. Most users should run the distilled variants instead: the 32B distill (~20 GB at Q4) runs on a 24 GB GPU; the 70B distill (~40 GB at Q4) needs dual GPUs or a Mac Studio. Because R1 is an MoE model, inference speed tracks the 37B active parameters rather than the total size, but all 671B parameters must still fit in memory.
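The link between active parameters and speed can be illustrated with a common rule of thumb: when decoding is memory-bandwidth bound, tokens per second cannot exceed memory bandwidth divided by the bytes of weights read per token. The sketch below is a rough ceiling under that assumption, not this page's benchmark method. The function name, the 800 GB/s bandwidth, and the 4.5 bits per weight (roughly what 377.8 GB over 671B parameters implies for Q4_K_M) are illustrative assumptions, and the estimate ignores KV cache reads, routing overhead, and compute limits.

```python
def decode_tokens_per_second_ceiling(
    params_read_billion: float,
    bits_per_weight: float,
    memory_bandwidth_gb_s: float,
) -> float:
    """Rough upper bound on decode speed for a bandwidth-bound model.

    Assumes every generated token requires reading `params_read_billion`
    billion weights from memory exactly once.
    """
    # Billions of parameters x bytes per parameter = GB read per token.
    gb_read_per_token = params_read_billion * bits_per_weight / 8
    return memory_bandwidth_gb_s / gb_read_per_token


# Hypothetical inputs, chosen only for illustration.
bandwidth_gb_s = 800.0
bits = 4.5

moe_ceiling = decode_tokens_per_second_ceiling(37, bits, bandwidth_gb_s)
dense_ceiling = decode_tokens_per_second_ceiling(671, bits, bandwidth_gb_s)

print(f"37B active params read per token: {moe_ceiling:.1f} t/s ceiling")
print(f"671B params read per token:       {dense_ceiling:.1f} t/s ceiling")
```

Under these assumptions the ceiling for 37B active parameters is about 18 times higher than for a hypothetical dense 671B model (671 / 37), which is the sense in which speed depends on active rather than total parameters.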
Reported benchmark results: AIME 2024 79.8% pass@1, MATH-500 97.3%, Codeforces 96.3rd percentile. Self-reflection and verification behaviors emerge from RL training.
VRAM at each quantization
Assumes 8k context. KV cache grows linearly with context length.
| Quant | Weights | KV cache | Total |
|---|---|---|---|
| FP32 | 2684.0 GB | 0.51 GB | 3006.7 GB |
| BF16 | 1342.0 GB | 0.51 GB | 1503.6 GB |
| FP16 | 1342.0 GB | 0.51 GB | 1503.6 GB |
| Q8_0 | 671.0 GB | 0.51 GB | 752.1 GB |
| Q6_K | 550.2 GB | 0.51 GB | 616.8 GB |
| Q5_K_M | 432.1 GB | 0.51 GB | 484.6 GB |
| Q4_K_M | 377.8 GB | 0.51 GB | 423.7 GB |
| Q3_K_M | 288.5 GB | 0.51 GB | 323.7 GB |
| Q2_K | 220.8 GB | 0.51 GB | 247.8 GB |
| NVFP4 | 335.5 GB | 0.51 GB | 376.3 GB |
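KV cache shown at 8k context (FP16). Total is higher than Weights + KV cache because each row includes roughly 12% overhead: Total ≈ 1.12 × (Weights + KV cache). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.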
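The table's totals can be approximately reproduced from its own columns. The sketch below assumes a 1.12 overhead factor, which is inferred from the ratio Total / (Weights + KV cache) in the rows above rather than documented by the page, and it assumes "8k" means 8192 tokens. The function names are hypothetical. Results match the table to within about 0.1 GB because the table's inputs are themselves rounded.

```python
OVERHEAD_FACTOR = 1.12      # inferred from the table, not a documented constant
BASE_CONTEXT_TOKENS = 8192  # assumption: "8k" means 8192 tokens
BASE_KV_CACHE_GB = 0.51     # FP16 KV cache at 8k context, from the table


def kv_cache_gb(context_tokens: int) -> float:
    """Scale the 8k KV cache figure linearly with context length."""
    return BASE_KV_CACHE_GB * context_tokens / BASE_CONTEXT_TOKENS


def total_vram_gb(weights_gb: float, context_tokens: int = BASE_CONTEXT_TOKENS) -> float:
    """Weights plus KV cache, scaled by the inferred overhead factor."""
    return (weights_gb + kv_cache_gb(context_tokens)) * OVERHEAD_FACTOR


# Weights column values copied from the table above.
WEIGHTS_GB = {"FP16": 1342.0, "Q8_0": 671.0, "Q4_K_M": 377.8}

for quant, weights in WEIGHTS_GB.items():
    at_8k = total_vram_gb(weights)
    at_32k = total_vram_gb(weights, 32768)
    print(f"{quant}: {at_8k:.1f} GB at 8k context, {at_32k:.1f} GB at 32k context")
```

Because the weights dominate, quadrupling the context adds only about 1.7 GB to each total under these assumptions.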
Benchmarks
GPUs that run DeepSeek R1 671B natively (4)
- Apple M4 Ultra (384GB) · Q3_K_M · 75.5 t/s
- Apple M3 Ultra (512GB) · Q5_K_M · 37.8 t/s
- Apple M3 Ultra (256GB) · Q2_K · 74 t/s
- Apple M2 Ultra (384GB) · Q3_K_M · 55.3 t/s
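The quantization shown next to each device above is consistent with picking the largest quantization whose 8k-context total fits in the device's memory. The sketch below assumes that selection rule (the page does not state it), treats all listed memory as usable, and skips NVFP4 on devices without CUDA. The function name and the `has_cuda` parameter are hypothetical; BF16 is omitted because its total matches FP16.

```python
from typing import Optional

# (quantization, total GB at 8k context, requires CUDA), largest first.
# Totals are copied from the VRAM table on this page.
QUANTS = [
    ("FP32", 3006.7, False),
    ("FP16", 1503.6, False),
    ("Q8_0", 752.1, False),
    ("Q6_K", 616.8, False),
    ("Q5_K_M", 484.6, False),
    ("Q4_K_M", 423.7, False),
    ("NVFP4", 376.3, True),
    ("Q3_K_M", 323.7, False),
    ("Q2_K", 247.8, False),
]


def best_fitting_quant(memory_gb: float, has_cuda: bool = False) -> Optional[str]:
    """Return the largest quantization that fits, or None if nothing fits."""
    for name, total_gb, requires_cuda in QUANTS:
        if requires_cuda and not has_cuda:
            continue  # NVFP4 is only an option on CUDA GPUs
        if total_gb <= memory_gb:
            return name
    return None


for memory in (512, 384, 256, 24):
    print(f"{memory} GB -> {best_fitting_quant(memory)}")
```

In practice an operating system reserves part of unified memory, so a device that fits on paper with only a few GB to spare may still fall short.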
Notes
Full reasoning model that outputs long chains of thought. Same architecture as DeepSeek V3; on discrete GPUs it needs a multi-GPU server.
Frequently asked questions
- What are the VRAM requirements for DeepSeek R1 671B?
- DeepSeek R1 671B requires approximately 423.7 GB of VRAM at Q4_K_M quantization, 752.1 GB at Q8_0, and 1503.6 GB at FP16. These figures assume an 8k context window; the KV cache portion grows linearly with context length.
- How many parameters does DeepSeek R1 671B have?
- DeepSeek R1 671B has 671 billion total parameters, but only 37 billion are active per token thanks to its Mixture of Experts (MoE) architecture. This makes inference significantly faster than the total parameter count suggests.
- Is DeepSeek R1 671B good at reasoning and math?
- Yes. With a MATH-500 score of 97.3% and an MMLU-Pro score of 85, DeepSeek R1 671B handles complex multi-step reasoning, analytical tasks, and problem-solving well.
- Can DeepSeek R1 671B run on a 16 GB GPU?
- No. At Q4_K_M, DeepSeek R1 671B needs 423.7 GB of VRAM, roughly 26 times the 16 GB available, and even Q2_K needs 247.8 GB. You will need a multi-GPU server.
- Can DeepSeek R1 671B run on a 24 GB GPU?
- No. Even at Q4_K_M, DeepSeek R1 671B needs 423.7 GB. Consider a multi-GPU server built from 80 GB cards whose combined VRAM exceeds that figure.
- What is the smallest quantization for DeepSeek R1 671B that fits in 24 GB of VRAM?
- DeepSeek R1 671B cannot fit in 24 GB of VRAM at any standard quantization level. The minimum needed is 247.8 GB at Q2_K.
- What GPU do I need to run DeepSeek R1 671B locally?
- You need a multi-GPU server. At Q4_K_M, DeepSeek R1 671B needs 423.7 GB VRAM, more than any single consumer GPU. With 80 GB H100 or A100 cards, that means at least six GPUs (423.7 / 80 ≈ 5.3), so an 8-GPU node is a typical fit.
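The GPU counts implied by these answers follow from dividing the required VRAM by the VRAM per card and rounding up. The sketch below assumes weights split evenly across cards with no extra per-GPU overhead, which is optimistic; the function name is hypothetical, and real deployments often round up further to a count their parallelism scheme supports.

```python
import math


def min_gpu_count(required_vram_gb: float, vram_per_gpu_gb: float) -> int:
    """Smallest number of identical GPUs whose combined VRAM covers the requirement."""
    if required_vram_gb <= 0 or vram_per_gpu_gb <= 0:
        raise ValueError("VRAM values must be positive")
    return math.ceil(required_vram_gb / vram_per_gpu_gb)


# Totals at 8k context, copied from the VRAM table on this page.
for quant, total_gb in (("Q2_K", 247.8), ("Q4_K_M", 423.7), ("FP16", 1503.6)):
    print(f"{quant}: {min_gpu_count(total_gb, 80)} x 80 GB GPUs")
```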