CanItRun

DeepSeek R1 Distill Llama 70B

DeepSeek R1 Distill Llama 70B needs roughly 47.1 GB VRAM at Q4_K_M quantization (159.8 GB at FP16). 47 GPUs we track can run it fully in VRAM at 8k context.

47 GPUs run this natively · 35 with CPU offload

DeepSeek · 70B params · 125k context · MIT · Commercial use ok

DeepSeek R1 Distill Llama 70B is a 70B-parameter dense model developed by DeepSeek. It distills DeepSeek-R1's reasoning capabilities into the Llama 3.3 70B architecture.

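To run DeepSeek R1 Distill Llama 70B locally, expect roughly 35-40 GB for the Q4_K_M weights alone (47.1 GB total at 8k context, per the table below). These are the same requirements as Llama-3.3-70B. It is the best way to get R1-style reasoning locally.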

Benchmark scores: MMLU-Pro 70.0%, GPQA 65.2%, MATH 94.5%. The model inherits R1's reasoning strength at a practical size.

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.
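The linear growth can be sketched in a few lines of Python. The architecture constants below (80 layers, 8 key/value heads, head dimension 128) are the commonly published Llama 3 70B values and are an assumption here, not something this page states. `kv_cache_gb` is a hypothetical helper name. With 2-byte (FP16) cache entries and decimal gigabytes, the result at 8,192 tokens lines up with the 2.68 GB figure used on this page.

```python
# Assumed Llama-3-70B-style architecture (not stated on this page).
N_LAYERS = 80
N_KV_HEADS = 8        # grouped-query attention: fewer KV heads than query heads
HEAD_DIM = 128
BYTES_PER_VALUE = 2   # FP16 cache entries


def kv_cache_bytes_per_token() -> int:
    """Bytes of KV cache added by one token (keys and values, all layers)."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE


def kv_cache_gb(context_tokens: int) -> float:
    """KV cache size in decimal gigabytes; strictly linear in context length."""
    return kv_cache_bytes_per_token() * context_tokens / 1e9


if __name__ == "__main__":
    for ctx in (8192, 16384, 32768):
        print(f"{ctx:>6} tokens: {kv_cache_gb(ctx):.2f} GB")
```

Doubling the context doubles the cache, while the weight footprint stays fixed.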

Quant                   Weights     KV cache    Total
FP32                    280.0 GB    2.68 GB     316.6 GB
BF16                    140.0 GB    2.68 GB     159.8 GB
FP16                    140.0 GB    2.68 GB     159.8 GB
Q8_0                     70.0 GB    2.68 GB      81.4 GB
Q6_K                     57.4 GB    2.68 GB      67.3 GB
Q5_K_M                   45.1 GB    2.68 GB      53.5 GB
Q4_K_M (recommended)     39.4 GB    2.68 GB      47.1 GB
Q3_K_M                   30.1 GB    2.68 GB      36.7 GB
Q2_K                     23.0 GB    2.68 GB      28.8 GB
NVFP4 (CUDA)             35.0 GB    2.68 GB      42.2 GB

KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
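The Total column is larger than weights plus KV cache. A minimal sketch of the arithmetic follows, assuming a flat 12% overhead applied to both. That factor is inferred from the rows of the table, not documented on this page, and `estimated_total_gb` is a hypothetical helper name.

```python
KV_CACHE_GB = 2.68   # from the table: 8k context, FP16 cache
OVERHEAD = 1.12      # assumption: inferred from the table rows, not documented

# Quantization -> (weights GB, total GB), copied from the table.
TABLE = {
    "FP32": (280.0, 316.6),
    "FP16": (140.0, 159.8),
    "Q8_0": (70.0, 81.4),
    "Q6_K": (57.4, 67.3),
    "Q5_K_M": (45.1, 53.5),
    "Q4_K_M": (39.4, 47.1),
    "Q3_K_M": (30.1, 36.7),
    "Q2_K": (23.0, 28.8),
    "NVFP4": (35.0, 42.2),
}


def estimated_total_gb(weights_gb: float, kv_gb: float = KV_CACHE_GB) -> float:
    """Estimate total VRAM as (weights + KV cache) scaled by the assumed overhead."""
    return (weights_gb + kv_gb) * OVERHEAD


if __name__ == "__main__":
    for quant, (weights, listed_total) in TABLE.items():
        est = estimated_total_gb(weights)
        print(f"{quant:<7} estimated {est:6.1f} GB, listed {listed_total:6.1f} GB")
```

Under this assumption the estimate matches every listed total to one decimal place, which suggests the overhead covers activations and runtime buffers, though the page does not say so.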

GPUs that run DeepSeek R1 Distill Llama 70B natively (47)

Plus 35 GPUs that run it with CPU offload (slower)

Notes

Reasoning model — outputs long chains-of-thought before answering.

Hugging Face · Ollama · Released 2025-01-20

Frequently asked questions

What are the VRAM requirements for DeepSeek R1 Distill Llama 70B?
DeepSeek R1 Distill Llama 70B requires approximately 47.1 GB of VRAM at Q4_K_M quantization, 81.4 GB at Q8_0, and 159.8 GB at FP16. These numbers assume an 8k context window; the KV cache portion grows linearly with context length, so total VRAM rises as the context gets longer.
How many parameters does DeepSeek R1 Distill Llama 70B have?
DeepSeek R1 Distill Llama 70B has 70 billion parameters.
Is DeepSeek R1 Distill Llama 70B good at reasoning and math?
Yes. With a MATH score of 94.5% and an MMLU-Pro score of 70.0%, DeepSeek R1 Distill Llama 70B handles complex multi-step reasoning, analytical tasks, and problem-solving well.
Can DeepSeek R1 Distill Llama 70B run on a 16 GB GPU?
No. At Q4_K_M, DeepSeek R1 Distill Llama 70B needs 47.1 GB of VRAM — more than 16 GB. You will need a 48 GB GPU like the RTX 6000 Ada or a dual-GPU setup.
Can DeepSeek R1 Distill Llama 70B run on a 24 GB GPU?
No. Even at Q4_K_M, DeepSeek R1 Distill Llama 70B needs 47.1 GB. Consider a 48 GB card like the RTX 6000 Ada or a dual RTX 4090 setup.
What is the smallest quantization for DeepSeek R1 Distill Llama 70B that fits in 24 GB of VRAM?
DeepSeek R1 Distill Llama 70B cannot fit in 24 GB of VRAM at any standard quantization level. The minimum needed is 28.8 GB at Q2_K.
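The same lookup can be done for any VRAM budget. The sketch below uses the 8k-context totals listed on this page and picks the largest footprint that fits, on the assumption that a larger footprint means less aggressive quantization. `largest_quant_that_fits` is a hypothetical helper, not part of any tool.

```python
from typing import Optional

# Total VRAM in GB at 8k context, copied from the table on this page.
TOTAL_VRAM_GB = {
    "FP16": 159.8,
    "Q8_0": 81.4,
    "Q6_K": 67.3,
    "Q5_K_M": 53.5,
    "Q4_K_M": 47.1,
    "NVFP4": 42.2,   # requires a CUDA GPU
    "Q3_K_M": 36.7,
    "Q2_K": 28.8,
}


def largest_quant_that_fits(vram_gb: float, cuda: bool = True) -> Optional[str]:
    """Return the quantization with the largest total that fits, or None."""
    candidates = {
        quant: total
        for quant, total in TOTAL_VRAM_GB.items()
        if total <= vram_gb and (quant != "NVFP4" or cuda)
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    for budget in (24, 32, 48, 80):
        print(f"{budget} GB -> {largest_quant_that_fits(budget)}")
```

A 24 GB budget returns `None` because even Q2_K is listed at 28.8 GB.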
What GPU do I need to run DeepSeek R1 Distill Llama 70B locally?
You need a 48 GB GPU or a dual-GPU setup. At Q4_K_M, DeepSeek R1 Distill Llama 70B needs 47.1 GB VRAM. Options: RTX 6000 Ada (48 GB), A6000 (48 GB), or 2× RTX 4090.
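A 48 GB card leaves little headroom beyond 8k context at Q4_K_M. The rough sketch below estimates how much context fits, under two assumptions that this page does not state: a Llama-3-70B-style KV cache of 327,680 bytes per token (80 layers, 8 KV heads, head dimension 128, FP16) and a flat 12% overhead inferred from the listed totals. `max_context_tokens` is a hypothetical helper.

```python
KV_BYTES_PER_TOKEN = 2 * 80 * 8 * 128 * 2  # assumed architecture, FP16 cache
OVERHEAD = 1.12                            # assumption inferred from listed totals


def max_context_tokens(vram_gb: float, weights_gb: float) -> int:
    """Estimate how many context tokens fit after weights and overhead."""
    kv_budget_gb = vram_gb / OVERHEAD - weights_gb
    if kv_budget_gb <= 0:
        return 0  # the weights alone do not fit
    return int(kv_budget_gb * 1e9 // KV_BYTES_PER_TOKEN)


if __name__ == "__main__":
    q4_k_m_weights_gb = 39.4  # from the table on this page
    for vram in (24, 48, 2 * 48):
        tokens = max_context_tokens(vram, q4_k_m_weights_gb)
        print(f"{vram} GB VRAM -> about {tokens} tokens of context")
```

Under these assumptions a single 48 GB card tops out at roughly 10k tokens, so longer contexts call for more VRAM or a smaller quantization.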