canitrun

DeepSeek R1 Distill Llama 70B

DeepSeek R1 Distill Llama 70B needs roughly 35.0 GB of VRAM for its weights at Q4 quantization (140.0 GB at FP16). Of the GPUs we track, 33 can run it fully in VRAM at 8k context.
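The headline figures follow directly from parameter count times bits per weight. A minimal sketch of that arithmetic (treating Q4_K_M as exactly 4 bits per parameter is a simplification that matches this page's numbers; real K-quant sizes vary slightly):

```python
# Weight-size estimate: parameters x bits per parameter, in decimal GB.
# The flat bits-per-parameter values are an approximation (assumption);
# actual GGUF K-quant files deviate by a few percent.

def weights_gb(n_params: float, bits_per_param: float) -> float:
    return n_params * bits_per_param / 8 / 1e9

N = 70e9  # 70B parameters
print(weights_gb(N, 4))   # Q4_K_M -> 35.0 GB
print(weights_gb(N, 16))  # FP16   -> 140.0 GB
```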

DeepSeek · 70B params · 125k context · MIT license · Commercial use OK

VRAM at each quantization

Assumes 8k context; the KV cache grows linearly with context length. Totals are slightly higher than weights plus KV cache because they include runtime overhead.

| Quant  | Weights  | KV cache | Total    |
|--------|----------|----------|----------|
| FP16   | 140.0 GB | 2.68 GB  | 159.8 GB |
| Q8     | 70.0 GB  | 2.68 GB  | 81.4 GB  |
| Q6_K   | 52.5 GB  | 2.68 GB  | 61.8 GB  |
| Q5_K_M | 43.8 GB  | 2.68 GB  | 52.0 GB  |
| Q4_K_M | 35.0 GB  | 2.68 GB  | 42.2 GB  |
| Q3_K_M | 28.0 GB  | 2.68 GB  | 34.4 GB  |
| Q2_K   | 21.0 GB  | 2.68 GB  | 26.5 GB  |
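The KV-cache figure above can be reproduced from the Llama 3 70B base architecture. A sketch, assuming 80 layers, 8 grouped-query KV heads, head dim 128, an fp16 cache, and a flat ~1.12x runtime-overhead multiplier fitted to the Total column (all of these are assumptions, not values stated on this page):

```python
# KV-cache and total-VRAM sketch for DeepSeek R1 Distill Llama 70B.
# Architecture defaults (80 layers, 8 KV heads, head dim 128) come from
# the Llama 3 70B base model; the 1.12x overhead is fitted to the table.
# Both are assumptions, not figures published on this page.

def kv_cache_gb(context_len: int, n_layers: int = 80, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """fp16 K and V tensors for every layer, in decimal GB."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

def total_vram_gb(weights_gb: float, context_len: int = 8192,
                  overhead: float = 1.12) -> float:
    return (weights_gb + kv_cache_gb(context_len)) * overhead

print(round(kv_cache_gb(8192), 2))    # 2.68 GB, matching the table
print(round(total_vram_gb(35.0), 1))  # 42.2 GB for Q4_K_M
```

Note that doubling the context to 16k doubles only the KV-cache term, which is why the table's totals are dominated by the weights at every quantization level.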

Benchmarks

GPUs that run DeepSeek R1 Distill Llama 70B natively (33)

Plus 15 GPUs that run it with CPU offload (slower)

Notes

Reasoning model: it outputs long chains of thought before answering.

Hugging Face ↗ · Ollama ↗ · Released 2025-01-20