CanItRun

DeepSeek R1 Distill Llama 8B

DeepSeek R1 Distill Llama 8B needs roughly 6.3 GB VRAM at Q4_K_M quantization (19.1 GB at FP16). 105 GPUs we track can run it fully in VRAM at 8k context.

105 GPUs run this natively · 2 with CPU offload

DeepSeek · 8B params · 125k context · MIT · Commercial use ok

DeepSeek R1 Distill Llama 8B is an 8B-parameter dense model developed by DeepSeek. It is a compact distillation that brings R1 reasoning to edge devices.

To run DeepSeek R1 Distill Llama 8B locally, the recommended quantization is Q5_K_M (~6 GB), which runs on 8 GB GPUs. Best reasoning model for budget hardware.

MMLU-Pro 41.0%, GPQA 49.0%, Math 89.1% — exceptional reasoning for the size.

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

Quant                  Weights    KV cache   Total
FP32                   32.0 GB    1.07 GB    37.0 GB
BF16                   16.0 GB    1.07 GB    19.1 GB
FP16                   16.0 GB    1.07 GB    19.1 GB
Q8_0                   8.0 GB     1.07 GB    10.2 GB
Q6_K                   6.6 GB     1.07 GB    8.6 GB
Q5_K_M (recommended)   5.2 GB     1.07 GB    7.0 GB
Q4_K_M                 4.5 GB     1.07 GB    6.3 GB
Q3_K_M                 3.4 GB     1.07 GB    5.1 GB
Q2_K                   2.6 GB     1.07 GB    4.2 GB
NVFP4 (CUDA)           4.0 GB     1.07 GB    5.7 GB

KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
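
The 1.07 GB KV cache figure can be reproduced with a little arithmetic. The sketch below assumes the Llama 3.1 8B attention layout (32 layers, 8 key/value heads, head dimension 128) and 2-byte FP16 cache entries. Those architecture values are not stated on this page and are an assumption here, and the function name is illustrative.

```python
def kv_cache_gb(context_tokens, n_layers=32, n_kv_heads=8,
                head_dim=128, bytes_per_value=2):
    """Approximate KV cache size in decimal GB.

    Defaults are ASSUMED Llama 3.1 8B values, not taken from this page.
    """
    # Factor 2: one key and one value vector per token, per layer, per KV head.
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token_bytes * context_tokens / 1e9


for ctx in (8192, 32768):
    print(ctx, round(kv_cache_gb(ctx), 2))
# 8192 1.07
# 32768 4.29
```

Because the per-token cost is fixed, quadrupling the context from 8k to 32k quadruples the cache, which is the linear growth described above.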

GPUs that run DeepSeek R1 Distill Llama 8B natively (105)

Plus 2 GPUs that run it with CPU offload (slower)
Released 2025-01-20. Available on Hugging Face and Ollama.

Frequently asked questions

What are the VRAM requirements for DeepSeek R1 Distill Llama 8B?
DeepSeek R1 Distill Llama 8B requires approximately 6.3 GB of VRAM at Q4_K_M quantization, 10.2 GB at Q8_0, and 19.1 GB at FP16. These numbers assume an 8k context window; VRAM scales linearly with context length due to the KV cache.
How many parameters does DeepSeek R1 Distill Llama 8B have?
DeepSeek R1 Distill Llama 8B has 8 billion parameters.
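
As a rough sketch of how the parameter count translates into weight memory: weights in GB are approximately parameters (in billions) times bits per weight, divided by 8. The bits-per-weight values below are back-derived from the Weights column of the VRAM table on this page rather than taken from any format specification, and the function name is illustrative.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate weight memory in GB for a dense model."""
    # 8 bits per byte; billions of parameters map directly to GB.
    return params_billion * bits_per_weight / 8


# ASSUMED effective bits per weight, inferred from the page's table.
IMPLIED_BITS = {"FP16": 16, "Q8_0": 8, "Q4_K_M": 4.5}

for name, bits in IMPLIED_BITS.items():
    print(name, weights_gb(8, bits))
# FP16 16.0
# Q8_0 8.0
# Q4_K_M 4.5
```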
Is DeepSeek R1 Distill Llama 8B good at reasoning and math?
Yes. With a MATH score of 89.1 and MMLU-Pro of 41, DeepSeek R1 Distill Llama 8B handles complex multi-step reasoning, analytical tasks, and problem-solving well.
Can DeepSeek R1 Distill Llama 8B run on a 16 GB GPU?
Yes. DeepSeek R1 Distill Llama 8B needs 6.3 GB at Q4_K_M, which fits in a 16 GB GPU like the RTX 4080 or RTX 4070 Ti Super.
What is the highest-quality precision for DeepSeek R1 Distill Llama 8B that fits in 24 GB of VRAM?
BF16 (or FP16). At 19.1 GB, it is the highest-quality option that fits in 24 GB of VRAM; FP32 needs 37.0 GB.
What GPU do I need to run DeepSeek R1 Distill Llama 8B locally?
An 8 GB GPU is enough for Q4_K_M, which needs 6.3 GB of VRAM at 8k context. A 16 GB GPU such as the RTX 4080 or RTX 4070 Ti Super leaves room for Q8_0 (10.2 GB) or a longer context.
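
The fit questions above all reduce to one lookup: pick the first entry in the VRAM table whose total is within the card's memory. A minimal sketch, using the 8k-context totals from this page; the helper name `best_fit` is hypothetical, and the check ignores any VRAM already used by the display or other processes.

```python
# (name, total GB at 8k context) copied from the VRAM table, best quality first.
# NVFP4 is left out because it requires a CUDA GPU.
TOTALS_GB = [
    ("FP32", 37.0), ("BF16", 19.1), ("FP16", 19.1), ("Q8_0", 10.2),
    ("Q6_K", 8.6), ("Q5_K_M", 7.0), ("Q4_K_M", 6.3), ("Q3_K_M", 5.1),
    ("Q2_K", 4.2),
]


def best_fit(vram_gb):
    """Return the highest-quality entry that fits, or None if nothing does."""
    for name, total in TOTALS_GB:
        if total <= vram_gb:
            return name
    return None


for vram in (24, 16, 8):
    print(vram, best_fit(vram))
# 24 BF16
# 16 Q8_0
# 8 Q5_K_M
```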