
Llama 4 Scout 109B

Llama 4 Scout 109B needs roughly 54.5 GB of VRAM at Q4 quantization (218.0 GB at FP16). 24 of the GPUs we track can run it fully in VRAM at 8k context.

Meta · 109B params · 17B active (MoE) · 10M context · Llama 4 Community license · Commercial use OK

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

Quant     Weights     KV cache   Total
FP16      218.0 GB    2.68 GB    247.2 GB
Q8        109.0 GB    2.68 GB    125.1 GB
Q6_K      81.8 GB     2.68 GB    94.6 GB
Q5_K_M    68.1 GB     2.68 GB    79.3 GB
Q4_K_M    54.5 GB     2.68 GB    64.0 GB
Q3_K_M    43.6 GB     2.68 GB    51.8 GB
Q2_K      32.7 GB     2.68 GB    39.6 GB
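
For orientation, the totals above are consistent with a simple estimate: weights at the quantization's bits per weight, plus the 2.68 GB KV cache, plus roughly 12% overhead. The sketch below reproduces the table under that assumption; the bits-per-weight values and the overhead factor are inferred from the listed numbers, not an official formula.

```python
# Back-of-envelope VRAM estimate that reproduces the table above.
# Assumptions (inferred from the listed numbers, not an official formula):
#   weights  = 109B params x bits-per-weight / 8
#   KV cache = 2.68 GB at an 8,192-token context, scaling linearly with context
#   total    = (weights + KV cache) x ~1.12 for runtime overhead

PARAMS = 109e9            # total parameter count (MoE counts all experts)
KV_GB_AT_8K = 2.68        # KV cache size at 8k context, from the table
OVERHEAD = 0.12           # assumed fraction for activations/runtime buffers

# Approximate effective bits per weight for each quantization level (assumed)
BITS_PER_WEIGHT = {
    "FP16": 16.0, "Q8": 8.0, "Q6_K": 6.0, "Q5_K_M": 5.0,
    "Q4_K_M": 4.0, "Q3_K_M": 3.2, "Q2_K": 2.4,
}

def vram_estimate_gb(quant: str, context_tokens: int = 8192) -> float:
    """Estimate total VRAM in GB for a given quantization and context length."""
    weights_gb = PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9
    kv_gb = KV_GB_AT_8K * context_tokens / 8192
    return (weights_gb + kv_gb) * (1 + OVERHEAD)

if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"{quant:7s} {vram_estimate_gb(quant):6.1f} GB")
```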

Benchmarks

GPUs that run Llama 4 Scout 109B natively (24)

Plus 11 GPUs that run it with CPU offload (slower)

Notes

16 experts, 2 active per token. Advertised context is 10M tokens, but KV cache memory limits practical context to far less.
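
For a sense of scale, extrapolating the table's 2.68 GB KV cache at 8k context linearly (as the note above states) to the advertised 10M-token window puts the KV cache in the terabyte range:

```python
# Extrapolate the 8k-context KV cache (2.68 GB, from the table above)
# linearly to the advertised 10M-token context window.
kv_gb_at_8k = 2.68
kv_gb_at_10m = kv_gb_at_8k * 10_000_000 / 8192
print(f"KV cache at 10M tokens: ~{kv_gb_at_10m:,.0f} GB (~{kv_gb_at_10m/1024:.1f} TB)")
# -> roughly 3,270 GB, about 3.2 TB, far beyond any single-node VRAM budget
```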

Hugging Face ↗ · Ollama ↗ · Released 2025-04-05