
Llama 4 Maverick 400B

Llama 4 Maverick 400B needs roughly 200.0 GB of VRAM for its weights at Q4 quantization (800.0 GB at FP16); with KV cache and runtime overhead, the Q4 total comes to about 228.5 GB at 8k context. 5 GPUs we track can run it fully in VRAM at 8k context.

Meta · 400B params · 17B active (MoE) · 977k context · Llama 4 Community license · Commercial use OK

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length. Totals include a runtime overhead of roughly 12% of the weight size plus a small fixed buffer, which is why each total exceeds weights + KV cache.

Quant     Weights    KV cache   Total
FP16      800.0 GB   4.03 GB    900.5 GB
Q8        400.0 GB   4.03 GB    452.5 GB
Q6_K      300.0 GB   4.03 GB    340.5 GB
Q5_K_M    250.0 GB   4.03 GB    284.5 GB
Q4_K_M    200.0 GB   4.03 GB    228.5 GB
Q3_K_M    160.0 GB   4.03 GB    183.7 GB
Q2_K      120.0 GB   4.03 GB    138.9 GB
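The table above can be reproduced with a short calculation. This is a sketch, not the site's actual code: the effective bits-per-weight values and the ~12% overhead plus ~0.5 GB buffer are inferred from the listed totals, and the 4.03 GB KV cache figure is taken directly from the table.

```python
PARAMS_B = 400          # total parameters, in billions
KV_CACHE_8K_GB = 4.03   # KV cache at 8k context (from the table)

# Approximate effective bits per weight for each quantization level
# (inferred from the weight sizes in the table).
BITS = {"FP16": 16, "Q8": 8, "Q6_K": 6, "Q5_K_M": 5,
        "Q4_K_M": 4, "Q3_K_M": 3.2, "Q2_K": 2.4}

def vram_gb(quant: str, context_tokens: int = 8192) -> float:
    """Estimate total VRAM: quantized weights + KV cache + overhead."""
    weights = PARAMS_B * BITS[quant] / 8            # GB of weights
    kv = KV_CACHE_8K_GB * context_tokens / 8192     # linear in context
    return round(weights * 1.12 + kv + 0.5, 1)      # ~12% overhead + buffer

print(vram_gb("Q4_K_M"))         # 228.5 — matches the table
print(vram_gb("Q4_K_M", 32768))  # KV cache quadruples at 32k context
```

Scaling the context argument shows why long contexts matter less here than for dense models: the 400 GB-class weights dominate, while the KV cache adds only a few GB per 8k tokens.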


Notes

128 experts, 2 active per token. Running it realistically requires a multi-GPU server.
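A back-of-envelope check of the multi-GPU note: how many cards of a given size would it take just to hold the Q4_K_M total from the table? The 80 GB and 24 GB card sizes below are illustrative assumptions, not GPUs named on this page.

```python
import math

TOTAL_Q4_GB = 228.5  # Q4_K_M total at 8k context, from the table

def gpus_needed(card_vram_gb: float, total_gb: float = TOTAL_Q4_GB) -> int:
    """Minimum card count by capacity alone (ignores per-GPU activation
    overhead and uneven layer splits, which push the real number higher)."""
    return math.ceil(total_gb / card_vram_gb)

print(gpus_needed(80))  # 3 datacenter-class 80 GB cards
print(gpus_needed(24))  # 10 consumer-class 24 GB cards
```

Note that MoE routing reduces compute per token (only 17B of 400B params are active), but every expert must still sit in memory, so the VRAM requirement is set by the full 400B parameters.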

Hugging Face ↗ · Ollama ↗ · Released 2025-04-05