Llama 4 Maverick 400B
Llama 4 Maverick 400B needs roughly 200 GB of VRAM at Q4 quantization (800 GB at FP16). Of the GPUs we track, 5 can run it fully in VRAM at 8k context.
Meta · 400B params · 17B active (MoE) · 977k context · Llama 4 Community license · Commercial use OK
VRAM at each quantization
Assumes 8k context. KV cache grows linearly with context length. Totals run roughly 12% above weights plus KV cache, accounting for runtime overhead (activations and buffers).
| Quant | Weights | KV cache | Total |
|---|---|---|---|
| FP16 | 800.0 GB | 4.03 GB | 900.5 GB |
| Q8 | 400.0 GB | 4.03 GB | 452.5 GB |
| Q6_K | 300.0 GB | 4.03 GB | 340.5 GB |
| Q5_K_M | 250.0 GB | 4.03 GB | 284.5 GB |
| Q4_K_M | 200.0 GB | 4.03 GB | 228.5 GB |
| Q3_K_M | 160.0 GB | 4.03 GB | 183.7 GB |
| Q2_K | 120.0 GB | 4.03 GB | 138.9 GB |
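The table's arithmetic can be reproduced with a small estimator. The bits-per-weight figures below are approximations chosen to match the table (the `_K` quants do not use exactly their nominal bit width), and the 12% overhead factor is inferred from the totals, so treat this as a sketch rather than an exact sizing tool:

```python
# Rough VRAM estimator for Llama 4 Maverick 400B.
# Bits-per-weight values are approximate; overhead factor is inferred
# from the table above (totals ~= 1.12 * (weights + KV cache)).

PARAMS_B = 400          # total parameters, in billions
KV_CACHE_8K_GB = 4.03   # KV cache at 8k context, from the table
OVERHEAD = 1.12         # ~12% on top of weights + KV cache

# Approximate effective bits per weight for each quantization level
BITS = {
    "FP16": 16, "Q8": 8, "Q6_K": 6, "Q5_K_M": 5,
    "Q4_K_M": 4, "Q3_K_M": 3.2, "Q2_K": 2.4,
}

def vram_gb(quant: str, context_tokens: int = 8192) -> float:
    """Estimate total VRAM in GB for a given quant and context length."""
    weights = PARAMS_B * BITS[quant] / 8            # 1B params at 8 bits = 1 GB
    kv = KV_CACHE_8K_GB * context_tokens / 8192     # KV cache scales linearly
    return (weights + kv) * OVERHEAD

print(round(vram_gb("Q4_K_M"), 1))  # 228.5, matching the table
```

Doubling the context to 16k only adds about 4 GB to each total, so at this scale the weights dominate and the quantization level, not context length, decides which hardware fits.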
Benchmarks
GPUs that run Llama 4 Maverick 400B natively (5)
- AMD Instinct MI300X: Q2_K · 1143.1 t/s
- Apple M4 Ultra (384GB): Q6_K · 94.2 t/s
- Apple M4 Ultra (192GB): Q3_K_M · 176.6 t/s
- Apple M2 Ultra (384GB): Q6_K · 69 t/s
- Apple M2 Ultra (192GB): Q3_K_M · 129.4 t/s
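These throughput numbers are consistent with decode being memory-bandwidth bound: per token, the runtime must read only the 17B active parameters, not all 400B. A back-of-envelope check, where the bandwidth figures (~5300 GB/s for MI300X, ~800 GB/s for M2 Ultra) and bits-per-weight values are assumptions rather than measurements, and which ignores KV cache reads and compute:

```python
# Back-of-envelope decode speed for a memory-bandwidth-bound MoE model:
# tokens/s ~= memory bandwidth / bytes read per token (active params only).
# Bandwidth and bits-per-weight figures below are assumptions.

ACTIVE_PARAMS_B = 17  # active parameters per token, in billions

def tokens_per_sec(bandwidth_gbs: float, bits_per_weight: float) -> float:
    """Upper-bound estimate of decode tokens/s from memory bandwidth."""
    gb_per_token = ACTIVE_PARAMS_B * bits_per_weight / 8
    return bandwidth_gbs / gb_per_token

# MI300X (~5300 GB/s) at Q2_K (~2.4 bits): ~1040 t/s (benchmark: 1143.1)
# M2 Ultra (~800 GB/s) at Q6_K (6 bits):   ~63 t/s  (benchmark: 69)
```

Both estimates land within about 10% of the benchmarked figures, which supports reading the list above as a bandwidth ranking rather than a compute ranking.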
Notes
128 experts, 2 active per token. Running it fully in VRAM realistically requires a multi-GPU server or a high-memory unified-memory machine.
