MiniMax M1 456B
MiniMax M1 456B needs roughly 258.4 GB of VRAM at Q4_K_M quantization (1024.5 GB at FP16). Five of the GPUs we track can run it fully in VRAM at 8k context.
MiniMax · 456B params · 46B active (MoE) · 1024k context · Apache 2.0 · Commercial use OK
VRAM at each quantization
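Assumes 8k context. KV cache grows linearly with context length. Total is about 12% larger than Weights plus KV cache in every row, which appears to be an allowance for runtime overhead.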
| Quant | Weights | KV cache | Total |
|---|---|---|---|
| FP16 | 912.0 GB | 2.68 GB | 1024.5 GB |
| Q8 | 456.0 GB | 2.68 GB | 513.7 GB |
| Q6_K | 342.0 GB | 2.68 GB | 386.1 GB |
| Q5_K_M | 285.0 GB | 2.68 GB | 322.2 GB |
| Q4_K_M | 228.0 GB | 2.68 GB | 258.4 GB |
| Q3_K_M | 182.4 GB | 2.68 GB | 207.3 GB |
| Q2_K | 136.8 GB | 2.68 GB | 156.2 GB |
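
The arithmetic behind the table can be sketched in a few lines of Python. This is a rough reconstruction, not the calculator used to produce the figures. The bits-per-weight values are back-computed from the Weights column (weights in GB × 8 / 456), the 1.12 overhead factor is an assumption inferred from the ratio of Total to Weights plus KV cache, and the function names are hypothetical. Results match the Total column to within about 0.1 GB.

```python
# Rough reconstruction of the table above; all constants are read off or
# inferred from the table, not taken from any official calculator.

PARAMS_B = 456.0  # total parameters, in billions

# Effective bits per weight implied by the Weights column.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8": 8.0,
    "Q6_K": 6.0,
    "Q5_K_M": 5.0,
    "Q4_K_M": 4.0,
    "Q3_K_M": 3.2,
    "Q2_K": 2.4,
}

KV_CACHE_GB_AT_8K = 2.68  # KV cache column, at 8k context
OVERHEAD_FACTOR = 1.12    # assumption: inferred from Total / (Weights + KV cache)


def weights_gb(quant: str) -> float:
    """Weight size in GB: billions of parameters times bytes per weight."""
    return PARAMS_B * BITS_PER_WEIGHT[quant] / 8.0


def kv_cache_gb(context_k: float) -> float:
    """KV cache in GB, scaled linearly from the 8k-context baseline."""
    return KV_CACHE_GB_AT_8K * context_k / 8.0


def total_gb(quant: str, context_k: float = 8.0) -> float:
    """Estimated total VRAM in GB, including the assumed overhead factor."""
    return (weights_gb(quant) + kv_cache_gb(context_k)) * OVERHEAD_FACTOR


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"{quant:8s} {weights_gb(quant):7.1f} GB weights, "
              f"{total_gb(quant):7.1f} GB total at 8k")
```

Passing a larger `context_k` (for example `total_gb("Q4_K_M", context_k=32)`) shows how the KV cache term grows under the linear-scaling assumption while the weights term stays fixed.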
Benchmarks
GPUs that run MiniMax M1 456B natively (5)
- AMD Instinct MI300X: Q2_K · 422.5 t/s
- Apple M4 Ultra (384GB): Q5_K_M · 41.8 t/s
- Apple M4 Ultra (192GB): Q2_K · 87 t/s
- Apple M2 Ultra (384GB): Q5_K_M · 30.6 t/s
- Apple M2 Ultra (192GB): Q2_K · 63.8 t/s
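
The quantization shown for each device is consistent with picking the highest-precision quant whose 8k-context total fits in memory. A minimal sketch of that selection follows. The totals are copied from the table above, `best_quant` is a hypothetical helper, and the sketch assumes the full listed memory is usable by the model, which may not hold in practice.

```python
from typing import Optional

# Total VRAM at 8k context, copied from the quantization table,
# ordered from highest precision to lowest.
TOTAL_GB_AT_8K = [
    ("FP16", 1024.5),
    ("Q8", 513.7),
    ("Q6_K", 386.1),
    ("Q5_K_M", 322.2),
    ("Q4_K_M", 258.4),
    ("Q3_K_M", 207.3),
    ("Q2_K", 156.2),
]


def best_quant(vram_gb: float) -> Optional[str]:
    """Return the highest-precision quant that fits, or None if none does."""
    for quant, total in TOTAL_GB_AT_8K:
        if total <= vram_gb:
            return quant
    return None


if __name__ == "__main__":
    for vram in (192, 384):
        print(f"{vram} GB -> {best_quant(vram)}")
```

With 192 GB the helper returns Q2_K and with 384 GB it returns Q5_K_M, matching the entries in the list.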
Notes
Reasoning model with hybrid lightning attention (linear plus softmax) and 1M context. Available in 40k and 80k thinking-budget variants.