Kimi K2.6
Kimi K2.6 needs roughly 500 GB of VRAM for its weights at Q4 quantization (2,000 GB at FP16). Two GPUs we track can run it fully in VRAM at 8k context, both only at Q2_K.
Moonshot AI · 1000B params · 32B active (MoE) · 250k context · Kimi · Non-commercial only
VRAM at each quantization
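Assumes 8k context. KV cache grows linearly with context length. Total is higher than Weights plus KV cache in every row, so it appears to include additional runtime overhead.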
| Quant | Weights | KV cache | Total |
|---|---|---|---|
| FP16 | 2000.0 GB | 2.68 GB | 2243.0 GB |
| Q8 | 1000.0 GB | 2.68 GB | 1123.0 GB |
| Q6_K | 750.0 GB | 2.68 GB | 843.0 GB |
| Q5_K_M | 625.0 GB | 2.68 GB | 703.0 GB |
| Q4_K_M | 500.0 GB | 2.68 GB | 563.0 GB |
| Q3_K_M | 400.0 GB | 2.68 GB | 451.0 GB |
| Q2_K | 300.0 GB | 2.68 GB | 339.0 GB |
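The figures in the table can be reproduced with some simple arithmetic, sketched below in Python. Nothing here is a formula published by the source: the bits-per-weight values are back-calculated from the Weights column (1000B parameters × bits ÷ 8), and the Total column happens to match 1.12 × weights plus a flat 3 GB in every row, which may be a 12% overhead allowance plus the KV cache rounded up. The names `BITS_PER_WEIGHT`, `OVERHEAD_FACTOR`, `FIXED_GB_AT_8K`, and the choice of 8192 tokens for "8k" are all assumptions made for illustration.

```python
PARAMS_BILLION = 1000  # total parameters, from the model summary

# Effective bits per weight implied by the Weights column
# (inferred from the table, not stated by the source).
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8": 8.0,
    "Q6_K": 6.0,
    "Q5_K_M": 5.0,
    "Q4_K_M": 4.0,
    "Q3_K_M": 3.2,
    "Q2_K": 2.4,
}

KV_CACHE_GB_AT_8K = 2.68  # from the table
OVERHEAD_FACTOR = 1.12    # assumption: fitted to the Total column
FIXED_GB_AT_8K = 3.0      # assumption: fitted to the Total column


def weights_gb(quant: str) -> float:
    """Weight size in GB: billions of parameters times bytes per weight."""
    return PARAMS_BILLION * BITS_PER_WEIGHT[quant] / 8


def kv_cache_gb(context_tokens: int, base_tokens: int = 8192) -> float:
    """Scale the 8k KV cache figure linearly with context length.

    base_tokens=8192 is an assumption about what "8k" means here.
    """
    return KV_CACHE_GB_AT_8K * context_tokens / base_tokens


def total_gb_at_8k(quant: str) -> float:
    """Fitted estimate of the Total column at 8k context."""
    return OVERHEAD_FACTOR * weights_gb(quant) + FIXED_GB_AT_8K


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"{quant:8s} weights={weights_gb(quant):7.1f} GB  "
              f"total={total_gb_at_8k(quant):7.1f} GB")
```

Because the KV cache is small next to the weights at 8k, the choice of quantization dominates the total; `kv_cache_gb` only starts to matter at much longer contexts.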
Benchmarks
GPUs that run Kimi K2.6 natively (2)
- Apple M4 Ultra (384GB): Q2_K · 125.1 t/s
- Apple M2 Ultra (384GB): Q2_K · 91.7 t/s
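Why both entries are listed at Q2_K follows from comparing each device's memory with the Total column at 8k context. The check can be sketched as below; the helper name `quants_that_fit` is hypothetical, and the sketch assumes all of the listed memory is available to the model, which a real system may not allow.

```python
# Total VRAM required at 8k context, copied from the table above (GB).
TOTAL_GB_AT_8K = {
    "FP16": 2243.0,
    "Q8": 1123.0,
    "Q6_K": 843.0,
    "Q5_K_M": 703.0,
    "Q4_K_M": 563.0,
    "Q3_K_M": 451.0,
    "Q2_K": 339.0,
}


def quants_that_fit(vram_gb: float) -> list[str]:
    """Return the quantizations whose 8k-context total fits in vram_gb.

    Assumes the full vram_gb is usable by the model.
    """
    return [quant for quant, total in TOTAL_GB_AT_8K.items() if total <= vram_gb]


if __name__ == "__main__":
    print(quants_that_fit(384))  # ['Q2_K']
```

With 384 GB, only Q2_K (339.0 GB) fits; Q3_K_M needs 451.0 GB.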
Notes
A 1-trillion-parameter mixture-of-experts (MoE) model with 32B active parameters, aimed at agentic use in complex workflows.
