Kimi K2.6

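Kimi K2.6 needs roughly 500 GB of VRAM for its weights at Q4 quantization (2000 GB at FP16). With KV cache and overhead at 8k context, the totals rise to 563 GB and 2243 GB. The 2 GPUs we track that can run it fully in VRAM at 8k context do so only at Q2_K.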

Moonshot AI · 1000B params · 32B active (MoE) · 250k context · Kimi · Non-commercial only

VRAM at each quantization

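Assumes 8k context. KV cache grows linearly with context length. Total is larger than Weights plus KV cache; the difference is consistent with about 12% runtime overhead.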

Quant	Weights	KV cache	Total
FP16	2000.0 GB	2.68 GB	2243.0 GB
Q8	1000.0 GB	2.68 GB	1123.0 GB
Q6_K	750.0 GB	2.68 GB	843.0 GB
Q5_K_M	625.0 GB	2.68 GB	703.0 GB
Q4_K_M	500.0 GB	2.68 GB	563.0 GB
Q3_K_M	400.0 GB	2.68 GB	451.0 GB
Q2_K	300.0 GB	2.68 GB	339.0 GB
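
The table above can be approximated with a short calculation. This is a minimal sketch, not the site's actual formula: the bytes-per-parameter values and the 12% overhead factor are assumptions inferred from the Weights and Total columns, and the names `weights_gb`, `kv_cache_gb`, and `total_gb` are hypothetical.

```python
# All constants below are inferred from the table, not taken from an official formula.
PARAMS_B = 1000.0      # total parameters, in billions
KV_GB_AT_8K = 2.68     # KV cache size at the 8k reference context, in GB
OVERHEAD = 1.12        # assumed ~12% overhead implied by the Total column

# Approximate bytes per parameter implied by the Weights column.
BYTES_PER_PARAM = {
    "FP16": 2.0,
    "Q8": 1.0,
    "Q6_K": 0.75,
    "Q5_K_M": 0.625,
    "Q4_K_M": 0.5,
    "Q3_K_M": 0.4,
    "Q2_K": 0.3,
}


def weights_gb(quant):
    """Weight memory in GB: billions of parameters times bytes per parameter."""
    return PARAMS_B * BYTES_PER_PARAM[quant]


def kv_cache_gb(context_k=8.0):
    """KV cache in GB, scaled linearly from the 8k reference value."""
    return KV_GB_AT_8K * context_k / 8.0


def total_gb(quant, context_k=8.0):
    """Estimated total VRAM in GB: (weights + KV cache) times the assumed overhead."""
    return (weights_gb(quant) + kv_cache_gb(context_k)) * OVERHEAD


if __name__ == "__main__":
    for quant in BYTES_PER_PARAM:
        print(f"{quant}\t{weights_gb(quant):.1f} GB\t"
              f"{kv_cache_gb():.2f} GB\t{total_gb(quant):.1f} GB")
```

Passing a larger `context_k` shows how the linear KV cache term changes the total. At this model size the weights dominate, so the KV cache contributes comparatively little at 8k context.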

Benchmarks

GPUs that run Kimi K2.6 natively (2)

Apple M4 Ultra (384GB) · Q2_K · 125.1 t/s
Apple M2 Ultra (384GB) · Q2_K · 91.7 t/s
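
To see why both devices are listed at Q2_K, the following sketch picks the highest-precision quantization whose 8k-context total fits in a given amount of memory. The totals are copied from the Total column of the table; the function name `best_quant_that_fits` is hypothetical, and the sketch assumes the full stated memory is available to the model, which may not hold in practice.

```python
# Total VRAM at 8k context in GB, copied from the Total column of the table.
TOTAL_GB_AT_8K = {
    "FP16": 2243.0,
    "Q8": 1123.0,
    "Q6_K": 843.0,
    "Q5_K_M": 703.0,
    "Q4_K_M": 563.0,
    "Q3_K_M": 451.0,
    "Q2_K": 339.0,
}


def best_quant_that_fits(vram_gb):
    """Return the largest (highest-precision) quant whose total fits, or None."""
    fitting = [(total, quant) for quant, total in TOTAL_GB_AT_8K.items()
               if total <= vram_gb]
    if not fitting:
        return None
    # The largest total that still fits corresponds to the highest precision.
    return max(fitting)[1]


if __name__ == "__main__":
    print(best_quant_that_fits(384))  # Q2_K
```

With 384 GB, only the 339 GB Q2_K total fits; Q3_K_M at 451 GB does not.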

Notes

A 1-trillion-parameter MoE model with advanced agentic capabilities for complex workflows.

Hugging Face ↗ · Ollama ↗ · Released 2026-04-10