DeepSeek V4 Pro 1.6T

DeepSeek V4 Pro 1.6T needs roughly 897.1 GB of VRAM at Q4_K_M quantization (3585.2 GB at FP16). None of the GPUs we track can run it fully in VRAM at 8k context.

DeepSeek | 1600B params | 49B active (MoE) | 1024k context | MIT | Commercial use ok

VRAM at each quantization

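Assumes 8k context. KV cache grows linearly with context length. Total is higher than Weights plus KV cache: the figures below work out to about 12% above the weights alone, which presumably covers runtime overhead.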

Quant	Weights	KV cache	Total
FP16	3200.0 GB	1.02 GB	3585.2 GB
Q8	1600.0 GB	1.02 GB	1793.2 GB
Q6_K	1200.0 GB	1.02 GB	1345.2 GB
Q5_K_M	1000.0 GB	1.02 GB	1121.2 GB
Q4_K_M	800.0 GB	1.02 GB	897.1 GB
Q3_K_M	640.0 GB	1.02 GB	718.0 GB
Q2_K	480.0 GB	1.02 GB	538.8 GB
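
The Weights and KV cache columns can be reproduced with simple arithmetic. The sketch below is illustrative only: the bits-per-weight values are back-calculated from the Weights column (weights in GB × 8 / 1600B params) rather than taken from any format specification, the function names are hypothetical, and the KV cache projection simply applies the linear scaling stated above to the 1.02 GB figure at 8k.

```python
# Total parameter count in billions, from the model card above.
PARAMS_BILLION = 1600

# Bits per weight implied by the Weights column of the table.
# Assumption: back-calculated from this page, not official format specs.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8": 8.0,
    "Q6_K": 6.0,
    "Q5_K_M": 5.0,
    "Q4_K_M": 4.0,
    "Q3_K_M": 3.2,
    "Q2_K": 2.4,
}


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes): params * bits / 8."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(context_k: float, base_gb: float = 1.02, base_context_k: float = 8) -> float:
    """Scale the table's 8k KV cache figure linearly to another context length."""
    return base_gb * context_k / base_context_k


if __name__ == "__main__":
    for quant, bits in BITS_PER_WEIGHT.items():
        print(f"{quant}: {weights_gb(PARAMS_BILLION, bits):.1f} GB weights")
    # Projected KV cache at the full 1024k context, under the linear assumption.
    print(f"KV cache at 1024k: {kv_cache_gb(1024):.2f} GB")
```

Under the linear assumption, the KV cache at the full 1024k context is 128 times the 8k figure, so it stops being negligible next to the weights at long context lengths.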

Benchmarks

MMLU-Pro: 87.5
GPQA: 90.1

GPUs that run DeepSeek V4 Pro 1.6T natively (0)

No single GPU in our list fits this model at Q4_K_M with 8k context. Consider a multi-GPU setup or CPU offload.
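
For a rough sense of what a multi-GPU setup would involve, the sketch below divides a Total figure from the table by the memory of one card. It is a lower bound under stated assumptions: the 80 GB and 24 GB capacities are hypothetical example values, `min_gpus` and `usable_fraction` are illustrative names, and the calculation ignores any extra memory that splitting a model across devices may require.

```python
import math


def min_gpus(total_gb: float, vram_per_gpu_gb: float, usable_fraction: float = 1.0) -> int:
    """Smallest number of identical GPUs whose combined usable VRAM covers total_gb.

    usable_fraction is a hypothetical knob for reserving headroom on each card
    (for example 0.9 to leave 10% free); it is not a measured value.
    """
    usable_gb = vram_per_gpu_gb * usable_fraction
    if usable_gb <= 0:
        raise ValueError("usable VRAM per GPU must be positive")
    return math.ceil(total_gb / usable_gb)


if __name__ == "__main__":
    q4_total_gb = 897.1  # Q4_K_M total at 8k context, from the table above
    print(min_gpus(q4_total_gb, 80))        # hypothetical 80 GB cards
    print(min_gpus(q4_total_gb, 24))        # hypothetical 24 GB cards
    print(min_gpus(q4_total_gb, 80, 0.9))   # 80 GB cards with 10% headroom
```

Real deployments typically need more than this minimum, so treat the result as a floor rather than a sizing recommendation.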

Notes

1.6T-parameter MoE model with hybrid CSA/HCA attention and a 1M-token context. It requires 27% of V3.2's inference FLOPs at 1M context. The kvHeads/headDim values used for the KV cache estimate approximate MLA storage.

Hugging Face | Released 2026-04-24