DeepSeek V4 Pro 1.6T
DeepSeek V4 Pro 1.6T needs roughly 1010.0 GB of VRAM at Q4_K_M quantization (3585.2 GB at FP16). None of the GPUs we track can run it fully in VRAM at 8k context.
GPUs that run it natively: 0 · with CPU offload: 0
DeepSeek V4 Pro 1.6T is a Mixture of Experts (MoE) model developed by DeepSeek, with 1600B total parameters but only 49B active per token. Released in April 2026, it uses hybrid CSA/HCA attention and supports a 1M-token context.
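To run DeepSeek V4 Pro 1.6T locally: even at Q2_K it needs several hundred gigabytes (the table below estimates 590.7 GB at 8k context), so it is datacenter-only. The Flash variant (284B total, 13B active) is more practical at ~100–120 GB at Q2. Because this is an MoE model, inference speed depends mainly on the active parameters (49B) rather than the total size, although all 1.6T parameters must still fit in memory.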
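To see why active parameters govern speed while total parameters govern memory, here is a back-of-envelope sketch. The rule of ~2 FLOPs per active parameter per token and the 4.5 bits/weight stand-in for Q4_K_M are simplifying assumptions, not figures published by DeepSeek.

```python
# Back-of-envelope MoE sizing: memory follows TOTAL parameters,
# per-token work follows ACTIVE parameters.
# Assumptions: ~2 FLOPs per active parameter per token (multiply + add),
# attention and expert-routing costs ignored; 4.5 bits/weight stands in
# for Q4_K_M.

TOTAL_PARAMS = 1600e9   # every expert must be resident in memory
ACTIVE_PARAMS = 49e9    # parameters actually used for each token
BITS_PER_WEIGHT = 4.5   # approximate Q4_K_M density (assumption)


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed to store `params` weights at the given precision."""
    return params * bits_per_weight / 8


def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2 * active_params


if __name__ == "__main__":
    # VRAM: all experts are loaded, so the 1.6T total sets the footprint.
    resident = weight_bytes(TOTAL_PARAMS, BITS_PER_WEIGHT)
    # Decode traffic: only the routed experts' weights are read per token.
    per_token = weight_bytes(ACTIVE_PARAMS, BITS_PER_WEIGHT)
    print(f"Resident weights:     {resident / 1e9:7.1f} GB")
    print(f"Weights read / token: {per_token / 1e9:7.1f} GB")
    print(f"Compute / token:      {flops_per_token(ACTIVE_PARAMS) / 1e9:7.0f} GFLOPs")
    print(f"Dense 1.6T / token:   {flops_per_token(TOTAL_PARAMS) / 1e9:7.0f} GFLOPs")
```

The resident footprint (about 900 GB) tracks the total parameter count, while the weights touched per token (about 27.6 GB) and the compute per token track only the 49B active parameters.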
It scores 87.5% on MMLU-Pro and 90.1% on GPQA, frontier-level results, and requires 27% of V3.2's inference FLOPs at 1M context.
VRAM at each quantization
Assumes 8k context. KV cache grows linearly with context length.
| Quant | Weights | KV cache (8k) | Total (incl. ~12% overhead) |
|---|---|---|---|
| FP32 | 6400.0 GB | 1.02 GB | 7169.1 GB |
| BF16 | 3200.0 GB | 1.02 GB | 3585.2 GB |
| FP16 | 3200.0 GB | 1.02 GB | 3585.2 GB |
| Q8_0 | 1600.0 GB | 1.02 GB | 1793.2 GB |
| Q6_K | 1312.0 GB | 1.02 GB | 1470.6 GB |
| Q5_K_M | 1030.4 GB | 1.02 GB | 1155.2 GB |
| Q4_K_M | 900.8 GB | 1.02 GB | 1010.0 GB |
| Q3_K_M | 688.0 GB | 1.02 GB | 771.7 GB |
| Q2_K (recommended) | 526.4 GB | 1.02 GB | 590.7 GB |
| NVFP4 (CUDA only) | 800.0 GB | 1.02 GB | 897.1 GB |
KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU.
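The table's arithmetic can be approximated with a short sketch. The bits-per-weight values are back-calculated from the Weights column, and the 1.12 overhead multiplier is inferred from the Total column; neither is a documented constant of the calculator.

```python
# Approximate the VRAM table: weights = params * bpw / 8,
# total = (weights + KV cache) * overhead.
# Assumptions: bits-per-weight values are back-calculated from the
# Weights column; the 1.12 overhead factor is inferred from the Total
# column, not a documented constant.

PARAMS = 1600e9          # total parameters (all experts)
KV_CACHE_GB_8K = 1.02    # KV cache at 8k context, FP16 (from the table)
OVERHEAD = 1.12          # inferred runtime overhead multiplier

BITS_PER_WEIGHT = {
    "FP32": 32.0,
    "FP16": 16.0,
    "Q8_0": 8.0,
    "Q6_K": 6.56,
    "Q5_K_M": 5.152,
    "Q4_K_M": 4.504,
    "NVFP4": 4.0,
    "Q3_K_M": 3.44,
    "Q2_K": 2.632,
}


def weights_gb(bpw: float, params: float = PARAMS) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bpw / 8 / 1e9


def total_gb(bpw: float, kv_gb: float = KV_CACHE_GB_8K) -> float:
    """Weights plus KV cache, scaled by the inferred overhead factor."""
    return (weights_gb(bpw) + kv_gb) * OVERHEAD


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name:7s} weights {weights_gb(bpw):7.1f} GB  total {total_gb(bpw):7.1f} GB")
```

The results land within 0.1 GB of the table; the small gaps come from the KV-cache figure being rounded to 1.02 GB.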
GPUs that run DeepSeek V4 Pro 1.6T natively (0)
No single GPU in our list fits this model at Q4_K_M with 8k context, and none can run it with CPU offload either. A multi-GPU server is required.
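To size a multi-GPU setup, divide the total VRAM by the memory per card and round up. This sketch assumes the model splits evenly across identical GPUs and ignores per-GPU runtime overhead, so real deployments need some headroom; `gpus_needed` is a hypothetical helper, and the 141 GB H200 entry is included only for comparison.

```python
import math

# Minimal sketch: how many identical GPUs are needed to hold the model
# entirely in VRAM. Assumes memory splits evenly across cards
# (tensor/pipeline parallel) and ignores per-GPU framework overhead.


def gpus_needed(total_vram_gb: float, per_gpu_gb: float) -> int:
    """Smallest number of GPUs whose combined VRAM covers the total."""
    if per_gpu_gb <= 0:
        raise ValueError("per_gpu_gb must be positive")
    return math.ceil(total_vram_gb / per_gpu_gb)


Q4_K_M_TOTAL_GB = 1010.0  # from the VRAM table (8k context)

for label, vram in [("24 GB consumer card", 24), ("80 GB H100/A100", 80), ("141 GB H200", 141)]:
    print(f"{label}: {gpus_needed(Q4_K_M_TOTAL_GB, vram)} GPUs")
```

Because servers usually ship as 8-GPU nodes, needing thirteen 80 GB cards means two nodes in practice.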
Notes
A 1.6T MoE with hybrid CSA/HCA attention and a 1M-token context. It requires 27% of V3.2's inference FLOPs at 1M context. The calculator's KV-cache estimate uses kvHeads/headDim values that approximate MLA (Multi-head Latent Attention) storage.
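Since the table assumes 8k context, a linear extrapolation gives a rough sense of the KV cache at longer contexts. This assumes strictly linear growth, as the page states, and that "8k" means 8,192 tokens; V4's hybrid attention may store long contexts differently, so treat the 1M-token figure as a rough estimate. `kv_cache_gb` is a hypothetical helper.

```python
# Linear KV-cache scaling from the table's 8k baseline.
# Assumptions: KV cache grows strictly linearly with context length and
# "8k" means 8,192 tokens; the real CSA/HCA cache layout may differ.

KV_GB_AT_8K = 1.02   # FP16 KV cache at 8k context (from the table)
BASE_TOKENS = 8192


def kv_cache_gb(context_tokens: int) -> float:
    """Estimated KV cache in GB for a given context length."""
    return KV_GB_AT_8K * context_tokens / BASE_TOKENS


for ctx in (8_192, 131_072, 1_048_576):
    print(f"{ctx:>9,} tokens: {kv_cache_gb(ctx):7.2f} GB")
```

Even at the full 1M-token context, the extrapolated KV cache stays small next to the weights, which is why the quantization level dominates the totals.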
Frequently asked questions
- What are the VRAM requirements for DeepSeek V4 Pro 1.6T?
- DeepSeek V4 Pro 1.6T requires approximately 1010.0 GB of VRAM at Q4_K_M quantization, 1793.2 GB at Q8_0, and 3585.2 GB at FP16. These numbers assume an 8k context window; the KV-cache portion grows linearly with context length, while the weights stay fixed.
- How many parameters does DeepSeek V4 Pro 1.6T have?
- DeepSeek V4 Pro 1.6T has 1600 billion total parameters, but only 49 billion are active per token thanks to its Mixture of Experts (MoE) architecture. This makes inference significantly faster than the total parameter count suggests, although all 1600 billion parameters must still be held in memory.
- How capable is DeepSeek V4 Pro 1.6T?
- DeepSeek V4 Pro 1.6T achieves an MMLU-Pro score of 87.5, placing it among the most capable open-weight models available and making it competitive with frontier systems on general knowledge and reasoning.
- Can DeepSeek V4 Pro 1.6T run on a 16 GB GPU?
- No. At Q4_K_M, DeepSeek V4 Pro 1.6T needs 1010.0 GB of VRAM, more than 60 times what a 16 GB card provides. You will need a multi-GPU server.
- Can DeepSeek V4 Pro 1.6T run on a 24 GB GPU?
- No. Even at Q4_K_M, DeepSeek V4 Pro 1.6T needs 1010.0 GB. Consider a multi-GPU server with over 1 TB of total VRAM; even Q2_K needs 590.7 GB.
- What is the smallest quantization for DeepSeek V4 Pro 1.6T that fits in 24 GB of VRAM?
- DeepSeek V4 Pro 1.6T cannot fit in 24 GB of VRAM at any standard quantization level. The minimum needed is 590.7 GB at Q2_K.
- What GPU do I need to run DeepSeek V4 Pro 1.6T locally?
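- You need a multi-GPU server. At Q4_K_M, DeepSeek V4 Pro 1.6T needs 1010.0 GB of VRAM, far more than any single GPU offers. That means at least thirteen 80 GB GPUs such as the H100 or A100 80 GB (in practice, two 8-GPU nodes), or one 8× H200 node (141 GB per GPU, about 1.1 TB total).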
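The FAQ answers above reduce to scanning the table totals against a VRAM budget and picking the highest-precision quantization that fits. A minimal sketch: `best_fit` is a hypothetical helper, the totals are copied from the table at 8k context (longer contexts need a larger budget), and the `cuda` flag skips NVFP4, which the table notes is CUDA-only.

```python
from typing import Optional

# Totals from the VRAM table (GB, 8k context), largest first.
TOTALS_GB = [
    ("FP32", 7169.1),
    ("FP16", 3585.2),
    ("Q8_0", 1793.2),
    ("Q6_K", 1470.6),
    ("Q5_K_M", 1155.2),
    ("Q4_K_M", 1010.0),
    ("NVFP4", 897.1),   # requires a CUDA GPU
    ("Q3_K_M", 771.7),
    ("Q2_K", 590.7),
]


def best_fit(budget_gb: float, cuda: bool = True) -> Optional[str]:
    """Return the largest (highest-precision) quant that fits, or None."""
    for name, total in TOTALS_GB:
        if name == "NVFP4" and not cuda:
            continue  # NVFP4 is CUDA-only per the table note
        if total <= budget_gb:
            return name
    return None


for budget in (24, 640, 1128, 1280):
    print(budget, "GB ->", best_fit(budget) or "does not fit")
```

With 24 GB nothing fits, 8× 80 GB (640 GB) only reaches Q2_K, and 8× H200 (1128 GB) reaches Q4_K_M.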