Phi-4 14B Instruct
Phi-4 14B Instruct needs roughly 10.3 GB VRAM at Q4_K_M quantization (32.9 GB at FP16). 99 GPUs we track can run it fully in VRAM at 8k context.
99 GPUs run this natively · 5 with CPU offload
Microsoft · 14B params · 16k context · MIT · Commercial use OK
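Phi-4 14B Instruct is a 14B-parameter dense model developed by Microsoft, released in December 2024 under the MIT license.
To run Phi-4 14B Instruct locally, Q5_K_M is the recommended quantization. It needs roughly 10 to 12 GB depending on context and runtime overhead (11.6 GB total at 8k context in the table below), so it fits on 12 GB GPUs and is a good mid-range choice.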
MMLU-Pro 70.4%, GPQA 56.1%, MATH 80.4%: strong results for a 14B model.
VRAM at each quantization
Assumes 8k context. KV cache grows linearly with context length.
| Quant | Weights | KV cache | Total |
|---|---|---|---|
| FP32 | 56.0 GB | 1.34 GB | 64.2 GB |
| BF16 | 28.0 GB | 1.34 GB | 32.9 GB |
| FP16 | 28.0 GB | 1.34 GB | 32.9 GB |
| Q8_0 | 14.0 GB | 1.34 GB | 17.2 GB |
| Q6_K | 11.5 GB | 1.34 GB | 14.4 GB |
| Q5_K_M (recommended) | 9.0 GB | 1.34 GB | 11.6 GB |
| Q4_K_M | 7.9 GB | 1.34 GB | 10.3 GB |
| Q3_K_M | 6.0 GB | 1.34 GB | 8.3 GB |
| Q2_K | 4.6 GB | 1.34 GB | 6.7 GB |
| NVFP4 (CUDA only) | 7.0 GB | 1.34 GB | 9.3 GB |
KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU.
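The Total column in the table above is larger than weights plus KV cache. A minimal sketch of how the numbers appear to fit together follows, under two assumptions that are inferred from the table rather than stated by the page: a flat overhead factor of about 1.12 on top of weights plus KV cache, and "8k" meaning 8192 tokens. The names `OVERHEAD`, `kv_cache_gb`, and `total_vram_gb` are illustrative only. With these assumptions the sketch reproduces the listed totals to within about 0.1 GB.

```python
# Weights per quantization, copied from the table above (GB).
WEIGHTS_GB = {
    "FP32": 56.0, "BF16": 28.0, "FP16": 28.0, "Q8_0": 14.0,
    "Q6_K": 11.5, "Q5_K_M": 9.0, "Q4_K_M": 7.9, "Q3_K_M": 6.0,
    "Q2_K": 4.6, "NVFP4": 7.0,
}

KV_GB_AT_8K = 1.34    # from the table: FP16 KV cache at 8k context
BASE_CONTEXT = 8192   # assumption: "8k" means 8192 tokens
OVERHEAD = 1.12       # assumption: inferred from the table, not stated by the page


def kv_cache_gb(context_tokens: int) -> float:
    """Scale the 8k KV cache figure linearly with context length."""
    return KV_GB_AT_8K * context_tokens / BASE_CONTEXT


def total_vram_gb(quant: str, context_tokens: int = BASE_CONTEXT) -> float:
    """Estimate total VRAM as (weights + KV cache) times the overhead factor."""
    return (WEIGHTS_GB[quant] + kv_cache_gb(context_tokens)) * OVERHEAD


if __name__ == "__main__":
    for quant in WEIGHTS_GB:
        print(f"{quant:8s} 8k: {total_vram_gb(quant):5.1f} GB   "
              f"16k: {total_vram_gb(quant, 16384):5.1f} GB")
```

Doubling the context to 16k doubles only the KV cache term, so the estimate for Q4_K_M rises by roughly 1.5 GB rather than doubling.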
Benchmarks
GPUs that run Phi-4 14B Instruct natively (99)
- NVIDIA RTX 5090 · NVFP4 · 256 t/s
- NVIDIA RTX 5080 · NVFP4 · 137.1 t/s
- NVIDIA RTX 5070 Ti · NVFP4 · 128 t/s
- NVIDIA RTX 5070 · NVFP4 · 96 t/s
- NVIDIA RTX 5060 Ti 16GB · NVFP4 · 64 t/s
- NVIDIA RTX 5060 · Q2_K · 97.3 t/s
- NVIDIA RTX 5050 · Q2_K · 69.5 t/s
- NVIDIA RTX 4090 · NVFP4 · 144 t/s
- NVIDIA RTX 4080 · NVFP4 · 102.4 t/s
- NVIDIA RTX 4070 Ti · NVFP4 · 72 t/s
- NVIDIA RTX 4070 · NVFP4 · 72 t/s
- NVIDIA RTX 4060 Ti 16GB · NVFP4 · 41.1 t/s
- NVIDIA RTX 4060 · Q2_K · 59.1 t/s
- NVIDIA RTX 3090 · NVFP4 · 133.7 t/s
- NVIDIA RTX 3090 Ti · NVFP4 · 144 t/s
- NVIDIA RTX 3080 10GB · NVFP4 · 108.6 t/s
- NVIDIA RTX 3060 12GB · NVFP4 · 51.4 t/s
- NVIDIA H100 80GB · FP32 · 59.8 t/s
- NVIDIA A100 80GB · FP32 · 36.4 t/s
- NVIDIA A100 40GB · BF16 · 55.5 t/s
- NVIDIA L40S · BF16 · 30.9 t/s
- NVIDIA RTX A6000 · BF16 · 27.4 t/s
- NVIDIA RTX 4000 Ada · NVFP4 · 45.7 t/s
- NVIDIA RTX 4500 Ada · NVFP4 · 61.7 t/s
- NVIDIA RTX 5000 Ada · NVFP4 · 82.3 t/s
- NVIDIA RTX 6000 Ada · BF16 · 34.3 t/s
- NVIDIA RTX Pro 6000 · FP32 · 24 t/s
- NVIDIA DGX Spark (128GB) · FP32 · 4.9 t/s
- AMD Radeon RX 7900 XTX · Q8_0 · 68.6 t/s
- AMD Radeon RX 7900 XT · Q8_0 · 57.1 t/s
- AMD Radeon RX 7900 GRE · Q6_K · 50.2 t/s
- AMD Radeon RX 6800 XT · Q6_K · 44.6 t/s
- AMD Radeon PRO W7800 · Q8_0 · 41.1 t/s
- AMD Radeon PRO W7900 · BF16 · 30.9 t/s
- AMD Instinct MI300X · FP32 · 94.6 t/s
- AMD Radeon AI Pro 9700 32GB · Q8_0 · 45.7 t/s
- AMD Strix Halo (128GB) · FP32 · 4.6 t/s
- AMD Strix Halo (96GB) · FP32 · 4.6 t/s
- AMD Strix Halo (64GB) · BF16 · 9.1 t/s
- Apple M5 Max (128GB) · FP32 · 11 t/s
- Apple M5 Max (64GB) · BF16 · 21.9 t/s
- Apple M5 Max (48GB) · BF16 · 21.9 t/s
- Apple M5 Pro (48GB) · BF16 · 11 t/s
- Apple M5 Pro (36GB) · Q8_0 · 21.9 t/s
- Apple M5 Pro (24GB) · Q8_0 · 21.9 t/s
- Apple M5 (32GB) · Q8_0 · 10.9 t/s
- Apple M5 (16GB) · Q5_K_M · 17 t/s
- Apple M4 Ultra (384GB) · FP32 · 19.5 t/s
- Apple M4 Ultra (192GB) · FP32 · 19.5 t/s
- Apple M4 Max (128GB) · FP32 · 9.8 t/s
- Apple M4 Max (96GB) · FP32 · 9.8 t/s
- Apple M4 Max (64GB) · BF16 · 19.5 t/s
- Apple M4 Max (48GB) · BF16 · 19.5 t/s
- Apple M4 Pro (48GB) · BF16 · 9.8 t/s
- Apple M4 Pro (24GB) · Q8_0 · 19.5 t/s
- Apple M4 (32GB) · Q8_0 · 8.6 t/s
- Apple M4 (16GB) · Q5_K_M · 13.3 t/s
- Apple M3 Ultra (512GB) · FP32 · 14.6 t/s
- Apple M3 Ultra (256GB) · FP32 · 14.6 t/s
- Apple M3 Ultra (96GB) · FP32 · 14.6 t/s
- Apple M3 Max (128GB) · FP32 · 7.1 t/s
- Apple M3 Max (96GB) · FP32 · 7.1 t/s
- Apple M3 Max (64GB) · BF16 · 14.3 t/s
- Apple M3 Max (48GB) · BF16 · 14.3 t/s
- Apple M3 Max (36GB) · Q8_0 · 28.6 t/s
- Apple M3 Pro (36GB) · Q8_0 · 10.7 t/s
- Apple M3 Pro (18GB) · Q5_K_M · 16.6 t/s
- Apple M3 (24GB) · Q8_0 · 7.1 t/s
- Apple M3 (16GB) · Q5_K_M · 11.1 t/s
- Apple M2 Ultra (384GB) · FP32 · 14.3 t/s
- Apple M2 Ultra (192GB) · FP32 · 14.3 t/s
- Apple M2 Max (96GB) · FP32 · 7.1 t/s
- Apple M2 Max (64GB) · BF16 · 14.3 t/s
- Apple M2 Max (32GB) · Q8_0 · 28.6 t/s
- Apple M2 Pro (32GB) · Q8_0 · 14.3 t/s
- Apple M2 Pro (16GB) · Q5_K_M · 22.2 t/s
- Apple M2 (24GB) · Q8_0 · 7.1 t/s
- Apple M2 (16GB) · Q5_K_M · 11.1 t/s
- Apple M1 Ultra (128GB) · FP32 · 14.3 t/s
- Apple M1 Ultra (64GB) · BF16 · 28.6 t/s
- Apple M1 Max (64GB) · BF16 · 14.3 t/s
- Apple M1 Max (32GB) · Q8_0 · 28.6 t/s
- Apple M1 Pro (32GB) · Q8_0 · 14.3 t/s
- Apple M1 Pro (16GB) · Q5_K_M · 22.2 t/s
- Apple M1 (16GB) · Q5_K_M · 7.5 t/s
- Intel Arc B580 12GB · Q4_K_M · 57.9 t/s
- Intel Arc B570 10GB · Q3_K_M · 63.1 t/s
- Intel Arc Pro B70 24GB · Q8_0 · 32.6 t/s
- Intel Arc Pro B60 24GB · Q8_0 · 27.1 t/s
- Intel Arc A770 16GB · Q6_K · 48.8 t/s
- Intel Arc A770 8GB · Q2_K · 111.2 t/s
- Intel Arc A750 8GB · Q2_K · 111.2 t/s
- Intel Arc A580 8GB · Q2_K · 111.2 t/s
- Intel Arc Pro A60 12GB · Q4_K_M · 48.7 t/s
- Intel Data Center GPU Max 1550 · FP32 · 58.5 t/s
- Intel Data Center GPU Max 1100 · BF16 · 43.9 t/s
- Intel Arc 140V (32GB) · Q8_0 · 9.8 t/s
- Intel Arc 140V (16GB) · Q5_K_M · 15.2 t/s
- Intel Arc 130V (16GB) · Q5_K_M · 15.2 t/s
Plus 5 GPUs that run it with CPU offload (slower)
- Intel Arc A380 6GB · Q8_0 · 3.3 t/s
- Intel Arc A310 4GB · Q8_0 · 2.2 t/s
- Intel Arc Pro A50 6GB · Q8_0 · 3.4 t/s
- Intel Arc Pro A40 6GB · Q8_0 · 3.4 t/s
- CPU only (system RAM) · Q8_0 · 0.7 t/s
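Several of the throughput figures in the lists above look consistent with a simple memory-bandwidth-bound estimate, in which generating one token reads every weight once, so tokens per second is roughly memory bandwidth divided by weight size. The sketch below illustrates that reading. It is an assumption about how such figures can be estimated, not a description of how this page computed them, and the bandwidth values are vendor specifications that do not appear on this page. Real throughput also depends on the runtime, batch size, and context length.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper-bound decode speed if every token requires one full pass over the weights."""
    return bandwidth_gb_s / weights_gb


# Assumed vendor memory-bandwidth specs in GB/s (not taken from this page).
ASSUMED_BANDWIDTH_GB_S = {
    "NVIDIA RTX 5090": 1792.0,
    "NVIDIA RTX 4090": 1008.0,
}

NVFP4_WEIGHTS_GB = 7.0  # from the quantization table on this page

if __name__ == "__main__":
    for gpu, bandwidth in ASSUMED_BANDWIDTH_GB_S.items():
        estimate = decode_tokens_per_second(bandwidth, NVFP4_WEIGHTS_GB)
        print(f"{gpu}: ~{estimate:.1f} t/s at NVFP4")
```

Under these assumptions the estimates come out at 256 t/s and 144 t/s, matching the RTX 5090 and RTX 4090 entries listed above.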
Frequently asked questions
- What are the VRAM requirements for Phi-4 14B Instruct?
- Phi-4 14B Instruct requires approximately 10.3 GB of VRAM at Q4_K_M quantization, 17.2 GB at Q8_0, and 32.9 GB at FP16. These numbers assume an 8k context window; the KV cache portion grows linearly with context length.
- How many parameters does Phi-4 14B Instruct have?
- Phi-4 14B Instruct has 14 billion parameters.
- How capable is Phi-4 14B Instruct?
- Phi-4 14B Instruct achieves an MMLU-Pro score of 70.4, a strong result for a 14B open-weight model on general knowledge and reasoning.
- Can Phi-4 14B Instruct run on a 16 GB GPU?
- Yes. Phi-4 14B Instruct needs 10.3 GB at Q4_K_M, which fits in a 16 GB GPU like the RTX 4080 or RTX 4070 Ti Super.
- What is the highest-quality quantization of Phi-4 14B Instruct that fits in 24 GB of VRAM?
- Q8_0. At 8k context it needs 17.2 GB, which fits in 24 GB. The next step up, BF16 or FP16, needs 32.9 GB and does not fit.
- What GPU do I need to run Phi-4 14B Instruct locally?
- A 16 GB GPU is enough. At Q4_K_M, Phi-4 14B Instruct needs 10.3 GB VRAM. Good options: RTX 4080 (16 GB), RTX 4070 Ti Super (16 GB).
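The "does it fit" questions above reduce to a lookup against the totals in the quantization table. A minimal sketch follows, assuming the table's precision order can stand in for a quality ranking and leaving out NVFP4 because it is CUDA-only. The function name `best_fit` and the `headroom_gb` parameter are illustrative. The GPU lists on this page sometimes pick a smaller quantization than a strict fit would allow, which suggests some reserved headroom, so that is exposed as a parameter rather than guessed.

```python
from typing import Optional

# Total VRAM at 8k context (GB), copied from the table on this page and
# ordered from highest to lowest precision. Using this order as a quality
# ranking is an assumption. BF16 and FP16 share the same total.
TOTAL_GB = [
    ("FP32", 64.2),
    ("BF16", 32.9),
    ("Q8_0", 17.2),
    ("Q6_K", 14.4),
    ("Q5_K_M", 11.6),
    ("Q4_K_M", 10.3),
    ("Q3_K_M", 8.3),
    ("Q2_K", 6.7),
]


def best_fit(vram_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the highest-precision quantization whose total fits, or None."""
    usable = vram_gb - headroom_gb
    for quant, total in TOTAL_GB:
        if total <= usable:
            return quant
    return None


if __name__ == "__main__":
    for vram in (8, 12, 16, 24):
        print(f"{vram} GB -> {best_fit(vram)}")
```

With no headroom this picks Q8_0 for 24 GB and Q6_K for 16 GB; reserving 1 GB on a 12 GB card moves the choice from Q5_K_M to Q4_K_M.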