CanItRun

Phi-4 14B Instruct

Phi-4 14B Instruct needs roughly 10.3 GB VRAM at Q4_K_M quantization (32.9 GB at FP16). 99 GPUs we track can run it fully in VRAM at 8k context.

99 GPUs run this natively · 5 with CPU offload

Microsoft · 14B params · 16k context · MIT · Commercial use ok

Phi-4 14B Instruct is a 14B-parameter dense model developed by Microsoft, released in December 2024 under the MIT license. It is compact yet competitive.

To run Phi-4 14B Instruct locally, Q5_K_M needs roughly 10-11 GB (11.6 GB in total at 8k context), which fits on 12 GB GPUs. It is an excellent mid-range choice.

MMLU-Pro 70.4%, GPQA 56.1%, MATH 80.4%: it punches above its weight class.

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

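| Quant | Weights | KV cache | Total |
|---|---|---|---|
| FP32 | 56.0 GB | 1.34 GB | 64.2 GB |
| BF16 | 28.0 GB | 1.34 GB | 32.9 GB |
| FP16 | 28.0 GB | 1.34 GB | 32.9 GB |
| Q8_0 | 14.0 GB | 1.34 GB | 17.2 GB |
| Q6_K | 11.5 GB | 1.34 GB | 14.4 GB |
| Q5_K_M (recommended) | 9.0 GB | 1.34 GB | 11.6 GB |
| Q4_K_M | 7.9 GB | 1.34 GB | 10.3 GB |
| Q3_K_M | 6.0 GB | 1.34 GB | 8.3 GB |
| Q2_K | 4.6 GB | 1.34 GB | 6.7 GB |
| NVFP4 (CUDA only) | 7.0 GB | 1.34 GB | 9.3 GB |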

KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
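
Because the KV cache grows linearly with context length, the 8k figures in the table can be rescaled to other context sizes. The Python sketch below does this under stated assumptions: "8k" is taken to mean 8192 tokens, and the gap of roughly 12% between Total and Weights + KV cache is treated as a fixed overhead factor. That factor is inferred from the table's numbers and is not documented by the page. The function names are hypothetical.

```python
# Figures copied from the VRAM table (FP16 KV cache at 8k context).
KV_CACHE_GB_AT_8K = 1.34
BASE_CONTEXT_TOKENS = 8192  # assumption: "8k" means 8192 tokens

# Assumption: Total is about 1.12 x (weights + KV cache) in every table row.
# This factor is inferred from the table, not stated by the page.
OVERHEAD_FACTOR = 1.12

# Weight sizes in GB, copied from the table.
WEIGHTS_GB = {
    "FP16": 28.0, "Q8_0": 14.0, "Q6_K": 11.5, "Q5_K_M": 9.0,
    "Q4_K_M": 7.9, "Q3_K_M": 6.0, "Q2_K": 4.6,
}


def kv_cache_gb(context_tokens: int) -> float:
    """Scale the 8k KV cache figure linearly to another context length."""
    return KV_CACHE_GB_AT_8K * context_tokens / BASE_CONTEXT_TOKENS


def estimated_total_gb(quant: str, context_tokens: int) -> float:
    """Rough total VRAM: (weights + KV cache) times the inferred overhead."""
    return OVERHEAD_FACTOR * (WEIGHTS_GB[quant] + kv_cache_gb(context_tokens))


if __name__ == "__main__":
    for ctx in (8192, 16384):
        kv = round(kv_cache_gb(ctx), 2)
        total = round(estimated_total_gb("Q4_K_M", ctx), 1)
        print(f"{ctx} tokens: KV cache {kv} GB, Q4_K_M total about {total} GB")
```

Under these assumptions, doubling the context to the model's full 16k window doubles the KV cache to 2.68 GB and puts Q4_K_M at about 11.8 GB. Treat the result as an estimate, not a measured value.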

Benchmarks

MATH
80.4

GPUs that run Phi-4 14B Instruct natively (99)

Plus 5 GPUs that run it with CPU offload (slower)
Hugging Face ↗ · Ollama ↗ · Released 2024-12-13

Frequently asked questions

What are the VRAM requirements for Phi-4 14B Instruct?
Phi-4 14B Instruct requires approximately 10.3 GB of VRAM at Q4_K_M quantization, 17.2 GB at Q8_0, and 32.9 GB at FP16. These numbers assume an 8k context window; the KV cache, and with it total VRAM, grows linearly with context length.
How many parameters does Phi-4 14B Instruct have?
Phi-4 14B Instruct has 14 billion parameters.
How capable is Phi-4 14B Instruct?
Phi-4 14B Instruct scores 70.4 on MMLU-Pro and 80.4 on MATH, which makes it a strong performer for a 14B open-weight model on general knowledge and reasoning.
Can Phi-4 14B Instruct run on a 16 GB GPU?
Yes. Phi-4 14B Instruct needs 10.3 GB at Q4_K_M, which fits in a 16 GB GPU like the RTX 4080 or RTX 4070 Ti Super.
What is the highest-quality quantization of Phi-4 14B Instruct that fits in 24 GB of VRAM?
At Q8_0, Phi-4 14B Instruct needs 17.2 GB at 8k context, making it the highest-quality quantization that fits in 24 GB of VRAM. FP16 needs 32.9 GB and does not fit.
What GPU do I need to run Phi-4 14B Instruct locally?
A 16 GB GPU is enough. At Q4_K_M, Phi-4 14B Instruct needs 10.3 GB VRAM. Good options: RTX 4080 (16 GB), RTX 4070 Ti Super (16 GB).
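
The "does it fit" answers above amount to a lookup against the 8k-context totals in the VRAM table. The Python sketch below shows one way to do that lookup for an arbitrary VRAM budget. The function name is hypothetical, and NVFP4 is left out because it requires a CUDA GPU and this page does not rank its quality against the other quantizations.

```python
from typing import Optional

# Total VRAM at 8k context, copied from the table on this page and ordered
# from highest to lowest precision. BF16 matches FP16 and is omitted.
# NVFP4 is omitted: it is CUDA-only and its quality rank is not stated here.
TOTAL_GB_AT_8K = [
    ("FP32", 64.2), ("FP16", 32.9), ("Q8_0", 17.2), ("Q6_K", 14.4),
    ("Q5_K_M", 11.6), ("Q4_K_M", 10.3), ("Q3_K_M", 8.3), ("Q2_K", 6.7),
]


def best_quant_for(vram_gb: float) -> Optional[str]:
    """Return the highest-precision quantization whose total fits in vram_gb."""
    for name, total_gb in TOTAL_GB_AT_8K:
        if total_gb <= vram_gb:
            return name
    return None  # nothing fits fully in VRAM; CPU offload would be needed


if __name__ == "__main__":
    for budget in (24, 16, 12, 8):
        print(f"{budget} GB -> {best_quant_for(budget)}")
```

The lookup compares against a card's nominal capacity. Some VRAM is usually taken by the display and driver, so leaving a margin below that capacity is a reasonable precaution.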