CanItRun

Phi-3.5 Mini Instruct

Phi-3.5 Mini Instruct needs roughly 6.0 GB of VRAM at Q4_K_M quantization (12.1 GB at FP16). Of the GPUs we track, 105 can run it fully in VRAM at 8k context.

105 GPUs run this natively · 2 with CPU offload

Microsoft · 3.8B params · 128K context · MIT license · Commercial use OK

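Phi-3.5 Mini Instruct is a 3.8B-parameter dense model developed by Microsoft and released in August 2024. It supports a 128K-token context window. Because it does not use grouped-query attention (GQA), its KV cache is larger than that of a comparable GQA model.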

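To run Phi-3.5 Mini Instruct locally, Q6_K is the recommended quantization: its weights take about 3.1 GB and the total is about 7.1 GB at 8k context, so it fits on 8 GB GPUs. Because the model keeps a full set of KV heads, memory use grows quickly as context length increases.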
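To see why full KV heads matter, here is a minimal sketch of the per-token KV cache size. It assumes the attention config from the model's Hugging Face `config.json` (32 layers, 32 KV heads, head dimension 96) and an FP16 cache; verify those values before relying on them. The 8-KV-head GQA variant is purely hypothetical and exists only for comparison.

```python
# Assumed Phi-3.5 Mini attention config (verify against config.json on Hugging Face).
N_LAYERS = 32
HEAD_DIM = 96
N_KV_HEADS_FULL = 32  # no GQA: every query head has its own key/value head


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache per token; factor 2 = one key and one value vector per KV head per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


full = kv_bytes_per_token(N_LAYERS, N_KV_HEADS_FULL, HEAD_DIM)
gqa = kv_bytes_per_token(N_LAYERS, 8, HEAD_DIM)  # hypothetical GQA variant with 8 KV heads

print(full // 1024, "KiB/token")           # 384 KiB/token
print(gqa // 1024, "KiB/token")            # 96 KiB/token
print(f"{full * 8192 / 1e9:.2f} GB at 8k")  # 3.22 GB at 8k
```

The 3.22 GB result matches the KV cache column in the table; a GQA model sharing each KV head across four query heads would need a quarter of that.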

Its MMLU-Pro score of 35.6% is strong for a sub-4B model, and it scores 62.8% on HumanEval.

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

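| Quant | Weights | KV cache | Total |
|---|---|---|---|
| FP32 | 15.2 GB | 3.22 GB | 20.6 GB |
| BF16 | 7.6 GB | 3.22 GB | 12.1 GB |
| FP16 | 7.6 GB | 3.22 GB | 12.1 GB |
| Q8_0 | 3.8 GB | 3.22 GB | 7.9 GB |
| Q6_K (recommended) | 3.1 GB | 3.22 GB | 7.1 GB |
| Q5_K_M | 2.5 GB | 3.22 GB | 6.3 GB |
| Q4_K_M | 2.1 GB | 3.22 GB | 6.0 GB |
| Q3_K_M | 1.6 GB | 3.22 GB | 5.4 GB |
| Q2_K | 1.3 GB | 3.22 GB | 5.0 GB |
| NVFP4 (CUDA only) | 1.9 GB | 3.22 GB | 5.7 GB |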

KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
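The totals in the table are larger than weights plus KV cache alone, which suggests some runtime overhead is included. The sketch below estimates a total as (weights + KV cache) × overhead. The weight sizes are copied from the table; the KV formula assumes the same config as above (32 layers, 32 KV heads, head dimension 96, FP16); and the 1.12 overhead factor is a hypothetical value chosen because it reproduces these rows, not the calculator's documented formula. With it, the Q5_K_M and Q2_K rows come out about 0.1 GB higher than shown.

```python
# Assumed config: K+V, 32 layers, 32 KV heads, head_dim 96, 2 bytes per FP16 element.
KV_BYTES_PER_TOKEN = 2 * 32 * 32 * 96 * 2

# Weight sizes in GB, copied from the VRAM table.
WEIGHTS_GB = {
    "FP32": 15.2,
    "FP16": 7.6,
    "Q8_0": 3.8,
    "Q6_K": 3.1,
    "Q4_K_M": 2.1,
}

# Hypothetical overhead factor (CUDA context, activations, buffers); an assumption, not a documented value.
OVERHEAD = 1.12


def total_vram_gb(quant: str, context_tokens: int = 8192) -> float:
    """Estimate total VRAM in decimal GB as (weights + KV cache) * overhead."""
    kv_gb = KV_BYTES_PER_TOKEN * context_tokens / 1e9
    return (WEIGHTS_GB[quant] + kv_gb) * OVERHEAD


for quant in WEIGHTS_GB:
    print(f"{quant:7s} {total_vram_gb(quant):.1f} GB")
```

Changing `context_tokens` shows how quickly the KV term dominates for the smaller quantizations.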


GPUs that run Phi-3.5 Mini Instruct natively (105)

Plus 2 GPUs that run it with CPU offload (slower)

Notes

No GQA: the model keeps a full set of KV heads, so the KV cache becomes large at long context.

Released 2024-08-21. Available on Hugging Face and Ollama.


Frequently asked questions

What are the VRAM requirements for Phi-3.5 Mini Instruct?
Phi-3.5 Mini Instruct requires approximately 6.0 GB of VRAM at Q4_K_M quantization, 7.9 GB at Q8_0, and 12.1 GB at FP16. These numbers assume an 8k context window; VRAM grows linearly with context length because of the KV cache.
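The linear growth can be written out explicitly. This worked equation assumes the config values from the model's Hugging Face config (32 layers, 32 KV heads, head dimension 96), b = 2 bytes per FP16 element, and n = context length in tokens; verify the config before relying on the numbers.

```latex
\begin{aligned}
\mathrm{KV}(n) &= 2 \cdot n_{\text{layers}} \cdot n_{\text{kv}} \cdot d_{\text{head}} \cdot b \cdot n \\
  &= 2 \cdot 32 \cdot 32 \cdot 96 \cdot 2 \cdot n = 393{,}216\,n \ \text{bytes} \\
\mathrm{KV}(8{,}192) &\approx 3.22\ \text{GB}, \qquad \mathrm{KV}(131{,}072) \approx 51.5\ \text{GB}
\end{aligned}
```

The weights stay fixed, so at the full 128K context the FP16 KV cache alone is several times larger than even the FP32 weights (15.2 GB).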
How many parameters does Phi-3.5 Mini Instruct have?
Phi-3.5 Mini Instruct has 3.8 billion parameters.
How capable is Phi-3.5 Mini Instruct?
Phi-3.5 Mini Instruct has an MMLU-Pro score of 47.4, making it well-suited for lightweight tasks, prototyping, and resource-constrained environments.
Can Phi-3.5 Mini Instruct run on a 16 GB GPU?
Yes. Phi-3.5 Mini Instruct needs 6.0 GB at Q4_K_M, so it fits on a 16 GB GPU such as the RTX 4080 or RTX 4070 Ti Super. At 8k context, even FP16 (12.1 GB) fits.
What is the highest-quality quantization of Phi-3.5 Mini Instruct that fits in 24 GB of VRAM?
FP32. At 8k context, Phi-3.5 Mini Instruct needs 20.6 GB at FP32, which fits in 24 GB of VRAM.
What GPU do I need to run Phi-3.5 Mini Instruct locally?
An 8 GB GPU is enough at Q4_K_M, which needs 6.0 GB of VRAM at 8k context. A 16 GB card such as the RTX 4080 or RTX 4070 Ti Super leaves room for FP16 or longer contexts.
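The FAQ answers above all follow the same rule: pick the highest-precision row of the VRAM table whose total fits the card. A minimal sketch of that lookup, using the 8k-context totals copied from the table. It assumes that more bits means higher quality, leaves out NVFP4 (CUDA-only) and BF16 (same size as FP16), and treats `headroom_gb` as a hypothetical safety margin for the desktop, driver, and other processes.

```python
from typing import Optional

# Total VRAM at 8k context (GB), copied from the table, ordered from highest to lowest precision.
QUANT_TOTALS_GB = [
    ("FP32", 20.6),
    ("FP16", 12.1),
    ("Q8_0", 7.9),
    ("Q6_K", 7.1),
    ("Q5_K_M", 6.3),
    ("Q4_K_M", 6.0),
    ("Q3_K_M", 5.4),
    ("Q2_K", 5.0),
]


def best_quant_for(vram_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the highest-precision quantization whose total plus headroom fits, or None."""
    for name, total in QUANT_TOTALS_GB:
        if total + headroom_gb <= vram_gb:
            return name
    return None


for vram in (6, 8, 16, 24):
    print(f"{vram} GB -> {best_quant_for(vram)}")
```

With no headroom an 8 GB card lands on Q8_0 with only 0.1 GB to spare; passing `headroom_gb=0.5` steps down to the recommended Q6_K instead.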