Mistral Small 22B
Mistral Small 22B needs roughly 16.1 GB VRAM at Q4_K_M quantization (51.8 GB at FP16). 91 GPUs we track can run it fully in VRAM at 8k context.
91 GPUs run this natively · 13 with CPU offload
Mistral AI · 22.2B params · 32k context · Mistral Research · Non-commercial only
Mistral Small 22B is a 22.2B-parameter dense model developed by Mistral AI and released in September 2024, with strong general capabilities.
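To run Mistral Small 22B locally: Q4_K_M weights are about 12.5 GB, but the total at 8k context is about 16.1 GB, just over what a 16 GB GPU holds. On 16 GB cards, use NVFP4 (14.5 GB, CUDA only) or Q3_K_M (12.8 GB) instead; a 24 GB GPU runs Q4_K_M comfortably.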
MMLU-Pro 49.2%, HumanEval 81.1%: a solid mid-range performer.
VRAM at each quantization
Assumes 8k context. KV cache grows linearly with context length.
| Quant | Weights | KV cache | Total |
|---|---|---|---|
| FP32 | 88.8 GB | 1.88 GB | 101.6 GB |
| BF16 | 44.4 GB | 1.88 GB | 51.8 GB |
| FP16 | 44.4 GB | 1.88 GB | 51.8 GB |
| Q8_0 | 22.2 GB | 1.88 GB | 27.0 GB |
| Q6_K | 18.2 GB | 1.88 GB | 22.5 GB |
| Q5_K_M | 14.3 GB | 1.88 GB | 18.1 GB |
| Q4_K_M (recommended) | 12.5 GB | 1.88 GB | 16.1 GB |
| Q3_K_M | 9.6 GB | 1.88 GB | 12.8 GB |
| Q2_K | 7.3 GB | 1.88 GB | 10.3 GB |
| NVFP4 (CUDA only) | 11.1 GB | 1.88 GB | 14.5 GB |
KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
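The table can be extended to other context lengths with a little arithmetic. The sketch below is a rough estimate, not the page's own calculator. It scales the 1.88 GB KV cache figure linearly from 8k context and applies a 12% overhead factor. That factor is an assumption inferred from the rows above, where each Total is close to 1.12 × (Weights + KV cache); the source does not state it. Treating "8k" as 8192 tokens is also an assumption, and the function names are hypothetical.

```python
# Weights per quantization in GB, copied from the table above.
WEIGHTS_GB = {
    "FP32": 88.8, "BF16": 44.4, "FP16": 44.4, "Q8_0": 22.2, "Q6_K": 18.2,
    "Q5_K_M": 14.3, "Q4_K_M": 12.5, "Q3_K_M": 9.6, "Q2_K": 7.3, "NVFP4": 11.1,
}

KV_CACHE_GB_AT_8K = 1.88   # FP16 KV cache at 8k context, from the table
BASE_CONTEXT = 8192        # assumption: "8k" means 8192 tokens
OVERHEAD = 1.12            # assumption: inferred from the table, not stated by the source


def kv_cache_gb(context_tokens: int) -> float:
    """KV cache size, scaled linearly from the 8k figure."""
    return KV_CACHE_GB_AT_8K * context_tokens / BASE_CONTEXT


def total_vram_gb(quant: str, context_tokens: int = BASE_CONTEXT) -> float:
    """Estimated total VRAM: (weights + KV cache) times the inferred overhead."""
    return (WEIGHTS_GB[quant] + kv_cache_gb(context_tokens)) * OVERHEAD


if __name__ == "__main__":
    for ctx in (8192, 16384, 32768):
        print(f"Q4_K_M at {ctx} tokens: {total_vram_gb('Q4_K_M', ctx):.1f} GB")
```

At 8k context this reproduces the table's totals to within about 0.1 GB. The small differences come from the rounded weight figures.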
GPUs that run Mistral Small 22B natively (91)
- NVIDIA RTX 5090 · NVFP4 · 161.4 t/s
- NVIDIA RTX 5080 · NVFP4 · 86.5 t/s
- NVIDIA RTX 5070 Ti · NVFP4 · 80.7 t/s
- NVIDIA RTX 5070 · Q2_K · 92 t/s
- NVIDIA RTX 5060 Ti 16GB · NVFP4 · 40.4 t/s
- NVIDIA RTX 4090 · NVFP4 · 90.8 t/s
- NVIDIA RTX 4080 · NVFP4 · 64.6 t/s
- NVIDIA RTX 4070 Ti · Q2_K · 69 t/s
- NVIDIA RTX 4070 · Q2_K · 69 t/s
- NVIDIA RTX 4060 Ti 16GB · NVFP4 · 25.9 t/s
- NVIDIA RTX 3090 · NVFP4 · 84.3 t/s
- NVIDIA RTX 3090 Ti · NVFP4 · 90.8 t/s
- NVIDIA RTX 3060 12GB · Q2_K · 49.3 t/s
- NVIDIA H100 80GB · BF16 · 75.5 t/s
- NVIDIA A100 80GB · BF16 · 45.9 t/s
- NVIDIA A100 40GB · NVFP4 · 140.1 t/s
- NVIDIA L40S · NVFP4 · 77.8 t/s
- NVIDIA RTX A6000 · NVFP4 · 69.2 t/s
- NVIDIA RTX 4000 Ada · NVFP4 · 28.8 t/s
- NVIDIA RTX 4500 Ada · NVFP4 · 38.9 t/s
- NVIDIA RTX 5000 Ada · NVFP4 · 51.9 t/s
- NVIDIA RTX 6000 Ada · NVFP4 · 86.5 t/s
- NVIDIA RTX Pro 6000 · BF16 · 30.3 t/s
- NVIDIA DGX Spark (128GB) · FP32 · 3.1 t/s
- AMD Radeon RX 7900 XTX · Q6_K · 52.7 t/s
- AMD Radeon RX 7900 XT · Q5_K_M · 56 t/s
- AMD Radeon RX 7900 GRE · Q3_K_M · 60.3 t/s
- AMD Radeon RX 6800 XT · Q3_K_M · 53.6 t/s
- AMD Radeon PRO W7800 · Q8_0 · 25.9 t/s
- AMD Radeon PRO W7900 · Q8_0 · 38.9 t/s
- AMD Instinct MI300X · FP32 · 59.7 t/s
- AMD Radeon AI Pro 9700 32GB · Q8_0 · 28.8 t/s
- AMD Strix Halo (128GB) · FP32 · 2.9 t/s
- AMD Strix Halo (96GB) · BF16 · 5.8 t/s
- AMD Strix Halo (64GB) · BF16 · 5.8 t/s
- Apple M5 Max (128GB) · FP32 · 6.9 t/s
- Apple M5 Max (64GB) · BF16 · 13.8 t/s
- Apple M5 Max (48GB) · Q8_0 · 27.7 t/s
- Apple M5 Pro (48GB) · Q8_0 · 13.8 t/s
- Apple M5 Pro (36GB) · Q8_0 · 13.8 t/s
- Apple M5 Pro (24GB) · Q5_K_M · 21.5 t/s
- Apple M5 (32GB) · Q8_0 · 6.9 t/s
- Apple M5 (16GB) · Q2_K · 20.9 t/s
- Apple M4 Ultra (384GB) · FP32 · 12.3 t/s
- Apple M4 Ultra (192GB) · FP32 · 12.3 t/s
- Apple M4 Max (128GB) · FP32 · 6.1 t/s
- Apple M4 Max (96GB) · BF16 · 12.3 t/s
- Apple M4 Max (64GB) · BF16 · 12.3 t/s
- Apple M4 Max (48GB) · Q8_0 · 24.6 t/s
- Apple M4 Pro (48GB) · Q8_0 · 12.3 t/s
- Apple M4 Pro (24GB) · Q5_K_M · 19.1 t/s
- Apple M4 (32GB) · Q8_0 · 5.4 t/s
- Apple M4 (16GB) · Q2_K · 16.4 t/s
- Apple M3 Ultra (512GB) · FP32 · 9.2 t/s
- Apple M3 Ultra (256GB) · FP32 · 9.2 t/s
- Apple M3 Ultra (96GB) · BF16 · 18.4 t/s
- Apple M3 Max (128GB) · FP32 · 4.5 t/s
- Apple M3 Max (96GB) · BF16 · 9 t/s
- Apple M3 Max (64GB) · BF16 · 9 t/s
- Apple M3 Max (48GB) · Q8_0 · 18 t/s
- Apple M3 Max (36GB) · Q8_0 · 18 t/s
- Apple M3 Pro (36GB) · Q8_0 · 6.8 t/s
- Apple M3 Pro (18GB) · Q3_K_M · 15.7 t/s
- Apple M3 (24GB) · Q5_K_M · 7 t/s
- Apple M3 (16GB) · Q2_K · 13.7 t/s
- Apple M2 Ultra (384GB) · FP32 · 9 t/s
- Apple M2 Ultra (192GB) · FP32 · 9 t/s
- Apple M2 Max (96GB) · BF16 · 9 t/s
- Apple M2 Max (64GB) · BF16 · 9 t/s
- Apple M2 Max (32GB) · Q8_0 · 18 t/s
- Apple M2 Pro (32GB) · Q8_0 · 9 t/s
- Apple M2 Pro (16GB) · Q2_K · 27.4 t/s
- Apple M2 (24GB) · Q5_K_M · 7 t/s
- Apple M2 (16GB) · Q2_K · 13.7 t/s
- Apple M1 Ultra (128GB) · FP32 · 9 t/s
- Apple M1 Ultra (64GB) · BF16 · 18 t/s
- Apple M1 Max (64GB) · BF16 · 9 t/s
- Apple M1 Max (32GB) · Q8_0 · 18 t/s
- Apple M1 Pro (32GB) · Q8_0 · 9 t/s
- Apple M1 Pro (16GB) · Q2_K · 27.4 t/s
- Apple M1 (16GB) · Q2_K · 9.3 t/s
- Intel Arc B580 12GB · Q2_K · 62.4 t/s
- Intel Arc Pro B70 24GB · Q6_K · 25 t/s
- Intel Arc Pro B60 24GB · Q6_K · 20.9 t/s
- Intel Arc A770 16GB · Q3_K_M · 58.7 t/s
- Intel Arc Pro A60 12GB · Q2_K · 52.6 t/s
- Intel Data Center GPU Max 1550 · FP32 · 36.9 t/s
- Intel Data Center GPU Max 1100 · Q8_0 · 55.4 t/s
- Intel Arc 140V (32GB) · Q8_0 · 6.2 t/s
- Intel Arc 140V (16GB) · Q2_K · 18.8 t/s
- Intel Arc 130V (16GB) · Q2_K · 18.8 t/s
Plus 13 GPUs that run it with CPU offload (slower)
- NVIDIA RTX 5060 · NVFP4 · 10.1 t/s
- NVIDIA RTX 5050 · NVFP4 · 7.2 t/s
- NVIDIA RTX 4060 · NVFP4 · 6.1 t/s
- NVIDIA RTX 3080 10GB · NVFP4 · 17.1 t/s
- Intel Arc B570 10GB · Q8_0 · 4.3 t/s
- Intel Arc A770 8GB · Q8_0 · 5.8 t/s
- Intel Arc A750 8GB · Q8_0 · 5.8 t/s
- Intel Arc A580 8GB · Q8_0 · 5.8 t/s
- Intel Arc A380 6GB · Q8_0 · 2.1 t/s
- Intel Arc A310 4GB · Q8_0 · 1.4 t/s
- Intel Arc Pro A50 6GB · Q8_0 · 2.2 t/s
- Intel Arc Pro A40 6GB · Q8_0 · 2.2 t/s
- CPU only (system RAM) · Q8_0 · 0.5 t/s
Frequently asked questions
- What are the VRAM requirements for Mistral Small 22B?
- Mistral Small 22B requires approximately 16.1 GB of VRAM at Q4_K_M quantization, 27.0 GB at Q8_0, and 51.8 GB at FP16. These numbers assume an 8k context window; the KV cache, and with it total VRAM, grows linearly with context length.
- How many parameters does Mistral Small 22B have?
- Mistral Small 22B has 22.2 billion parameters.
- How capable is Mistral Small 22B?
- Mistral Small 22B scores 49.2% on MMLU-Pro and 81.1% on HumanEval, which makes it a solid mid-range performer.
- Can Mistral Small 22B run on a 16 GB GPU?
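- Not at Q4_K_M with 8k context: that needs 16.1 GB of VRAM, slightly more than 16 GB. A 16 GB GPU can still run it fully in VRAM at a smaller quantization, such as NVFP4 (14.5 GB, CUDA only) or Q3_K_M (12.8 GB). For Q4_K_M, use a 24 GB GPU like the RTX 4090 or RTX 3090.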
- Can Mistral Small 22B run on a 24 GB GPU?
- Yes. Mistral Small 22B fits in a 24 GB GPU at Q4_K_M, requiring 16.1 GB VRAM. GPUs with 24 GB include the RTX 4090, RTX 3090, and RTX 3090 Ti.
- What is the highest-quality quantization for Mistral Small 22B that fits in 24 GB of VRAM?
- At 8k context, Q6_K needs 22.5 GB and is the largest quantization in the table that fits in 24 GB of VRAM. Q8_0 needs 27.0 GB and does not fit.
- What GPU do I need to run Mistral Small 22B locally?
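- A 24 GB GPU comfortably fits the recommended Q4_K_M quantization, which needs 16.1 GB VRAM at 8k context. Good options: RTX 4090 (24 GB), RTX 3090 (24 GB). Smaller cards can run lower quantizations fully in VRAM: 16 GB cards at NVFP4 (14.5 GB, CUDA only) or Q3_K_M (12.8 GB), and 12 GB cards at Q2_K (10.3 GB).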
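The fit questions above all reduce to a lookup against the Total column of the VRAM table. The sketch below shows one way to do that lookup. It rests on several assumptions: the whole VRAM budget is available to the model, a larger footprint stands in for higher quality, and NVFP4 is only eligible on CUDA GPUs. The `best_fit` helper is hypothetical, and it does not try to reproduce the per-GPU picks in the lists above.

```python
from typing import Optional

# Total VRAM at 8k context in GB, copied from the VRAM table.
TOTAL_GB = {
    "FP32": 101.6, "BF16": 51.8, "FP16": 51.8, "Q8_0": 27.0, "Q6_K": 22.5,
    "Q5_K_M": 18.1, "Q4_K_M": 16.1, "NVFP4": 14.5, "Q3_K_M": 12.8, "Q2_K": 10.3,
}
CUDA_ONLY = {"NVFP4"}  # per the table note, NVFP4 requires a CUDA GPU


def best_fit(vram_gb: float, cuda: bool = False) -> Optional[str]:
    """Return the largest quantization whose total fits in vram_gb, or None.

    Assumption: footprint is used as a proxy for quality.
    """
    candidates = [
        name for name, total in TOTAL_GB.items()
        if total <= vram_gb and (cuda or name not in CUDA_ONLY)
    ]
    if not candidates:
        return None  # nothing fits fully in VRAM; CPU offload would be needed
    return max(candidates, key=lambda name: TOTAL_GB[name])


if __name__ == "__main__":
    for budget in (12, 16, 24, 48):
        print(budget, "GB ->", best_fit(budget), "/ with CUDA:", best_fit(budget, cuda=True))
```

Under these assumptions a 24 GB budget selects Q6_K, and a 16 GB budget selects NVFP4 on CUDA hardware or Q3_K_M otherwise.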