Gemma 3 27B Instruct
Gemma 3 27B Instruct needs roughly 18.8 GB VRAM at Q4_K_M quantization (62.2 GB at FP16). 85 GPUs we track can run it fully in VRAM at 8k context.
85 GPUs run this natively · 19 with CPU offload
Gemma 3 27B Instruct is a 27B parameter dense model developed by Google. Released in March 2025, it is a multimodal model with native vision via a SigLIP 400M encoder. It supports a 128K context with 5:1 local/global attention interleaving.
To run Gemma 3 27B Instruct locally: Q4_K_M needs about 18.8 GB at 8k context, rising toward ~32-33 GB with longer contexts. A 24 GB GPU can run it, but 32 GB+ is recommended for vision tasks. KV-cache optimization gives a 60% memory reduction.
Benchmark scores: MMLU-Pro 67.5%, LiveCodeBench 29.7%, MMMU (vision) 64.9%. A Chatbot Arena Elo of 1338 ranks it as the best open non-thinking model.
VRAM at each quantization
Assumes 8k context. KV cache grows linearly with context length.
| Quant | Weights | KV cache | Total |
|---|---|---|---|
| FP32 | 108.0 GB | 1.54 GB | 122.7 GB |
| BF16 | 54.0 GB | 1.54 GB | 62.2 GB |
| FP16 | 54.0 GB | 1.54 GB | 62.2 GB |
| Q8_0 | 27.0 GB | 1.54 GB | 32.0 GB |
| Q6_K | 22.1 GB | 1.54 GB | 26.5 GB |
| Q5_K_M | 17.4 GB | 1.54 GB | 21.2 GB |
| Q4_K_M (recommended) | 15.2 GB | 1.54 GB | 18.8 GB |
| Q3_K_M | 11.6 GB | 1.54 GB | 14.7 GB |
| Q2_K | 8.9 GB | 1.54 GB | 11.7 GB |
| NVFP4 (CUDA only) | 13.5 GB | 1.54 GB | 16.9 GB |
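KV cache shown at 8k context (FP16). Totals are about 12% higher than weights plus KV cache, which presumably reflects runtime overhead. NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.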
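The table's totals can be reproduced approximately with a small estimator. This is a minimal sketch, not the calculator's actual formula: the 1.12 overhead factor is an assumption inferred from the ratio of Total to Weights plus KV cache in the table, "8k" is assumed to mean 8192 tokens, and the function name `estimate_vram_gb` is hypothetical.

```python
# Weights in GB per quantization, copied from the table above.
WEIGHTS_GB = {
    "FP32": 108.0, "BF16": 54.0, "FP16": 54.0, "Q8_0": 27.0,
    "Q6_K": 22.1, "Q5_K_M": 17.4, "Q4_K_M": 15.2, "Q3_K_M": 11.6,
    "Q2_K": 8.9, "NVFP4": 13.5,
}

KV_CACHE_GB_AT_8K = 1.54   # FP16 KV cache at 8k context, from the table
BASE_CONTEXT = 8192        # assumption: "8k" means 8192 tokens
OVERHEAD = 1.12            # assumption: inferred from Total / (Weights + KV cache)


def estimate_vram_gb(quant: str, context_tokens: int = BASE_CONTEXT) -> float:
    """Estimate total VRAM in GB, scaling the KV cache linearly with context."""
    kv_cache_gb = KV_CACHE_GB_AT_8K * context_tokens / BASE_CONTEXT
    return (WEIGHTS_GB[quant] + kv_cache_gb) * OVERHEAD


if __name__ == "__main__":
    for quant in ("Q4_K_M", "Q8_0", "BF16"):
        print(f"{quant}: {estimate_vram_gb(quant):.1f} GB at 8k context")
    print(f"Q4_K_M at 32k context: {estimate_vram_gb('Q4_K_M', 32768):.1f} GB")
```

With these assumptions the estimates land within about 0.1 GB of the table's totals at 8k context. Whether the overhead really scales with the KV cache at longer contexts is not something the table confirms.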
Benchmarks
GPUs that run Gemma 3 27B Instruct natively (85)
- NVIDIA RTX 5090: NVFP4 · 132.7 t/s
- NVIDIA RTX 5080: Q3_K_M · 82.7 t/s
- NVIDIA RTX 5070 Ti: Q3_K_M · 77.2 t/s
- NVIDIA RTX 5060 Ti 16GB: Q3_K_M · 38.6 t/s
- NVIDIA RTX 4090: NVFP4 · 74.7 t/s
- NVIDIA RTX 4080: Q3_K_M · 61.8 t/s
- NVIDIA RTX 4060 Ti 16GB: Q3_K_M · 24.8 t/s
- NVIDIA RTX 3090: NVFP4 · 69.3 t/s
- NVIDIA RTX 3090 Ti: NVFP4 · 74.7 t/s
- NVIDIA H100 80GB: BF16 · 62 t/s
- NVIDIA A100 80GB: BF16 · 37.8 t/s
- NVIDIA A100 40GB: NVFP4 · 115.2 t/s
- NVIDIA L40S: NVFP4 · 64 t/s
- NVIDIA RTX A6000: NVFP4 · 56.9 t/s
- NVIDIA RTX 4000 Ada: NVFP4 · 23.7 t/s
- NVIDIA RTX 4500 Ada: NVFP4 · 32 t/s
- NVIDIA RTX 5000 Ada: NVFP4 · 42.7 t/s
- NVIDIA RTX 6000 Ada: NVFP4 · 71.1 t/s
- NVIDIA RTX Pro 6000: BF16 · 24.9 t/s
- NVIDIA DGX Spark (128GB): FP32 · 2.5 t/s
- AMD Radeon RX 7900 XTX: Q5_K_M · 55.2 t/s
- AMD Radeon RX 7900 XT: Q4_K_M · 52.6 t/s
- AMD Radeon RX 7900 GRE: Q3_K_M · 49.6 t/s
- AMD Radeon RX 6800 XT: Q3_K_M · 44.1 t/s
- AMD Radeon PRO W7800: Q6_K · 26 t/s
- AMD Radeon PRO W7900: Q8_0 · 32 t/s
- AMD Instinct MI300X: FP32 · 49.1 t/s
- AMD Radeon AI Pro 9700 32GB: Q6_K · 28.9 t/s
- AMD Strix Halo (128GB): FP32 · 2.4 t/s
- AMD Strix Halo (96GB): BF16 · 4.7 t/s
- AMD Strix Halo (64GB): Q8_0 · 9.5 t/s
- Apple M5 Max (128GB): FP32 · 5.7 t/s
- Apple M5 Max (64GB): Q8_0 · 22.7 t/s
- Apple M5 Max (48GB): Q8_0 · 22.7 t/s
- Apple M5 Pro (48GB): Q8_0 · 11.4 t/s
- Apple M5 Pro (36GB): Q8_0 · 11.4 t/s
- Apple M5 Pro (24GB): Q4_K_M · 20.2 t/s
- Apple M5 (32GB): Q6_K · 6.9 t/s
- Apple M5 (16GB): Q2_K · 17.2 t/s
- Apple M4 Ultra (384GB): FP32 · 10.1 t/s
- Apple M4 Ultra (192GB): FP32 · 10.1 t/s
- Apple M4 Max (128GB): FP32 · 5.1 t/s
- Apple M4 Max (96GB): BF16 · 10.1 t/s
- Apple M4 Max (64GB): Q8_0 · 20.2 t/s
- Apple M4 Max (48GB): Q8_0 · 20.2 t/s
- Apple M4 Pro (48GB): Q8_0 · 10.1 t/s
- Apple M4 Pro (24GB): Q4_K_M · 18 t/s
- Apple M4 (32GB): Q6_K · 5.4 t/s
- Apple M4 (16GB): Q2_K · 13.5 t/s
- Apple M3 Ultra (512GB): FP32 · 7.6 t/s
- Apple M3 Ultra (256GB): FP32 · 7.6 t/s
- Apple M3 Ultra (96GB): BF16 · 15.2 t/s
- Apple M3 Max (128GB): FP32 · 3.7 t/s
- Apple M3 Max (96GB): BF16 · 7.4 t/s
- Apple M3 Max (64GB): Q8_0 · 14.8 t/s
- Apple M3 Max (48GB): Q8_0 · 14.8 t/s
- Apple M3 Max (36GB): Q8_0 · 14.8 t/s
- Apple M3 Pro (36GB): Q8_0 · 5.6 t/s
- Apple M3 Pro (18GB): Q2_K · 16.9 t/s
- Apple M3 (24GB): Q4_K_M · 6.6 t/s
- Apple M3 (16GB): Q2_K · 11.3 t/s
- Apple M2 Ultra (384GB): FP32 · 7.4 t/s
- Apple M2 Ultra (192GB): FP32 · 7.4 t/s
- Apple M2 Max (96GB): BF16 · 7.4 t/s
- Apple M2 Max (64GB): Q8_0 · 14.8 t/s
- Apple M2 Max (32GB): Q6_K · 18.1 t/s
- Apple M2 Pro (32GB): Q6_K · 9 t/s
- Apple M2 Pro (16GB): Q2_K · 22.5 t/s
- Apple M2 (24GB): Q4_K_M · 6.6 t/s
- Apple M2 (16GB): Q2_K · 11.3 t/s
- Apple M1 Ultra (128GB): FP32 · 7.4 t/s
- Apple M1 Ultra (64GB): Q8_0 · 29.6 t/s
- Apple M1 Max (64GB): Q8_0 · 14.8 t/s
- Apple M1 Max (32GB): Q6_K · 18.1 t/s
- Apple M1 Pro (32GB): Q6_K · 9 t/s
- Apple M1 Pro (16GB): Q2_K · 22.5 t/s
- Apple M1 (16GB): Q2_K · 7.7 t/s
- Intel Arc Pro B70 24GB: Q5_K_M · 26.2 t/s
- Intel Arc Pro B60 24GB: Q5_K_M · 21.9 t/s
- Intel Arc A770 16GB: Q3_K_M · 48.2 t/s
- Intel Data Center GPU Max 1550: BF16 · 60.7 t/s
- Intel Data Center GPU Max 1100: Q8_0 · 45.5 t/s
- Intel Arc 140V (32GB): Q6_K · 6.2 t/s
- Intel Arc 140V (16GB): Q2_K · 15.4 t/s
- Intel Arc 130V (16GB): Q2_K · 15.4 t/s
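Several of the throughput figures in the list above are consistent with a simple bandwidth-bound estimate: tokens per second is roughly memory bandwidth divided by the size of the weights read per token. The sketch below illustrates that arithmetic. The bandwidth values are vendor specifications assumed here, not taken from this page, and the function name is hypothetical. The page does not state how its figures were derived, so treat this as one plausible reading.

```python
def estimate_tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound decode estimate: each token reads all weights once."""
    return bandwidth_gb_s / weights_gb


# (memory bandwidth in GB/s, weights in GB from the VRAM table).
# Bandwidth numbers are assumptions based on vendor specs, not this page.
EXAMPLES = {
    "NVIDIA RTX 5090 (NVFP4)": (1792.0, 13.5),
    "NVIDIA RTX 4090 (NVFP4)": (1008.0, 13.5),
    "NVIDIA H100 80GB (BF16)": (3350.0, 54.0),
}

if __name__ == "__main__":
    for name, (bandwidth, weights) in EXAMPLES.items():
        rate = estimate_tokens_per_second(bandwidth, weights)
        print(f"{name}: {rate:.1f} t/s")
    # prints 132.7, 74.7 and 62.0 t/s, matching the listed figures
```

Real decode speed also depends on the runtime, kernel efficiency, and context length, so measured numbers will usually fall below this upper bound.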
Plus 19 GPUs that run it with CPU offload (slower)
- NVIDIA RTX 5070: NVFP4 · 12.4 t/s
- NVIDIA RTX 5060: NVFP4 · 8.3 t/s
- NVIDIA RTX 5050: NVFP4 · 5.9 t/s
- NVIDIA RTX 4070 Ti: NVFP4 · 9.3 t/s
- NVIDIA RTX 4070: NVFP4 · 9.3 t/s
- NVIDIA RTX 4060: NVFP4 · 5 t/s
- NVIDIA RTX 3080 10GB: NVFP4 · 14.1 t/s
- NVIDIA RTX 3060 12GB: NVFP4 · 6.7 t/s
- Intel Arc B580 12GB: Q8_0 · 4.2 t/s
- Intel Arc B570 10GB: Q8_0 · 3.5 t/s
- Intel Arc A770 8GB: Q8_0 · 4.7 t/s
- Intel Arc A750 8GB: Q8_0 · 4.7 t/s
- Intel Arc A580 8GB: Q8_0 · 4.7 t/s
- Intel Arc A380 6GB: Q6_K · 2.1 t/s
- Intel Arc A310 4GB: Q6_K · 1.4 t/s
- Intel Arc Pro A60 12GB: Q8_0 · 3.6 t/s
- Intel Arc Pro A50 6GB: Q6_K · 2.2 t/s
- Intel Arc Pro A40 6GB: Q6_K · 2.2 t/s
- CPU only (system RAM): Q6_K · 0.5 t/s
Notes
Attention layers are interleaved at a 5:1 ratio: five local (sliding-window) layers for every global layer.
Frequently asked questions
- What are the VRAM requirements for Gemma 3 27B Instruct?
- Gemma 3 27B Instruct requires approximately 18.8 GB of VRAM at Q4_K_M quantization, 32.0 GB at Q8_0, and 62.2 GB at FP16. These numbers assume an 8k context window; the KV cache portion grows linearly with context length.
- How many parameters does Gemma 3 27B Instruct have?
- Gemma 3 27B Instruct has 27 billion parameters.
- How capable is Gemma 3 27B Instruct?
- With an MMLU-Pro score of 67.5, Gemma 3 27B Instruct delivers solid general-purpose performance suitable for most everyday tasks and professional use.
- Can Gemma 3 27B Instruct run on a 16 GB GPU?
- Not at Q4_K_M, which needs 18.8 GB of VRAM. A 16 GB GPU can run it at Q3_K_M (14.7 GB) or Q2_K (11.7 GB), at some cost in output quality. For Q4_K_M you will need a 24 GB GPU like the RTX 4090 or RTX 3090.
- Can Gemma 3 27B Instruct run on a 24 GB GPU?
- Yes. Gemma 3 27B Instruct fits in a 24 GB GPU at Q4_K_M, requiring 18.8 GB VRAM. GPUs with 24 GB include the RTX 4090, RTX 3090, and RTX 3090 Ti.
- What is the highest-quality quantization for Gemma 3 27B Instruct that fits in 24 GB of VRAM?
- At 8k context, Q5_K_M needs 21.2 GB and is the largest quantization in the table that fits in 24 GB. On CUDA GPUs, NVFP4 needs 16.9 GB and leaves more headroom for longer contexts.
- What GPU do I need to run Gemma 3 27B Instruct locally?
- A 24 GB GPU is the minimum for the recommended Q4_K_M quantization, which needs 18.8 GB of VRAM. Good options: RTX 4090 (24 GB), RTX 3090 (24 GB). 16 GB GPUs can run smaller quantizations such as Q3_K_M.
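The "does it fit" questions above reduce to a lookup against the VRAM table. The following is a minimal sketch of that check, assuming the table's 8k-context totals and a pure size comparison with no extra headroom; the function name `quantizations_that_fit` and its `cuda` flag are hypothetical.

```python
# Total VRAM in GB at 8k context, copied from the quantization table.
TOTAL_VRAM_GB = {
    "FP32": 122.7, "BF16": 62.2, "FP16": 62.2, "Q8_0": 32.0,
    "Q6_K": 26.5, "Q5_K_M": 21.2, "Q4_K_M": 18.8, "Q3_K_M": 14.7,
    "Q2_K": 11.7, "NVFP4": 16.9,
}

CUDA_ONLY = {"NVFP4"}  # per the table note, NVFP4 requires a CUDA GPU


def quantizations_that_fit(vram_gb: float, cuda: bool = True) -> list[tuple[str, float]]:
    """Return (quantization, total GB) pairs that fit, largest first."""
    candidates = [
        (name, total)
        for name, total in TOTAL_VRAM_GB.items()
        if total <= vram_gb and (cuda or name not in CUDA_ONLY)
    ]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for vram in (16, 24, 32):
        fits = quantizations_that_fit(vram)
        best = fits[0] if fits else None
        print(f"{vram} GB: largest fit = {best}")
```

Size is used here as a rough proxy for quality. In practice you may prefer a smaller quantization than the largest fit, to leave room for longer contexts or vision inputs.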