
Gemma 2 27B Instruct

Gemma 2 27B Instruct needs roughly 20.6 GB VRAM at Q4_K_M quantization (64.4 GB at FP16). 76 GPUs we track can run it fully in VRAM at 8k context.

76 GPUs run this natively · 19 with CPU offload

Google · 27.2B params · 8k context · Gemma · Commercial use ok

Gemma 2 27B Instruct is a 27.2B parameter dense model developed by Google. Released in June 2024 with an 8K context window, it has a short context but an efficient architecture.

To run Gemma 2 27B Instruct locally: Q4_K_M needs about 15.3 GB for weights and about 20.6 GB in total at 8k context, so it fits on a 24 GB GPU with a few GB to spare. Good mid-range option.

It scores 38.0% on MMLU-Pro, which is strong for its size. The 8K context keeps the KV cache small (3.09 GB at FP16) even at full context.

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

Quant | Weights | KV cache | Total
FP32 | 108.8 GB | 3.09 GB | 125.3 GB
BF16 | 54.4 GB | 3.09 GB | 64.4 GB
FP16 | 54.4 GB | 3.09 GB | 64.4 GB
Q8_0 | 27.2 GB | 3.09 GB | 33.9 GB
Q6_K | 22.3 GB | 3.09 GB | 28.4 GB
Q5_K_M | 17.5 GB | 3.09 GB | 23.1 GB
Q4_K_M (recommended) | 15.3 GB | 3.09 GB | 20.6 GB
Q3_K_M | 11.7 GB | 3.09 GB | 16.6 GB
Q2_K | 8.9 GB | 3.09 GB | 13.5 GB
NVFP4 (CUDA) | 13.6 GB | 3.09 GB | 18.7 GB

KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
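The totals in the table are larger than weights plus KV cache alone. As a rough sketch of how the rows seem to be composed, the code below assumes weight size is parameter count times bits per weight, and assumes a flat overhead factor of about 1.12 on top of weights plus KV cache. That factor is inferred from the table rows and is not stated by the source; `estimate_weights_gb`, `estimate_total_gb`, and `OVERHEAD` are hypothetical names used only for illustration.

```python
PARAMS_BILLIONS = 27.2  # parameter count given on this page
KV_CACHE_GB = 3.09      # FP16 KV cache at 8k context, from the table
OVERHEAD = 1.12         # ASSUMPTION: inferred from the table rows, not documented


def estimate_weights_gb(bits_per_weight: float,
                        params_billions: float = PARAMS_BILLIONS) -> float:
    """Weight size in GB: billions of parameters times bytes per weight."""
    return params_billions * bits_per_weight / 8


def estimate_total_gb(weights_gb: float,
                      kv_cache_gb: float = KV_CACHE_GB,
                      overhead: float = OVERHEAD) -> float:
    """Total VRAM in GB: (weights + KV cache) scaled by the assumed overhead."""
    return (weights_gb + kv_cache_gb) * overhead


if __name__ == "__main__":
    # Formats with a whole number of bits per weight in the table.
    for name, bits in [("FP32", 32), ("FP16", 16), ("Q8_0", 8), ("NVFP4", 4)]:
        weights = estimate_weights_gb(bits)
        total = estimate_total_gb(weights)
        print(f"{name}: weights {weights:.1f} GB, total {total:.1f} GB")
```

Under these assumptions the sketch reproduces the FP32, FP16, Q8_0, and NVFP4 rows after rounding to one decimal. The K-quant rows imply fractional bits per weight (15.3 GB for Q4_K_M works out to about 4.5 bits), so for those it is simpler to pass the table's weight figure straight to `estimate_total_gb`.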


GPUs that run Gemma 2 27B Instruct natively (76)

Plus 19 GPUs that run it with CPU offload (slower)

Notes

Short 8k context — KV cache is tiny even at full context.
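The 3.09 GB KV cache figure can be reproduced with the usual per-token formula. The sketch below assumes the architecture values from the model's published Hugging Face config (46 layers, 16 KV heads, head dimension 128), none of which appear on this page. It also assumes every layer caches the full context, ignoring Gemma 2's sliding-window layers. `kv_cache_gb` is a hypothetical helper name.

```python
def kv_cache_gb(context_tokens: int,
                num_layers: int = 46,       # ASSUMPTION: from the model config
                num_kv_heads: int = 16,     # ASSUMPTION: from the model config
                head_dim: int = 128,        # ASSUMPTION: from the model config
                bytes_per_value: int = 2) -> float:  # FP16 cache
    """KV cache size in GB (decimal) for a single sequence."""
    # Factor of 2 covers one key and one value vector per head per layer.
    bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return bytes_per_token * context_tokens / 1e9


if __name__ == "__main__":
    for ctx in (2048, 4096, 8192):
        print(f"{ctx} tokens: {kv_cache_gb(ctx):.2f} GB")
```

With these values the full 8192-token context comes to about 3.09 GB, and halving the context halves the cache, which is the linear growth the page describes.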

Hugging Face · Ollama · Released 2024-06-27


Frequently asked questions

What are the VRAM requirements for Gemma 2 27B Instruct?
Gemma 2 27B Instruct requires approximately 20.6 GB of VRAM at Q4_K_M quantization, 33.9 GB at Q8_0, and 64.4 GB at FP16. These numbers assume an 8k context window; the KV cache portion of VRAM grows linearly with context length.
How many parameters does Gemma 2 27B Instruct have?
Gemma 2 27B Instruct has 27.2 billion parameters.
How capable is Gemma 2 27B Instruct?
Gemma 2 27B Instruct scores 38.0% on MMLU-Pro, making it well-suited for lightweight tasks, prototyping, and resource-constrained environments.
Can Gemma 2 27B Instruct run on a 16 GB GPU?
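Not at Q4_K_M, which needs 20.6 GB of VRAM, more than 16 GB. Of the quantizations listed, only Q2_K (13.5 GB) fits under 16 GB at 8k context; Q3_K_M (16.6 GB) is just over. For Q4_K_M you will need a 24 GB GPU like the RTX 4090 or RTX 3090.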
Can Gemma 2 27B Instruct run on a 24 GB GPU?
Yes. Gemma 2 27B Instruct fits in a 24 GB GPU at Q4_K_M, requiring 20.6 GB VRAM. GPUs with 24 GB include the RTX 4090, RTX 3090, and RTX 3090 Ti.
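What is the highest-quality quantization for Gemma 2 27B Instruct that fits in 24 GB of VRAM?
Among the quantizations listed, Q5_K_M (23.1 GB) is the largest that fits in 24 GB of VRAM at 8k context, though with under 1 GB of headroom. Q4_K_M (20.6 GB) and NVFP4 (18.7 GB, CUDA GPUs only) leave more room.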
What GPU do I need to run Gemma 2 27B Instruct locally?
For the recommended Q4_K_M quantization, a 24 GB GPU is the practical minimum: Gemma 2 27B Instruct needs 20.6 GB VRAM at that setting. Good options: RTX 4090 (24 GB), RTX 3090 (24 GB).
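The fit questions above come down to comparing the table's totals with a card's VRAM. A minimal sketch of that lookup, using the 8k-context totals from the table; `TOTAL_VRAM_GB` and `quants_that_fit` are hypothetical names, and the check assumes the whole card is available to the model, with no allowance for the display or other processes.

```python
# Total VRAM per quantization at 8k context, copied from the table (GB).
TOTAL_VRAM_GB = {
    "FP32": 125.3, "BF16": 64.4, "FP16": 64.4, "Q8_0": 33.9, "Q6_K": 28.4,
    "Q5_K_M": 23.1, "Q4_K_M": 20.6, "Q3_K_M": 16.6, "Q2_K": 13.5,
    "NVFP4": 18.7,  # requires a CUDA GPU
}


def quants_that_fit(gpu_vram_gb: float, totals: dict = TOTAL_VRAM_GB) -> list:
    """Return (name, total_gb) pairs that fit, largest footprint first."""
    fitting = [(name, gb) for name, gb in totals.items() if gb <= gpu_vram_gb]
    return sorted(fitting, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for vram in (16, 24, 48):
        names = [name for name, _ in quants_that_fit(vram)]
        print(f"{vram} GB: {', '.join(names) if names else 'nothing fits'}")
```

By the table's totals, a 16 GB card admits only Q2_K, while a 24 GB card admits everything from Q5_K_M downward. Real headroom will be smaller than the raw difference if anything else is using the GPU.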