CanItRun

Gemma 4 31B

Gemma 4 31B needs roughly 23.2 GB of VRAM at Q4_K_M quantization (73.0 GB at FP16). Of the GPUs we track, 75 can run it fully in VRAM at 8k context.

75 GPUs run this natively · 19 with CPU offload

Google · 31B params · 250k context · Apache 2.0 · Commercial use OK

Gemma 4 31B is a 31B-parameter dense model developed by Google and released in April 2026. Optimized for workstations and servers, it supports a 256K-token context window and multimodal input.

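To run Gemma 4 31B locally, Q4_K_M is the recommended quantization: the weights take about 17.4 GB, and the total with an 8k-context KV cache is about 23.2 GB, so it fits on 24 GB GPUs. It is a good choice for M4 Max or RTX 4090 owners.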

It pairs a 31B dense architecture with vision capabilities and is Google's workstation-focused offering.

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

Quant                | Weights  | KV cache | Total
FP32                 | 124.0 GB | 3.22 GB  | 142.5 GB
BF16                 | 62.0 GB  | 3.22 GB  | 73.0 GB
FP16                 | 62.0 GB  | 3.22 GB  | 73.0 GB
Q8_0                 | 31.0 GB  | 3.22 GB  | 38.3 GB
Q6_K                 | 25.4 GB  | 3.22 GB  | 32.1 GB
Q5_K_M               | 20.0 GB  | 3.22 GB  | 26.0 GB
Q4_K_M (recommended) | 17.4 GB  | 3.22 GB  | 23.2 GB
Q3_K_M               | 13.3 GB  | 3.22 GB  | 18.5 GB
Q2_K                 | 10.2 GB  | 3.22 GB  | 15.0 GB
NVFP4 (CUDA only)    | 15.5 GB  | 3.22 GB  | 21.0 GB

KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
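The Weights column can be checked with simple arithmetic: bytes = parameters × bits per weight ÷ 8. The sketch below, assuming decimal gigabytes (1 GB = 10^9 bytes) and exactly 31 × 10^9 parameters, recovers the bits per weight each row implies and compares Total against weights plus KV cache. The helper names are our own illustration; the page does not document how Total is computed.

```python
# Rows copied from the table above: (quant, weights_gb, kv_cache_gb, total_gb).
PARAMS = 31e9  # assumption: "31B params" taken as exactly 31 * 10**9

TABLE = [
    ("FP32", 124.0, 3.22, 142.5),
    ("BF16", 62.0, 3.22, 73.0),
    ("FP16", 62.0, 3.22, 73.0),
    ("Q8_0", 31.0, 3.22, 38.3),
    ("Q6_K", 25.4, 3.22, 32.1),
    ("Q5_K_M", 20.0, 3.22, 26.0),
    ("Q4_K_M", 17.4, 3.22, 23.2),
    ("Q3_K_M", 13.3, 3.22, 18.5),
    ("Q2_K", 10.2, 3.22, 15.0),
    ("NVFP4", 15.5, 3.22, 21.0),
]


def implied_bits_per_weight(weights_gb: float, params: float = PARAMS) -> float:
    """Bits per parameter implied by a weights size in decimal GB."""
    return weights_gb * 1e9 * 8 / params


def overhead_ratio(weights_gb: float, kv_gb: float, total_gb: float) -> float:
    """Ratio of the Total column to weights + KV cache alone."""
    return total_gb / (weights_gb + kv_gb)


for quant, w, kv, total in TABLE:
    print(
        f"{quant:7s} {implied_bits_per_weight(w):5.2f} bits/weight  "
        f"total/(weights+KV) = {overhead_ratio(w, kv, total):.3f}"
    )
```

FP32, BF16, FP16, Q8_0, and NVFP4 come out at exactly 32, 16, 16, 8, and 4 bits per weight, and every row's Total is roughly 12% above weights plus KV cache (ratios between about 1.118 and 1.125). That suggests the calculator adds an allowance for runtime overhead, but this is an inference from the numbers, not a documented formula.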

GPUs that run Gemma 4 31B natively (75)

Plus 19 GPUs that run it with CPU offload (slower)

Notes

Dense model optimized for workstations/servers.

Hugging Face · Ollama · Released 2026-04-02

Frequently asked questions

What are the VRAM requirements for Gemma 4 31B?
Gemma 4 31B requires approximately 23.2 GB of VRAM at Q4_K_M, 38.3 GB at Q8_0, and 73.0 GB at FP16. These figures assume an 8k context window; the KV cache grows linearly with context length, so total VRAM rises with longer contexts.
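Because the KV cache grows linearly with context length, the 8k figure from the VRAM table can be scaled to other context sizes. This is a minimal sketch assuming "8k" means 8,192 tokens and an FP16 KV cache; `estimate_total_gb` is a hypothetical helper that adds only the extra KV cache to the table's 8k total and does not scale any other overhead.

```python
KV_GB_AT_8K = 3.22   # FP16 KV cache at 8k context, from the VRAM table
BASE_CONTEXT = 8192  # assumption: "8k" means 8,192 tokens


def kv_cache_gb(context_tokens: int) -> float:
    """Scale the 8k KV-cache figure linearly to another context length."""
    return KV_GB_AT_8K * context_tokens / BASE_CONTEXT


def estimate_total_gb(total_at_8k: float, context_tokens: int) -> float:
    """Rough total: the table's 8k total plus the additional KV cache.

    Hypothetical helper: any runtime overhead the calculator applies
    on top of the KV cache is left at its 8k value.
    """
    return total_at_8k + kv_cache_gb(context_tokens) - KV_GB_AT_8K


# Q4_K_M total at 8k context is 23.2 GB in the table.
for ctx in (8192, 32768, 131072, 262144):
    print(
        f"{ctx:>7} tokens: KV {kv_cache_gb(ctx):7.2f} GB, "
        f"Q4_K_M total ~{estimate_total_gb(23.2, ctx):7.2f} GB"
    )
```

Under this linear model, a 32K context adds roughly 9.7 GB of KV cache to the Q4_K_M total (about 32.9 GB), and the full 256K window (262,144 tokens) would need about 103 GB for the FP16 KV cache alone.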
How many parameters does Gemma 4 31B have?
Gemma 4 31B has 31 billion parameters.
How capable is Gemma 4 31B?
Gemma 4 31B achieves an MMLU-Pro score of 85.2, which places it among the most capable open-weight models available and makes it competitive with frontier systems on general knowledge and reasoning.
Can Gemma 4 31B run on a 16 GB GPU?
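Not at Q4_K_M, which needs 23.2 GB of VRAM. The only quantization in the table that fits under 16 GB is Q2_K at 15.0 GB, which leaves about 1 GB of headroom at 8k context and gives up considerable quality. Otherwise, plan on a 24 GB GPU such as the RTX 4090 or RTX 3090, or run with CPU offload at reduced speed.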
Can Gemma 4 31B run on a 24 GB GPU?
Yes. At Q4_K_M, Gemma 4 31B needs 23.2 GB of VRAM, which fits in 24 GB with little headroom. GPUs with 24 GB include the RTX 4090, RTX 3090, and RTX 3090 Ti.
What is the largest quantization of Gemma 4 31B that fits in 24 GB of VRAM?
By memory footprint, the largest quantization that fits in 24 GB is Q4_K_M at 23.2 GB. NVFP4 needs 21.0 GB and also fits on a CUDA GPU, with more headroom; Q5_K_M, at 26.0 GB, does not fit.
What GPU do I need to run Gemma 4 31B locally?
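A 24 GB GPU is the practical minimum for Q4_K_M, which needs 23.2 GB of VRAM. Good options are the RTX 4090 (24 GB) and RTX 3090 (24 GB). Cards with less memory can run Q3_K_M (18.5 GB) or Q2_K (15.0 GB) at lower quality, or fall back to CPU offload.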
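The FAQ answers above come down to one comparison: does a quantization's Total fit in the card's VRAM? The sketch below encodes the 8k-context totals from the VRAM table. `fitting_quants` is a hypothetical helper; it treats NVFP4 as usable only when a CUDA GPU is present (per the table note) and ignores CPU offload and any memory the driver or desktop reserves.

```python
# (quant, total_gb at 8k context, needs_cuda), copied from the VRAM table.
QUANTS = [
    ("FP32", 142.5, False),
    ("BF16", 73.0, False),
    ("FP16", 73.0, False),
    ("Q8_0", 38.3, False),
    ("Q6_K", 32.1, False),
    ("Q5_K_M", 26.0, False),
    ("Q4_K_M", 23.2, False),
    ("NVFP4", 21.0, True),
    ("Q3_K_M", 18.5, False),
    ("Q2_K", 15.0, False),
]


def fitting_quants(vram_gb: float, has_cuda: bool = False) -> list[str]:
    """Quantizations whose 8k-context total fits in vram_gb, largest footprint first."""
    # sorted() is stable, so equal totals (BF16, FP16) keep their table order.
    rows = sorted(QUANTS, key=lambda row: row[1], reverse=True)
    return [
        name
        for name, total, needs_cuda in rows
        if total <= vram_gb and (has_cuda or not needs_cuda)
    ]


for vram in (16, 24, 48):
    print(f"{vram} GB: {fitting_quants(vram)}")
```

With these numbers, a 24 GB card without NVFP4 support gets Q4_K_M as its largest fit, and a 16 GB card gets only Q2_K. Because a GPU usually cannot devote its full nominal capacity to one model, a margin as thin as 23.2 GB on 24 GB is best treated as tight.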