CanItRun

Gemma 4 26B (MoE)

Gemma 4 26B (MoE) needs roughly 18.0 GB VRAM at Q4_K_M quantization (59.8 GB at FP16). 91 GPUs we track can run it fully in VRAM at 8k context.

91 GPUs run this natively · 13 with CPU offload

Google · 26B params · 3.8B active (MoE) · 250k context · Apache 2.0 · Commercial use OK

Gemma 4 26B (MoE) is a Mixture of Experts (MoE) model developed by Google, with 26B total parameters but only 3.8B active per token. Released in April 2026, this MoE variant is designed to reduce latency while maintaining quality.

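To run Gemma 4 26B (MoE) locally, plan for roughly 16-18 GB of VRAM at Q4_K_M. That is similar to Gemma-3-27B, but inference is potentially faster because of the MoE architecture. As an MoE model, its inference speed depends mainly on the active parameters (3.8B) rather than the total size, although all 26B parameters still have to be held in memory.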

MoE efficiency at 26B scale: inference is faster than on a dense 26B model, provided the full model fits in VRAM.

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

Quant                   Weights    KV cache   Total
FP32                    104.0 GB   1.41 GB    118.1 GB
BF16                    52.0 GB    1.41 GB    59.8 GB
FP16                    52.0 GB    1.41 GB    59.8 GB
Q8_0                    26.0 GB    1.41 GB    30.7 GB
Q6_K                    21.3 GB    1.41 GB    25.5 GB
Q5_K_M                  16.7 GB    1.41 GB    20.3 GB
Q4_K_M (recommended)    14.6 GB    1.41 GB    18.0 GB
Q3_K_M                  11.2 GB    1.41 GB    14.1 GB
Q2_K                    8.6 GB     1.41 GB    11.2 GB
NVFP4 (CUDA only)       13.0 GB    1.41 GB    16.1 GB

KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
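
The totals in the table are larger than weights plus KV cache, which suggests the calculator adds an allowance for runtime overhead. The sketch below assumes a flat 12% allowance on top of weights plus KV cache. That factor is inferred from the table's own numbers and is not stated anywhere on the page, so treat it as a guess that happens to reproduce every listed total to within 0.1 GB. The names `TABLE`, `OVERHEAD`, and `estimate_total_gb` are illustrative only.

```python
# Rows copied from the table above: quant -> (weights in GB, listed total in GB).
TABLE = {
    "FP32": (104.0, 118.1),
    "BF16": (52.0, 59.8),
    "FP16": (52.0, 59.8),
    "Q8_0": (26.0, 30.7),
    "Q6_K": (21.3, 25.5),
    "Q5_K_M": (16.7, 20.3),
    "Q4_K_M": (14.6, 18.0),
    "Q3_K_M": (11.2, 14.1),
    "Q2_K": (8.6, 11.2),
    "NVFP4": (13.0, 16.1),
}

KV_CACHE_GB = 1.41  # FP16 KV cache at 8k context, as listed in the table
OVERHEAD = 1.12     # ASSUMPTION: inferred from the table, not documented


def estimate_total_gb(weights_gb, kv_gb=KV_CACHE_GB, overhead=OVERHEAD):
    """Estimate total VRAM as (weights + KV cache) scaled by an overhead factor."""
    return (weights_gb + kv_gb) * overhead


if __name__ == "__main__":
    for quant, (weights, listed) in TABLE.items():
        est = estimate_total_gb(weights)
        print(f"{quant:8s} estimated {est:6.1f} GB   listed {listed:6.1f} GB")
```

Small differences between the estimated and listed totals are expected, because the weight figures in the table are already rounded to one decimal place.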

GPUs that run Gemma 4 26B (MoE) natively (91)

Plus 13 GPUs that run it with CPU offload (slower)

Notes

MoE architecture optimized for latency, activating ~3.8B parameters per token.

Hugging Face · Ollama · Released 2026-04-02

Frequently asked questions

What are the VRAM requirements for Gemma 4 26B (MoE)?
Gemma 4 26B (MoE) requires approximately 18.0 GB of VRAM at Q4_K_M quantization, 30.7 GB at Q8_0, and 59.8 GB at FP16. These numbers assume an 8k context window; the KV cache portion grows linearly with context length, so total VRAM rises with longer contexts.
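
As a rough illustration of that linear growth, the sketch below scales the listed 1.41 GB FP16 KV cache from 8k to other context lengths. It takes the page's linear-growth statement at face value and assumes "8k" means 8192 tokens. Both the function name `kv_cache_gb` and the sample context lengths are hypothetical, and real runtimes may differ.

```python
BASE_CONTEXT_TOKENS = 8192  # ASSUMPTION: "8k" on this page means 8192 tokens
BASE_KV_GB = 1.41           # FP16 KV cache listed at 8k context


def kv_cache_gb(context_tokens):
    """Scale the 8k KV cache figure linearly to another context length."""
    return BASE_KV_GB * context_tokens / BASE_CONTEXT_TOKENS


if __name__ == "__main__":
    for ctx in (8192, 16384, 32768, 131072):
        print(f"{ctx:7d} tokens -> ~{kv_cache_gb(ctx):.2f} GB KV cache (FP16)")
```

The weights do not change with context length, so only this KV cache term needs to be recomputed when estimating VRAM for a longer context.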
How many parameters does Gemma 4 26B (MoE) have?
Gemma 4 26B (MoE) has 26 billion total parameters, but only 3.8 billion are active per token thanks to its Mixture of Experts (MoE) architecture. This makes inference significantly faster than the total parameter count suggests.
How capable is Gemma 4 26B (MoE)?
Gemma 4 26B (MoE) achieves an MMLU-Pro score of 82.6, placing it among the most capable open-weight models available — competitive with frontier systems on general knowledge and reasoning.
Can Gemma 4 26B (MoE) run on a 16 GB GPU?
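Not at the recommended quantization. At Q4_K_M, Gemma 4 26B (MoE) needs 18.0 GB of VRAM, more than 16 GB, and NVFP4 (16.1 GB) is also just over. The lower-quality Q3_K_M (14.1 GB) and Q2_K (11.2 GB) totals do fit at 8k context. Otherwise, you will need a 24 GB GPU like the RTX 4090 or RTX 3090, or CPU offload (slower).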
Can Gemma 4 26B (MoE) run on a 24 GB GPU?
Yes. Gemma 4 26B (MoE) fits in a 24 GB GPU at Q4_K_M, requiring 18.0 GB VRAM. GPUs with 24 GB include the RTX 4090, RTX 3090, and RTX 3090 Ti.
What is the highest-quality quantization for Gemma 4 26B (MoE) that fits in 24 GB of VRAM?
At Q5_K_M, Gemma 4 26B (MoE) needs 20.3 GB at 8k context, making it the highest-quality quantization listed that fits in 24 GB of VRAM. The next step up, Q6_K, needs 25.5 GB and does not fit.
What GPU do I need to run Gemma 4 26B (MoE) locally?
For the recommended Q4_K_M quantization, a 24 GB GPU is the practical minimum: Gemma 4 26B (MoE) needs 18.0 GB of VRAM at that setting. Good options: RTX 4090 (24 GB), RTX 3090 (24 GB).
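
The fit questions above come down to comparing a GPU's VRAM against the totals in the VRAM table. The sketch below does that comparison for the listed 8k-context totals. It is a simplification: it assumes the whole card is available to the model, and it uses the listed total size as a stand-in for quality (larger total treated as higher quality), which is an assumption rather than something the page states. The names `TOTAL_GB`, `quants_that_fit`, and `largest_fit` are illustrative.

```python
# Listed totals at 8k context, copied from the VRAM table (GB).
TOTAL_GB = {
    "FP32": 118.1,
    "BF16": 59.8,
    "FP16": 59.8,
    "Q8_0": 30.7,
    "Q6_K": 25.5,
    "Q5_K_M": 20.3,
    "Q4_K_M": 18.0,
    "NVFP4": 16.1,  # requires a CUDA GPU
    "Q3_K_M": 14.1,
    "Q2_K": 11.2,
}


def quants_that_fit(vram_gb):
    """Return every quantization whose listed total fits in vram_gb."""
    return [quant for quant, total in TOTAL_GB.items() if total <= vram_gb]


def largest_fit(vram_gb):
    """Return the fitting quantization with the largest total, or None.

    ASSUMPTION: a larger total is used as a proxy for higher quality.
    """
    fits = quants_that_fit(vram_gb)
    return max(fits, key=TOTAL_GB.get) if fits else None


if __name__ == "__main__":
    for vram in (8, 16, 24, 32):
        print(f"{vram} GB: fits {quants_that_fit(vram)}, largest {largest_fit(vram)}")
```

In practice some VRAM is taken by the display and other processes, so a quantization that only just fits on paper may still need a smaller context or partial CPU offload.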