
Qwen3 30B-A3B (MoE)

Qwen3 30B-A3B (MoE) needs roughly 15.0 GB of VRAM for weights at Q4 quantization (60.0 GB at FP16). 55 of the GPUs we track can run it fully in VRAM at 8k context.

Alibaba · 30B params · 3B active (MoE) · 128k context · Apache 2.0 · Commercial use OK

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length. Totals include runtime overhead beyond weights and KV cache.

Quant     Weights   KV cache  Total
FP16      60.0 GB   0.81 GB   68.1 GB
Q8        30.0 GB   0.81 GB   34.5 GB
Q6_K      22.5 GB   0.81 GB   26.1 GB
Q5_K_M    18.8 GB   0.81 GB   21.9 GB
Q4_K_M    15.0 GB   0.81 GB   17.7 GB
Q3_K_M    12.0 GB   0.81 GB   14.3 GB
Q2_K       9.0 GB   0.81 GB   11.0 GB
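
A minimal sketch of how numbers like these can be derived, assuming weights scale with effective bits per weight, the KV cache scales linearly with context, and totals carry roughly a 12% runtime-overhead multiplier. The bit widths and the 1.12 factor are inferred from the table's own numbers, not a published formula:

```python
# Minimal sketch reproducing the table above. Assumptions, inferred from the
# numbers rather than a published formula:
#   - weights  = params * effective_bits / 8, in decimal GB (1e9 bytes)
#   - KV cache = 0.81 GB at 8k context, scaling linearly with context length
#   - Total    = (weights + KV cache) * ~1.12 runtime-overhead multiplier

PARAMS = 30e9        # Qwen3 30B-A3B total parameter count
KV_AT_8K_GB = 0.81   # KV cache at 8k context, from the table
OVERHEAD = 1.12      # multiplier that reproduces the Total column (inferred)

# Effective bits per weight implied by the table's Weights column.
BITS = {"FP16": 16, "Q8": 8, "Q6_K": 6, "Q5_K_M": 5,
        "Q4_K_M": 4, "Q3_K_M": 3.2, "Q2_K": 2.4}

def total_vram_gb(quant: str, context: int = 8192) -> float:
    """Estimated total VRAM (GB) for a quantization at a context length."""
    weights_gb = PARAMS * BITS[quant] / 8 / 1e9
    kv_gb = KV_AT_8K_GB * context / 8192   # linear KV-cache scaling
    return (weights_gb + kv_gb) * OVERHEAD

for q in BITS:
    print(f"{q:>6}: {total_vram_gb(q):5.1f} GB at 8k, "
          f"{total_vram_gb(q, 32768):5.1f} GB at 32k")
```

Under these assumptions, at 32k context the KV-cache term grows to about 3.2 GB, lifting the Q4_K_M total from roughly 17.7 GB to about 20.4 GB.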


GPUs that run Qwen3 30B-A3B (MoE) natively (55)

Plus 3 GPUs that run it with CPU offload (slower)

Notes

30B parameters total with only 3B active per token, so inference is fast once the weights fit in VRAM.
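
The speed claim follows from the MoE split: VRAM scales with the 30B total (all experts stay resident), while per-token compute scales with the 3B active. A rough comparison using the common ~2 FLOPs per active parameter per token rule of thumb (an approximation, not a benchmark):

```python
# Rough per-token compute: MoE vs. a dense model of the same total size.
# Uses the common ~2 FLOPs per parameter per token rule of thumb (assumption).
TOTAL_PARAMS = 30e9    # must fit in VRAM (all experts resident)
ACTIVE_PARAMS = 3e9    # routed through per token (MoE)

dense_flops_per_token = 2 * TOTAL_PARAMS   # dense 30B: every weight used
moe_flops_per_token = 2 * ACTIVE_PARAMS    # MoE: only active experts used

print(f"dense 30B : {dense_flops_per_token:.1e} FLOPs/token")
print(f"30B-A3B   : {moe_flops_per_token:.1e} FLOPs/token "
      f"(~{dense_flops_per_token / moe_flops_per_token:.0f}x less)")
```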

Hugging Face ↗ · Ollama ↗ · Released 2025-04-29