CanItRun

Nemotron 3 Nano 30B

Nemotron 3 Nano 30B needs roughly 20.7 GB VRAM at Q4_K_M quantization (72.2 GB at FP16). 76 GPUs we track can run it fully in VRAM at 8k context.

76 GPUs run this natively · 19 with CPU offload

NVIDIA · 32B params · 3B active (MoE) · 1024k context · NVIDIA · Commercial use ok

Nemotron 3 Nano 30B is a Mixture of Experts (MoE) model developed by NVIDIA, with 32B total parameters but only 3B active per token. Released in December 2025, it uses a hybrid Mamba-Transformer architecture with a 1M-token context window.

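To run Nemotron 3 Nano 30B locally, Q5_K_M is the suggested quantization: its weights take about 20.6 GB (23.6 GB total at 8k context), which fits on a 24 GB GPU with little headroom. As a MoE model, its inference speed depends on the active parameters (3B) rather than the total size, so tokens per second are high for a model of this footprint, while memory must still hold all 32B parameters.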
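The split between memory cost and speed can be made concrete with a little arithmetic. The sketch below is a rough illustration, not the site's calculator: `weight_gb` is a hypothetical helper, it treats 1 GB as 10^9 bytes, and it ignores shared layers and router overhead when estimating what is read per token.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (assuming 1 GB = 1e9 bytes)."""
    # billions of parameters * bits per parameter / 8 bits per byte
    return params_billion * bits_per_weight / 8


TOTAL_PARAMS_B = 32   # every expert must be resident in memory
ACTIVE_PARAMS_B = 3   # parameters actually used for each token

resident = weight_gb(TOTAL_PARAMS_B, 16)   # drives the VRAM requirement
touched = weight_gb(ACTIVE_PARAMS_B, 16)   # rough proxy for per-token work

print(f"FP16 resident weights: {resident:.1f} GB")
print(f"FP16 weights used per token (rough): {touched:.1f} GB")
```

The resident figure sets which GPUs can hold the model; the much smaller per-token figure is why throughput resembles that of a small dense model.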

Optimized for agentic workflows — MoE efficiency with Mamba speed.

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

Quant                  Weights    KV cache   Total
FP32                   128.0 GB   0.44 GB    143.8 GB
BF16                   64.0 GB    0.44 GB    72.2 GB
FP16                   64.0 GB    0.44 GB    72.2 GB
Q8_0                   32.0 GB    0.44 GB    36.3 GB
Q6_K                   26.2 GB    0.44 GB    29.9 GB
Q5_K_M (recommended)   20.6 GB    0.44 GB    23.6 GB
Q4_K_M                 18.0 GB    0.44 GB    20.7 GB
Q3_K_M                 13.8 GB    0.44 GB    15.9 GB
Q2_K                   10.5 GB    0.44 GB    12.3 GB
NVFP4 (CUDA only)      16.0 GB    0.44 GB    18.4 GB

KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
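The totals in the table are larger than weights plus KV cache, and the gap looks like a fixed overhead of roughly 12%. The sketch below reproduces the totals under that assumption and extends them to other context lengths using the linear KV scaling stated above. The 1.12 factor is inferred from the table rather than documented, "8k" is taken to mean 8192 tokens, and `total_vram_gb` is a hypothetical helper.

```python
BASE_CONTEXT = 8192   # ASSUMPTION: the table's "8k" means 8192 tokens
BASE_KV_GB = 0.44     # FP16 KV cache at 8k context, from the table
OVERHEAD = 1.12       # ASSUMPTION: inferred from the table's totals


def total_vram_gb(weights_gb: float, context_tokens: int = BASE_CONTEXT) -> float:
    """Estimate total VRAM as (weights + scaled KV cache) * overhead."""
    # The KV cache grows linearly with context length.
    kv_gb = BASE_KV_GB * context_tokens / BASE_CONTEXT
    return (weights_gb + kv_gb) * OVERHEAD


# Q4_K_M weights (18.0 GB) at a few context lengths
for ctx in (8192, 32768, 131072):
    print(f"{ctx:>7} tokens: {total_vram_gb(18.0, ctx):.1f} GB")
```

Under these assumptions the KV cache is minor at 8k context but becomes a large share of the footprint as the context approaches the model's advertised maximum.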

Benchmarks

GPUs that run Nemotron 3 Nano 30B natively (76)

Plus 19 GPUs that run it with CPU offload (slower)

Notes

Hybrid Mamba-Transformer MoE with 32B total parameters and 3B active per token. Optimized for agentic workflows.

Hugging Face · Released 2025-12-01

Frequently asked questions

What are the VRAM requirements for Nemotron 3 Nano 30B?
Nemotron 3 Nano 30B requires approximately 20.7 GB of VRAM at Q4_K_M quantization, 36.3 GB at Q8_0, and 72.2 GB at FP16. These numbers assume an 8k context window; the KV cache portion grows linearly with context length, so longer contexts need more VRAM.
How many parameters does Nemotron 3 Nano 30B have?
Nemotron 3 Nano 30B has 32 billion total parameters, but only 3 billion are active per token thanks to its Mixture of Experts (MoE) architecture. This makes inference significantly faster than the total parameter count suggests.
How capable is Nemotron 3 Nano 30B?
Nemotron 3 Nano 30B achieves an MMLU-Pro score of 78.3, placing it among the most capable open-weight models available — competitive with frontier systems on general knowledge and reasoning.
Can Nemotron 3 Nano 30B run on a 16 GB GPU?
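Not at Q4_K_M, which needs 20.7 GB of VRAM, more than 16 GB. For that quantization you will need a 24 GB GPU like the RTX 4090 or RTX 3090. On a 16 GB card the options are a lower-quality quantization (Q2_K totals 12.3 GB and Q3_K_M 15.9 GB at 8k context, the latter with almost no headroom) or CPU offload, which is slower.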
Can Nemotron 3 Nano 30B run on a 24 GB GPU?
Yes. Nemotron 3 Nano 30B fits in a 24 GB GPU at Q4_K_M, requiring 20.7 GB VRAM. GPUs with 24 GB include the RTX 4090, RTX 3090, and RTX 3090 Ti.
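What is the highest-quality quantization for Nemotron 3 Nano 30B that fits in 24 GB of VRAM?
Q5_K_M, at 23.6 GB total with 8k context, is the largest quantization listed that stays under 24 GB, though with little headroom. Q4_K_M (20.7 GB) and NVFP4 (18.4 GB, CUDA GPUs only) leave more room for longer contexts.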
What GPU do I need to run Nemotron 3 Nano 30B locally?
At Q4_K_M, Nemotron 3 Nano 30B needs 20.7 GB of VRAM, so a 24 GB GPU is the practical minimum for running it fully in VRAM at that quantization. Good options: RTX 4090 (24 GB), RTX 3090 (24 GB).
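The fit questions above all reduce to comparing a GPU's VRAM with the 8k-context totals from the quantization table. A minimal sketch of that lookup follows. The function names and the `headroom_gb` parameter are hypothetical, and using footprint as a stand-in for quality is a simplification (NVFP4 also requires a CUDA GPU).

```python
# Total VRAM at 8k context, in GB, copied from the quantization table.
TOTAL_GB_AT_8K = {
    "FP32": 143.8, "BF16": 72.2, "FP16": 72.2, "Q8_0": 36.3, "Q6_K": 29.9,
    "Q5_K_M": 23.6, "Q4_K_M": 20.7, "NVFP4": 18.4, "Q3_K_M": 15.9, "Q2_K": 12.3,
}


def quants_that_fit(vram_gb: float, headroom_gb: float = 0.0):
    """Return (name, total_gb) pairs that fit, largest footprint first."""
    budget = vram_gb - headroom_gb
    fits = [(name, gb) for name, gb in TOTAL_GB_AT_8K.items() if gb <= budget]
    return sorted(fits, key=lambda item: item[1], reverse=True)


def largest_fit(vram_gb: float, headroom_gb: float = 0.0):
    """Name of the largest quantization that fits, or None if none does."""
    fits = quants_that_fit(vram_gb, headroom_gb)
    return fits[0][0] if fits else None


for vram in (16, 24, 48):
    print(f"{vram} GB GPU -> {largest_fit(vram)}")
```

Passing a non-zero `headroom_gb` reserves space for longer contexts or other processes using the GPU. With 2 GB reserved on a 24 GB card, for instance, the lookup falls back from Q5_K_M to Q4_K_M.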