Question 1

What are the VRAM requirements for Nemotron 3 Super 120B?

Accepted Answer

Nemotron 3 Super 120B requires approximately 82.7 GB of VRAM at Q4_K_M quantization, 143.7 GB at Q8_0, and 269.6 GB at FP16. These figures assume an 8k context window; the KV cache grows linearly with context length, so total VRAM rises as the context gets longer.
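To give a rough feel for that scaling, the sketch below extrapolates an 8k total to longer contexts. It assumes the KV cache is 0.74 GB at 8k (the figure in the table on this page), that "8k" means 8,192 tokens, that the cache grows strictly linearly, and that everything else in the total stays fixed. The function name is made up for illustration, and real runtimes may behave differently.

```python
KV_CACHE_GB_AT_8K = 0.74   # KV cache figure from the table on this page
BASE_CONTEXT = 8192        # assumption: "8k" means 8,192 tokens


def estimate_total_vram_gb(total_gb_at_8k: float, context_tokens: int) -> float:
    """Extrapolate an 8k-context total to another context length.

    Assumes only the KV cache changes, and that it grows linearly.
    """
    kv_gb = KV_CACHE_GB_AT_8K * context_tokens / BASE_CONTEXT
    fixed_gb = total_gb_at_8k - KV_CACHE_GB_AT_8K  # weights plus any overhead
    return fixed_gb + kv_gb


# Q4_K_M total at 8k context is 82.7 GB according to the answer above.
for ctx in (8192, 16384, 32768):
    print(f"{ctx:>6} tokens: {estimate_total_vram_gb(82.7, ctx):.1f} GB")
```

Under these assumptions the KV cache stays a small share of the total at moderate context lengths, so the weights dominate the requirement.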

Question 2

How many parameters does Nemotron 3 Super 120B have?

Accepted Answer

Nemotron 3 Super 120B has 120 billion total parameters, but only 12 billion are active per token thanks to its Mixture of Experts (MoE) architecture. This makes inference significantly faster than the total parameter count suggests. It does not reduce the VRAM requirement, since all 120 billion parameters still have to be loaded.

Question 3

How capable is Nemotron 3 Super 120B?

Accepted Answer

Nemotron 3 Super 120B achieves an MMLU-Pro score of 83.7, which places it among the most capable open-weight models available and makes it competitive with frontier systems on general knowledge and reasoning.

Question 4

Can Nemotron 3 Super 120B run on a 16 GB GPU?

Accepted Answer

No. At Q4_K_M, Nemotron 3 Super 120B needs 82.7 GB of VRAM, more than five times what a 16 GB GPU provides. You will need a multi-GPU server.

Question 5

Can Nemotron 3 Super 120B run on a 24 GB GPU?

Accepted Answer

No. Even at Q4_K_M, Nemotron 3 Super 120B needs 82.7 GB. Consider a multi-GPU server with more than 82.7 GB of total VRAM.

Question 6

What is the smallest quantization for Nemotron 3 Super 120B that fits in 24 GB of VRAM?

Accepted Answer

Nemotron 3 Super 120B cannot fit in 24 GB of VRAM at any standard quantization level. The minimum needed is 52.0 GB at Q2_K.
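The same check can be run for any VRAM budget. The sketch below hard-codes the 8k-context totals from the table on this page and lists the quantizations that fit, largest first. The function name is hypothetical, and the totals only hold under the page's own assumptions.

```python
# Total VRAM in GB at 8k context, copied from the table on this page.
TOTAL_VRAM_GB = {
    "FP32": 538.4, "BF16": 269.6, "FP16": 269.6, "Q8_0": 143.7,
    "Q6_K": 111.2, "Q5_K_M": 96.5, "Q4_K_M": 82.7, "NVFP4": 68.0,
    "Q3_K_M": 65.5, "Q2_K": 52.0,
}


def quantizations_that_fit(vram_gb: float) -> list[str]:
    """Return the quantizations whose total fits in vram_gb, largest first."""
    fitting = [(total, name) for name, total in TOTAL_VRAM_GB.items()
               if total <= vram_gb]
    return [name for _, name in sorted(fitting, reverse=True)]


print(quantizations_that_fit(24))  # prints []
print(quantizations_that_fit(96))  # prints ['Q4_K_M', 'NVFP4', 'Q3_K_M', 'Q2_K']
```

An empty list for 24 GB matches the answer above: even Q2_K, at 52.0 GB, is more than twice that budget.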

Question 7

What GPU do I need to run Nemotron 3 Super 120B locally?

Accepted Answer

You need a multi-GPU server. At Q4_K_M, Nemotron 3 Super 120B needs 82.7 GB of VRAM, more than any single consumer GPU offers. Consider 2–4× 80 GB H100 or A100 GPUs, depending on the quantization.
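The 2–4× range can be reproduced with a simple ceiling division. This is a minimal sketch assuming 80 GB per GPU and that the model splits evenly across cards with no extra per-GPU overhead, which real multi-GPU setups may not achieve. The function name is made up for illustration.

```python
import math


def min_gpu_count(total_vram_gb: float, vram_per_gpu_gb: float) -> int:
    """Smallest number of identical GPUs whose combined VRAM covers the total."""
    return math.ceil(total_vram_gb / vram_per_gpu_gb)


# Totals at 8k context from this page; 80 GB per GPU is an assumption.
print(min_gpu_count(82.7, 80))   # Q4_K_M: prints 2
print(min_gpu_count(143.7, 80))  # Q8_0:   prints 2
print(min_gpu_count(269.6, 80))  # FP16:   prints 4
```

With 40 GB cards the same calculation gives 3 GPUs for Q4_K_M, so the per-GPU memory matters as much as the GPU model.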

VRAM at each quantization (8k context)

Quantization	Weights	KV cache	Total VRAM
FP32	480.0 GB	0.74 GB	538.4 GB
BF16	240.0 GB	0.74 GB	269.6 GB
FP16	240.0 GB	0.74 GB	269.6 GB
Q8_0	127.6 GB	0.74 GB	143.7 GB
Q6_K	98.5 GB	0.74 GB	111.2 GB
Q5_K_M	85.4 GB	0.74 GB	96.5 GB
Q4_K_M	73.1 GB	0.74 GB	82.7 GB
Q3_K_M (rec)	57.7 GB	0.74 GB	65.5 GB
Q2_K	45.7 GB	0.74 GB	52.0 GB
NVFP4 (cuda)	60.0 GB	0.74 GB	68.0 GB
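
The 16-bit and 32-bit weight sizes follow directly from the parameter count: 120 billion parameters at 2 or 4 bytes each. The total is larger than weights plus KV cache; in the rows checked below it comes to roughly 1.12 times that sum, which suggests an allowance of about 12% for runtime overhead. That factor is inferred from the numbers, not stated on the page, so treat this sketch as a consistency check rather than the formula behind the table.

```python
PARAMS = 120e9       # total parameters, from the answer to Question 2
KV_CACHE_GB = 0.74   # KV cache at 8k context, from the table


def weights_gb(bytes_per_param: float) -> float:
    """Weight size in GB (1 GB = 1e9 bytes) for a given bytes-per-parameter."""
    return PARAMS * bytes_per_param / 1e9


def implied_overhead(weights: float, total: float) -> float:
    """Ratio of the table's total to weights plus KV cache."""
    return total / (weights + KV_CACHE_GB)


# (weights GB, total GB) pairs copied from the table.
ROWS = {
    "FP16": (240.0, 269.6),
    "Q8_0": (127.6, 143.7),
    "Q4_K_M": (73.1, 82.7),
    "Q2_K": (45.7, 52.0),
}

print(weights_gb(2), weights_gb(4))  # prints 240.0 480.0
for name, (w, total) in ROWS.items():
    print(f"{name}: total is {implied_overhead(w, total):.2f}x weights + KV cache")
```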
