CanItRun

GPT-OSS 120B

GPT-OSS 120B needs roughly 74.5 GB VRAM at Q4_K_M quantization (262.8 GB at FP16). 39 GPUs we track can run it fully in VRAM at 8k context.

39 GPUs run this natively · 14 with CPU offload

OpenAI · 117B params · 5B active (MoE) · 128k context · Apache 2.0 · Commercial use OK

GPT-OSS 120B is a Mixture of Experts (MoE) model developed by OpenAI and released in August 2025. It has 117B total parameters, of which only 5B are active per token, and it alternates sliding-window and full attention layers. It is licensed under Apache 2.0.

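To run GPT-OSS 120B locally, plan for roughly 70-80 GB at Q4_K_M, which fits on a single 80 GB GPU or a Mac Studio with sufficient unified memory. It is one of the strongest open reasoning models at this size. Because it is a MoE model, inference speed depends on the active parameters (5B) rather than the total size, although all 117B parameters must still be held in memory.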
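To make the MoE trade-off concrete, the short calculation below uses only the parameter counts quoted above. It is a rough sketch: treating per-token compute as proportional to the active parameter count is a simplifying assumption, and the function and constant names are illustrative.

```python
# Parameter counts quoted on this page, in billions.
TOTAL_PARAMS_B = 117
ACTIVE_PARAMS_B = 5


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters that are used for each token."""
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active per token: {fraction:.1%} of all parameters")
```

Only a small share of the weights is touched per token, but memory is still sized by all 117B parameters, which is why the VRAM figures on this page track the total size.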

GPQA: 80.1%, near parity with o4-mini. Fits on a single 80 GB GPU at Q4.

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

Quant                  Weights (GB)   KV cache (GB)   Total (GB)
FP32                   468.0          0.60            524.8
BF16                   234.0          0.60            262.8
FP16                   234.0          0.60            262.8
Q8_0                   117.0          0.60            131.7
Q6_K                   95.9           0.60            108.1
Q5_K_M                 75.3           0.60            85.1
Q4_K_M (recommended)   65.9           0.60            74.5
Q3_K_M                 50.3           0.60            57.0
Q2_K                   38.5           0.60            43.8
NVFP4 (CUDA only)      58.5           0.60            66.2

KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
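The 8k figures can be extrapolated to longer contexts by taking the page's linear-growth statement at face value. The sketch below is an estimate, not the calculator's actual formula. It assumes that "8k" means 8192 tokens, that the KV cache scales exactly linearly, and that everything else in the Total column (weights plus whatever overhead the totals include) stays fixed as context grows. The function names are illustrative.

```python
# Figures copied from the table on this page (8k context, FP16 KV cache).
KV_CACHE_GB_AT_8K = 0.60
BASE_CONTEXT = 8192  # assumption: "8k" means 8192 tokens

TOTAL_GB_AT_8K = {"Q8_0": 131.7, "Q4_K_M": 74.5, "Q2_K": 43.8}


def kv_cache_gb(context_tokens: int) -> float:
    """KV cache size, assuming strictly linear growth with context length."""
    return KV_CACHE_GB_AT_8K * context_tokens / BASE_CONTEXT


def total_vram_gb(quant: str, context_tokens: int) -> float:
    """Total VRAM, assuming only the KV cache changes with context length."""
    fixed_gb = TOTAL_GB_AT_8K[quant] - KV_CACHE_GB_AT_8K
    return fixed_gb + kv_cache_gb(context_tokens)


for ctx in (8192, 32768, 131072):
    print(f"Q4_K_M at {ctx:>6} tokens: {total_vram_gb('Q4_K_M', ctx):.1f} GB")
```

Under these assumptions the KV cache alone reaches about 9.6 GB at 131072 tokens, which would push Q4_K_M past 80 GB. Real growth may be slower, because sliding-window attention layers only need to cache a bounded window.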

Benchmarks

GPUs that run GPT-OSS 120B natively (39)

Plus 14 GPUs that run it with CPU offload (slower)

Notes

MoE with alternating sliding-window and full attention layers. Near parity with o4-mini; fits on a single 80 GB GPU at Q4.

Hugging Face ↗ · Released 2025-08-05

Frequently asked questions

What are the VRAM requirements for GPT-OSS 120B?
GPT-OSS 120B requires approximately 74.5 GB of VRAM at Q4_K_M quantization, 131.7 GB at Q8_0, and 262.8 GB at FP16. These figures assume an 8k context window; total VRAM rises with context length because the KV cache grows linearly with it.
How many parameters does GPT-OSS 120B have?
GPT-OSS 120B has 117 billion total parameters, but only 5 billion are active per token thanks to its Mixture of Experts (MoE) architecture. This makes inference significantly faster than the total parameter count suggests.
How capable is GPT-OSS 120B?
GPT-OSS 120B achieves an MMLU-Pro score of 80.7, placing it among the most capable open-weight models available and making it competitive with frontier systems on general knowledge and reasoning.
Can GPT-OSS 120B run on a 16 GB GPU?
No. At Q4_K_M, GPT-OSS 120B needs 74.5 GB of VRAM, far more than 16 GB. You will need a single 80 GB GPU or a multi-GPU server with at least 80 GB of total VRAM.
Can GPT-OSS 120B run on a 24 GB GPU?
No. At Q4_K_M, GPT-OSS 120B needs 74.5 GB, and even the smallest listed quantization (Q2_K) needs 43.8 GB. Consider a single 80 GB GPU or a multi-GPU server with 80 GB+ total VRAM.
What is the smallest quantization for GPT-OSS 120B that fits in 24 GB of VRAM?
GPT-OSS 120B cannot fit in 24 GB of VRAM at any standard quantization level. The minimum needed is 43.8 GB at Q2_K.
What GPU do I need to run GPT-OSS 120B locally?
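At Q4_K_M, GPT-OSS 120B needs 74.5 GB of VRAM, more than any single consumer GPU offers. It fits on a single 80 GB data-center GPU such as an 80 GB H100 or A100. For higher-precision quantizations or longer contexts, consider a multi-GPU server with 2–4× H100 or A100 GPUs.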
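The "does it fit?" answers above all follow the same lookup against the Total column of the VRAM table. A minimal sketch of that lookup is shown below. It assumes 8k context, full residency in VRAM (no CPU offload), and that a larger footprint among the listed GGUF-style quantizations means higher fidelity. FP32, BF16, and NVFP4 are left out for brevity, and the function name is illustrative.

```python
from typing import Optional

# Total VRAM in GB at 8k context, copied from the table on this page.
TOTAL_GB = {
    "FP16": 262.8,
    "Q8_0": 131.7,
    "Q6_K": 108.1,
    "Q5_K_M": 85.1,
    "Q4_K_M": 74.5,
    "Q3_K_M": 57.0,
    "Q2_K": 43.8,
}


def largest_quant_that_fits(vram_gb: float) -> Optional[str]:
    """Return the largest listed quantization whose total fits in vram_gb."""
    fitting = {quant: gb for quant, gb in TOTAL_GB.items() if gb <= vram_gb}
    if not fitting:
        return None  # nothing fits fully in VRAM at this budget
    return max(fitting, key=fitting.get)


for budget_gb in (16, 24, 48, 80, 160):
    print(f"{budget_gb:>3} GB -> {largest_quant_that_fits(budget_gb)}")
```

With these table values, 16 GB and 24 GB budgets return None, while an 80 GB budget selects Q4_K_M, matching the answers above.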