
GPT-OSS 120B

GPT-OSS 120B needs roughly 66.2 GB of VRAM at Q4_K_M quantization (262.8 GB at FP16). 28 GPUs we track can run it fully in VRAM at 8k context.

OpenAI · 117B params · 5B active (MoE) · 128k context · Apache 2.0 · Commercial use OK

VRAM at each quantization

Assumes 8k context. The KV cache grows linearly with context length; totals include a small runtime overhead on top of weights + KV cache.

Quant     Weights     KV cache   Total
FP16      234.0 GB    0.60 GB    262.8 GB
Q8        117.0 GB    0.60 GB    131.7 GB
Q6_K       87.8 GB    0.60 GB     99.0 GB
Q5_K_M     73.1 GB    0.60 GB     82.6 GB
Q4_K_M     58.5 GB    0.60 GB     66.2 GB
Q3_K_M     46.8 GB    0.60 GB     53.1 GB
Q2_K       35.1 GB    0.60 GB     40.0 GB
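
The totals above can be reproduced with a back-of-the-envelope formula: quantized weights plus an FP16 KV cache that grows linearly with context, scaled by a small overhead factor. Below is a minimal sketch, assuming the published GPT-OSS attention shape (36 layers, 8 KV heads, 64-dim heads) and a roughly 12% overhead factor inferred from the table; neither figure comes from this page, and the site's own calculator may differ.

# Rough VRAM estimate: quantized weights + FP16 KV cache + runtime overhead.
# Assumptions (not from this page): 36 layers, 8 KV heads, head_dim 64 taken from
# the published GPT-OSS 120B config; ~12% overhead inferred from the table above;
# decimal gigabytes (1 GB = 1e9 bytes), which matches the table's numbers.

GB = 1e9

def kv_cache_gb(context_len: int,
                n_layers: int = 36,
                n_kv_heads: int = 8,
                head_dim: int = 64,
                bytes_per_elem: int = 2) -> float:
    """K and V tensors for every layer at FP16, in GB."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / GB

def total_vram_gb(weights_gb: float,
                  context_len: int = 8192,
                  overhead: float = 1.12) -> float:
    """Weights + KV cache, scaled by the inferred ~12% runtime overhead."""
    return (weights_gb + kv_cache_gb(context_len)) * overhead

# Q4_K_M row: 58.5 GB of weights at 8k context -> ~66.2 GB total
print(f"{total_vram_gb(58.5):.1f} GB")
# Doubling the context to 16k roughly doubles only the KV cache term
print(f"{total_vram_gb(58.5, context_len=16384):.1f} GB")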

Benchmarks

GPQA: 80.1

GPUs that run GPT-OSS 120B natively (28)

Plus 11 GPUs that run it with CPU offload (slower)

Notes

Mixture-of-experts model with alternating sliding-window and full attention layers. Near parity with o4-mini; fits on a single 80 GB GPU at Q4.
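
For the single-GPU case mentioned above, a minimal load-and-generate sketch with Hugging Face transformers, assuming the openai/gpt-oss-120b repo id from the linked model card and a recent transformers release; this is an illustration, not a recipe from this page.

# Minimal sketch: load GPT-OSS 120B with Hugging Face transformers and generate.
# Assumes the "openai/gpt-oss-120b" repo id and enough GPU memory for the
# checkpoint; device_map="auto" will spill layers to CPU RAM if VRAM runs short.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-120b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place layers on available GPUs, falling back to CPU
)

prompt = "Explain mixture-of-experts routing in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))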

Hugging Face ↗ · Released 2025-08-05