CanItRun

GPT-OSS 20B

GPT-OSS 20B needs roughly 13.7 GB VRAM at Q4_K_M quantization (47.5 GB at FP16). 93 GPUs we track can run it fully in VRAM at 8k context.

93 GPUs run this natively · 11 with CPU offload

OpenAI · 21B params · 4B active (MoE) · 128k context · Apache 2.0 · Commercial use ok

GPT-OSS 20B is a Mixture of Experts (MoE) model developed by OpenAI, with 21B total parameters but only 4B active per token. Released in August 2025, it matches o3-mini on key benchmarks.

To run GPT-OSS 20B locally, use Q5_K_M: at roughly 14-16 GB it fits on 16 GB GPUs, which makes this the best reasoning model for 16 GB hardware. Because it is an MoE model, inference speed depends on the active parameters (4B) rather than the total size.
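The split between total and active parameters can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes weight memory is simply total parameters times bytes per parameter (decimal gigabytes, no per-block metadata), which happens to agree with the FP32, FP16, and Q8_0 weight sizes listed on this page. The function names are hypothetical.

```python
# Figures taken from this page: 21B total parameters, 4B active per token.
TOTAL_PARAMS_B = 21.0   # billions of parameters that must sit in memory
ACTIVE_PARAMS_B = 4.0   # billions of parameters used for each token

# Bytes per parameter for the unquantized and 8-bit rows.
# Assumption: decimal gigabytes, ignoring per-block scale metadata.
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "BF16": 2.0, "Q8_0": 1.0}


def weight_gb(total_params_b: float, bytes_per_param: float) -> float:
    """Weight memory in GB. All experts stay resident, so use total params."""
    return total_params_b * bytes_per_param


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the weights used per token, which drives per-token compute."""
    return active_b / total_b


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_gb(TOTAL_PARAMS_B, nbytes):.1f} GB of weights")
print(f"active per token: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.0%}")
```

Memory is sized by the 21B total, while per-token compute follows the 4B active share (about 19% of the weights).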

GPQA score of 71.5% at 21B scale, a strong reasoning result for a model of this size.

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

Quant     Weights    KV cache   Total      Notes
FP32      84.0 GB    0.40 GB    94.5 GB
BF16      42.0 GB    0.40 GB    47.5 GB
FP16      42.0 GB    0.40 GB    47.5 GB
Q8_0      21.0 GB    0.40 GB    24.0 GB
Q6_K      17.2 GB    0.40 GB    19.7 GB
Q5_K_M    13.5 GB    0.40 GB    15.6 GB    recommended
Q4_K_M    11.8 GB    0.40 GB    13.7 GB
Q3_K_M     9.0 GB    0.40 GB    10.6 GB
Q2_K       6.9 GB    0.40 GB     8.2 GB
NVFP4     10.5 GB    0.40 GB    12.2 GB    CUDA only

KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
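The totals in the table are larger than weights plus KV cache, so some runtime overhead is evidently included. As a rough sketch, the listed totals are reproduced to within 0.1 GB by multiplying (weights + KV cache) by about 1.12. That factor is an assumption inferred from the table, not a formula the page documents, and the function names are hypothetical. The same sketch scales the KV cache linearly from the 8k figure to estimate other context lengths.

```python
# Weight and KV cache figures copied from the table (GB, 8k context).
WEIGHTS_GB = {
    "FP32": 84.0, "BF16": 42.0, "FP16": 42.0, "Q8_0": 21.0, "Q6_K": 17.2,
    "Q5_K_M": 13.5, "Q4_K_M": 11.8, "Q3_K_M": 9.0, "Q2_K": 6.9, "NVFP4": 10.5,
}
KV_GB_AT_8K = 0.40

# Assumption: inferred by fitting the table's totals, not a documented constant.
OVERHEAD_FACTOR = 1.12


def kv_cache_gb(context_k: float) -> float:
    """KV cache size in GB, scaled linearly from the 8k figure."""
    return KV_GB_AT_8K * (context_k / 8.0)


def total_vram_gb(quant: str, context_k: float = 8.0) -> float:
    """Estimated total VRAM: (weights + KV cache) times the assumed overhead."""
    return (WEIGHTS_GB[quant] + kv_cache_gb(context_k)) * OVERHEAD_FACTOR


for ctx in (8, 32, 128):
    print(f"Q4_K_M at {ctx}k context: {total_vram_gb('Q4_K_M', ctx):.1f} GB")
```

Under these assumptions, Q4_K_M grows from about 13.7 GB at 8k to roughly 15 GB at 32k and 20 GB at 128k, so long contexts can push a 16 GB card into CPU offload.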

Benchmarks

GPUs that run GPT-OSS 20B natively (93)

Plus 11 GPUs that run it with CPU offload (slower)

Notes

Smaller sibling of GPT-OSS 120B. Matches o3-mini on key benchmarks; runs on 16 GB of VRAM.

Hugging Face ↗ · Released 2025-08-05

Frequently asked questions

What are the VRAM requirements for GPT-OSS 20B?
GPT-OSS 20B requires approximately 13.7 GB of VRAM at Q4_K_M quantization, 24.0 GB at Q8_0, and 47.5 GB at FP16. These numbers assume an 8k context window; VRAM use grows with context length because the KV cache scales linearly with it.
How many parameters does GPT-OSS 20B have?
GPT-OSS 20B has 21 billion total parameters, but only 4 billion are active per token thanks to its Mixture of Experts (MoE) architecture. This makes inference significantly faster than the total parameter count suggests.
How capable is GPT-OSS 20B?
With an MMLU-Pro score of 67.86, GPT-OSS 20B delivers solid general-purpose performance suitable for most everyday tasks and professional use.
Can GPT-OSS 20B run on a 16 GB GPU?
Yes. GPT-OSS 20B needs 13.7 GB at Q4_K_M, which fits in a 16 GB GPU like the RTX 4080 or RTX 4070 Ti Super.
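What is the highest-quality quantization for GPT-OSS 20B that fits in 24 GB of VRAM?
At 8k context, Q6_K needs 19.7 GB, making it the highest-quality quantization that fits in 24 GB of VRAM with room to spare. Q8_0 needs 24.0 GB, which leaves no headroom on a 24 GB card. NVFP4 (12.2 GB) also fits but requires a CUDA GPU.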
What GPU do I need to run GPT-OSS 20B locally?
A 16 GB GPU is enough. At Q4_K_M, GPT-OSS 20B needs 13.7 GB VRAM. Good options: RTX 4080 (16 GB), RTX 4070 Ti Super (16 GB).
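The fit questions above all reduce to a lookup against the 8k-context totals. A minimal sketch, assuming those totals are accurate for your runtime: the `best_quant` helper and its `headroom_gb` margin are hypothetical names introduced for illustration, and NVFP4 is left out because it requires a CUDA GPU.

```python
# Total VRAM per quantization at 8k context, copied from the table (GB),
# ordered from highest to lowest precision.
TOTAL_GB_AT_8K = [
    ("FP32", 94.5), ("FP16", 47.5), ("Q8_0", 24.0), ("Q6_K", 19.7),
    ("Q5_K_M", 15.6), ("Q4_K_M", 13.7), ("Q3_K_M", 10.6), ("Q2_K", 8.2),
]


def best_quant(vram_gb: float, headroom_gb: float = 0.0):
    """Return the highest-precision quantization whose total fits, or None.

    headroom_gb is a hypothetical safety margin for the OS and other apps.
    """
    budget = vram_gb - headroom_gb
    for name, total in TOTAL_GB_AT_8K:
        if total <= budget:
            return name
    return None


for card_gb in (12, 16, 24):
    print(f"{card_gb} GB GPU -> {best_quant(card_gb)}")
```

Passing a margin, for example `best_quant(16, headroom_gb=1.0)`, drops a 16 GB card from Q5_K_M to Q4_K_M, which is why Q4_K_M is the safer default on cards that also drive a display.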