Question 1

What are the VRAM requirements for Qwen3 235B-A22B (MoE)?

Accepted Answer

Qwen3 235B-A22B (MoE) requires approximately 162.1 GB of VRAM at Q4_K_M quantization, 281.6 GB at Q8_0, and 528.2 GB at FP16. These figures assume an 8k context window; the KV cache portion grows linearly with context length, so total VRAM rises with longer contexts.
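
The linear growth of the KV cache can be sanity-checked with a short calculation. This is a minimal sketch that assumes 94 layers, 4 key/value heads, a head dimension of 128, and a 16-bit cache; none of these values are stated in the answer above, although together they do reproduce the 1.58 GB KV cache figure at 8k context.

```python
# Assumed architecture values (not stated in the answer above).
NUM_LAYERS = 94
NUM_KV_HEADS = 4
HEAD_DIM = 128
BYTES_PER_ELEMENT = 2  # assumes a 16-bit KV cache


def kv_cache_gb(context_tokens: int) -> float:
    """Estimate KV cache size in decimal GB for a single sequence."""
    # The factor 2 accounts for storing both keys and values.
    bytes_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_ELEMENT
    return bytes_per_token * context_tokens / 1e9


for ctx in (8192, 32768, 131072):
    print(f"{ctx:>7} tokens: {kv_cache_gb(ctx):.2f} GB")
```

Under these assumptions, quadrupling the context quadruples the KV cache, while the weight footprint stays fixed.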

Question 2

How many parameters does Qwen3 235B-A22B (MoE) have?

Accepted Answer

Qwen3 235B-A22B (MoE) has 235 billion total parameters, but only 22 billion are active per token thanks to its Mixture of Experts (MoE) architecture. This makes inference significantly faster than the total parameter count suggests, although all 235 billion parameters must still be held in memory.
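
To make the distinction concrete, here is a rough sketch of the arithmetic. It assumes 2 bytes per parameter (FP16) and that every expert stays resident in memory; the variable names are illustrative only.

```python
TOTAL_PARAMS = 235e9   # all experts, from the answer above
ACTIVE_PARAMS = 22e9   # parameters used per token, from the answer above
BYTES_PER_PARAM = 2    # assumes FP16 weights

# Share of the model that participates in each token's forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Memory is driven by the total count, because every expert must be stored.
fp16_weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9

# Per-token compute and weight reads are driven by the active count.
fp16_active_gb = ACTIVE_PARAMS * BYTES_PER_PARAM / 1e9

print(f"active fraction: {active_fraction:.1%}")
print(f"weights stored:  {fp16_weights_gb:.1f} GB")
print(f"weights touched per token: {fp16_active_gb:.1f} GB")
```

So the speed benefit comes from touching under a tenth of the weights per token, while the VRAM requirement follows the full 235 billion.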

Question 3

How capable is Qwen3 235B-A22B (MoE)?

Accepted Answer

Qwen3 235B-A22B (MoE) achieves an MMLU-Pro score of 84.4, placing it among the most capable open-weight models available and making it competitive with frontier systems on general knowledge and reasoning.

Question 4

Can Qwen3 235B-A22B (MoE) run on a 16 GB GPU?

Accepted Answer

No. At Q4_K_M, Qwen3 235B-A22B (MoE) needs 162.1 GB of VRAM, roughly ten times what a 16 GB GPU provides. You will need a multi-GPU server.

Question 5

Can Qwen3 235B-A22B (MoE) run on a 24 GB GPU?

Accepted Answer

No. Even at Q4_K_M, Qwen3 235B-A22B (MoE) needs 162.1 GB, far more than 24 GB. Consider a multi-GPU server with more than 162.1 GB of total VRAM for Q4_K_M, or at least 102.0 GB for Q2_K.

Question 6

What is the smallest quantization for Qwen3 235B-A22B (MoE) that fits in 24 GB of VRAM?

Accepted Answer

Qwen3 235B-A22B (MoE) cannot fit in 24 GB of VRAM at any standard quantization level. The minimum needed is 102.0 GB at Q2_K.
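
The same check can be run for any VRAM budget. The sketch below uses the total VRAM figures from the table on this page (8k context); the function name `best_fit` is a hypothetical helper, not part of any tool.

```python
from typing import Optional

# Total VRAM per quantization in GB, copied from the table on this page.
TOTAL_VRAM_GB = {
    "FP16": 528.2,
    "Q8_0": 281.6,
    "Q6_K": 217.8,
    "Q5_K_M": 189.2,
    "Q4_K_M": 162.1,
    "Q3_K_M": 128.4,
    "Q2_K": 102.0,
}


def best_fit(budget_gb: float) -> Optional[str]:
    """Return the highest-precision quantization that fits, or None."""
    fitting = {q: gb for q, gb in TOTAL_VRAM_GB.items() if gb <= budget_gb}
    if not fitting:
        return None
    # The largest footprint that still fits is the least aggressive quantization.
    return max(fitting, key=fitting.get)


print(best_fit(24))    # None: nothing fits in 24 GB
print(best_fit(192))   # Q5_K_M
```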

Question 7

What GPU do I need to run Qwen3 235B-A22B (MoE) locally?

Accepted Answer

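You need a multi-GPU server. At Q4_K_M, Qwen3 235B-A22B (MoE) needs 162.1 GB of VRAM, more than any single consumer GPU offers. Consider 3 to 4× 80 GB H100 or A100 GPUs; two 80 GB cards total only 160 GB, which falls just short of the Q4_K_M requirement.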
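
A quick way to size the server is to divide the required VRAM by the per-GPU capacity and round up. This is only a rough sketch: it assumes 80 GB cards and ignores any extra per-GPU overhead from splitting the model, so treat the result as a lower bound. `gpus_needed` is an illustrative helper name.

```python
import math


def gpus_needed(required_gb: float, per_gpu_gb: float) -> int:
    """Minimum number of identical GPUs whose combined VRAM covers the requirement."""
    return math.ceil(required_gb / per_gpu_gb)


# Q4_K_M total from the answer above, on assumed 80 GB cards.
print(gpus_needed(162.1, 80))
# The same model on 24 GB consumer cards.
print(gpus_needed(162.1, 24))
```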

Quant	Weights	KV cache	Total
FP32	940.0 GB	1.58 GB	1054.6 GB
BF16	470.0 GB	1.58 GB	528.2 GB
FP16	470.0 GB	1.58 GB	528.2 GB
Q8_0	249.8 GB	1.58 GB	281.6 GB
Q6_K	192.9 GB	1.58 GB	217.8 GB
Q5_K_M	167.3 GB	1.58 GB	189.2 GB
Q4_K_M	143.1 GB	1.58 GB	162.1 GB
Q3_K_M	113.0 GB	1.58 GB	128.4 GB
Q2_K (rec)	89.5 GB	1.58 GB	102.0 GB
NVFP4 (CUDA)	117.5 GB	1.58 GB	133.4 GB
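
Two relationships can be read out of the table with a little arithmetic. The sketch below derives the effective bits per weight from the Weights column (assuming 235 billion parameters and decimal GB) and the ratio of Total to Weights plus KV cache. The roughly 12% gap it reports is inferred from the numbers, presumably runtime overhead, and is not something the page states.

```python
TOTAL_PARAMS = 235e9
KV_CACHE_GB = 1.58  # from the table, at 8k context

# (weights GB, total GB) per quantization, copied from the table.
TABLE = {
    "FP16": (470.0, 528.2),
    "Q8_0": (249.8, 281.6),
    "Q4_K_M": (143.1, 162.1),
    "Q2_K": (89.5, 102.0),
    "NVFP4": (117.5, 133.4),
}


def bits_per_weight(quant: str) -> float:
    """Effective bits stored per parameter, derived from the weight footprint."""
    weights_gb, _ = TABLE[quant]
    return weights_gb * 1e9 * 8 / TOTAL_PARAMS


def overhead_ratio(quant: str) -> float:
    """Total VRAM divided by (weights + KV cache)."""
    weights_gb, total_gb = TABLE[quant]
    return total_gb / (weights_gb + KV_CACHE_GB)


for q in TABLE:
    print(f"{q:>7}: {bits_per_weight(q):5.2f} bits/weight, "
          f"total = {overhead_ratio(q):.2f} x (weights + KV)")
```

The effective bit width of the K-quants comes out slightly above their nominal number, which is why Q4_K_M needs more than a quarter of the FP16 weight footprint.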
