Question 1

What are the VRAM requirements for Qwen 3.5 122B-A10B (MoE)?

Accepted Answer

Qwen 3.5 122B-A10B (MoE) requires approximately 85.6 GB of VRAM at Q4_K_M quantization, 147.7 GB at Q8_0, and 275.7 GB at FP16. These figures assume an 8k context window. The KV cache grows linearly with context length, so total VRAM rises as the context gets longer.
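To make the context-length scaling concrete, here is a minimal sketch that extrapolates the 2.15 GB KV cache figure from the quantization table. It assumes "8k" means 8192 tokens and that the cache scales strictly linearly. The context lengths are illustrative only; nothing here states which lengths the model actually supports.

```python
# Baseline taken from the quantization table (KV cache at 8k context).
BASE_CONTEXT_TOKENS = 8192      # assumption: "8k" means 8192 tokens
KV_CACHE_GB_AT_BASE = 2.15


def kv_cache_gb(context_tokens: int) -> float:
    """Linearly scale the KV cache size from the 8k baseline."""
    return KV_CACHE_GB_AT_BASE * context_tokens / BASE_CONTEXT_TOKENS


if __name__ == "__main__":
    # Illustrative context lengths, not a statement of model limits.
    for ctx in (8192, 32768, 131072):
        print(f"{ctx:>7} tokens: {kv_cache_gb(ctx):.2f} GB KV cache")
```

Under these assumptions, quadrupling the context to 32k tokens raises the KV cache from 2.15 GB to about 8.6 GB, while the weights stay the same size.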

Question 2

How many parameters does Qwen 3.5 122B-A10B (MoE) have?

Accepted Answer

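Qwen 3.5 122B-A10B (MoE) has 122 billion total parameters, but only 10 billion are active per token thanks to its Mixture of Experts (MoE) architecture. Per-token compute is therefore much lower than the total parameter count suggests, which makes inference significantly faster. Memory is a different matter: all 122 billion parameters still have to be loaded, so the VRAM figures reflect the total count, not the active count.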
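The arithmetic behind this is short enough to check by hand. The sketch below computes the share of parameters active per token and the size of the FP16 weights from the total count, assuming 2 bytes per FP16 weight and decimal gigabytes (10^9 bytes).

```python
TOTAL_PARAMS = 122e9    # total parameters
ACTIVE_PARAMS = 10e9    # parameters active per token
BYTES_PER_FP16 = 2      # assumption: plain 16-bit storage, no extra metadata


def active_fraction() -> float:
    """Share of the model's parameters used for each token."""
    return ACTIVE_PARAMS / TOTAL_PARAMS


def fp16_weights_gb() -> float:
    """Weight size in decimal GB when every parameter is stored in FP16."""
    return TOTAL_PARAMS * BYTES_PER_FP16 / 1e9


if __name__ == "__main__":
    print(f"active per token: {active_fraction():.1%}")
    print(f"FP16 weights:     {fp16_weights_gb():.1f} GB")
```

The FP16 result, 244 GB, matches the Weights column of the quantization table, which is consistent with memory being driven by the total parameter count rather than the active one.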

Question 3

How capable is Qwen 3.5 122B-A10B (MoE)?

Accepted Answer

Qwen 3.5 122B-A10B (MoE) achieves an MMLU-Pro score of 86.7, placing it among the most capable open-weight models available and making it competitive with frontier systems on general knowledge and reasoning.

Question 4

Can Qwen 3.5 122B-A10B (MoE) run on a 16 GB GPU?

Accepted Answer

No. At Q4_K_M, Qwen 3.5 122B-A10B (MoE) needs 85.6 GB of VRAM, more than five times the capacity of a 16 GB GPU. Even Q2_K needs 54.5 GB. You will need a multi-GPU server.

Question 5

Can Qwen 3.5 122B-A10B (MoE) run on a 24 GB GPU?

Accepted Answer

No. Even at Q4_K_M, Qwen 3.5 122B-A10B (MoE) needs 85.6 GB. Consider a multi-GPU server with more than 85.6 GB of total VRAM.

Question 6

What is the smallest quantization for Qwen 3.5 122B-A10B (MoE) that fits in 24 GB of VRAM?

Accepted Answer

Qwen 3.5 122B-A10B (MoE) cannot fit in 24 GB of VRAM at any standard quantization level. The minimum needed is 54.5 GB at Q2_K.
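The same check can be run for any VRAM budget. The sketch below copies the Total column from the quantization table and returns the quantization with the largest footprint that still fits. Treating a larger footprint as a proxy for higher precision is an assumption, and `best_fit` is a hypothetical helper name.

```python
from typing import Optional

# Total VRAM in GB per quantization, copied from the table (8k context).
TOTAL_VRAM_GB = {
    "FP32": 549.0, "BF16": 275.7, "FP16": 275.7, "Q8_0": 147.7,
    "Q6_K": 114.6, "Q5_K_M": 99.7, "Q4_K_M": 85.6, "Q3_K_M": 68.1,
    "Q2_K": 54.5, "NVFP4": 70.7,
}


def best_fit(budget_gb: float) -> Optional[str]:
    """Return the largest-footprint quantization that fits, or None."""
    fitting = {q: gb for q, gb in TOTAL_VRAM_GB.items() if gb <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (24, 60, 96):
        print(f"{budget} GB budget -> {best_fit(budget)}")
```

With a 24 GB budget the function returns `None`, in line with the answer above. A 60 GB budget admits only Q2_K, and 96 GB is enough for Q4_K_M.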

Question 7

What GPU do I need to run Qwen 3.5 122B-A10B (MoE) locally?

Accepted Answer

You need a multi-GPU server. At Q4_K_M, Qwen 3.5 122B-A10B (MoE) needs 85.6 GB of VRAM, more than any single consumer GPU offers. Consider 2–4× H100 or A100 GPUs (80 GB each), depending on the quantization.
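A rough way to size the server is to divide the total VRAM requirement by the memory per GPU and round up. The sketch below assumes 80 GB cards and ignores per-GPU runtime overhead and uneven sharding, so treat the results as lower bounds. `gpus_needed` is a hypothetical helper name.

```python
import math


def gpus_needed(total_vram_gb: float, per_gpu_gb: float) -> int:
    """Minimum GPU count by simple division; ignores sharding overhead."""
    return math.ceil(total_vram_gb / per_gpu_gb)


if __name__ == "__main__":
    per_gpu_gb = 80.0  # assumption: 80 GB H100 or A100 cards
    # Total VRAM figures from the quantization table (8k context).
    for quant, total_gb in (("Q4_K_M", 85.6), ("Q8_0", 147.7), ("FP16", 275.7)):
        print(f"{quant}: at least {gpus_needed(total_gb, per_gpu_gb)} GPUs")
```

On these numbers, Q4_K_M and Q8_0 each need at least two 80 GB GPUs and FP16 needs at least four.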

Qwen 3.5 122B-A10B (MoE): VRAM at each quantization

Quant	Weights	KV cache (8k context)	Total
FP32	488.0 GB	2.15 GB	549.0 GB
BF16	244.0 GB	2.15 GB	275.7 GB
FP16	244.0 GB	2.15 GB	275.7 GB
Q8_0	129.7 GB	2.15 GB	147.7 GB
Q6_K	100.2 GB	2.15 GB	114.6 GB
Q5_K_M	86.9 GB	2.15 GB	99.7 GB
Q4_K_M	74.3 GB	2.15 GB	85.6 GB
Q3_K_M (recommended)	58.7 GB	2.15 GB	68.1 GB
Q2_K	46.5 GB	2.15 GB	54.5 GB
NVFP4 (CUDA)	61.0 GB	2.15 GB	70.7 GB
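In the table, Total is larger than Weights plus KV cache, and the page does not say what the difference covers. The sketch below checks a few rows and also derives the effective bits per weight from the Weights column. The reading that the gap is a fixed margin for activations and runtime buffers is an assumption, not something the source states.

```python
PARAMS = 122e9  # total parameters

# quant: (weights_gb, kv_cache_gb, total_gb), copied from the table
ROWS = {
    "FP16": (244.0, 2.15, 275.7),
    "Q8_0": (129.7, 2.15, 147.7),
    "Q4_K_M": (74.3, 2.15, 85.6),
    "Q2_K": (46.5, 2.15, 54.5),
}


def bits_per_weight(weights_gb: float) -> float:
    """Effective storage per parameter, using decimal GB (1e9 bytes)."""
    return weights_gb * 1e9 * 8 / PARAMS


def overhead_ratio(weights_gb: float, kv_gb: float, total_gb: float) -> float:
    """How much larger Total is than Weights + KV cache."""
    return total_gb / (weights_gb + kv_gb)


if __name__ == "__main__":
    for quant, (w, kv, total) in ROWS.items():
        print(f"{quant:7} {bits_per_weight(w):5.2f} bits/weight, "
              f"total / (weights + kv) = {overhead_ratio(w, kv, total):.3f}")
```

For these rows the ratio comes out at about 1.12, so the Total column appears to add roughly 12% on top of weights and KV cache. The bits-per-weight figures (16 for FP16, about 4.9 for Q4_K_M) also show that the K-quant names are nominal rather than exact bit widths.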
