DeepSeek V3 671B vs Qwen3 235B-A22B (MoE)
Side-by-side VRAM requirements, benchmark scores, and GPU compatibility for local AI inference.
Quick verdict
Qwen3 235B-A22B (MoE) is far more hardware-efficient: it needs 133.4 GB at Q4_K_M vs 376.3 GB for DeepSeek V3 671B, and fits natively on 14 of the 67 tracked GPUs versus 2.
VRAM at each quantization (8k context)
| Quant | DeepSeek V3 671B | Qwen3 235B-A22B (MoE) | Diff |
|---|---|---|---|
| FP16 | 1503.6 GB | 528.2 GB | +185% |
| Q8 | 752.1 GB | 265.0 GB | +184% |
| Q6_K | 564.2 GB | 199.2 GB | +183% |
| Q5_K_M | 470.3 GB | 166.3 GB | +183% |
| Q4_K_M | 376.3 GB | 133.4 GB | +182% |
| Q3_K_M | 301.2 GB | 107.0 GB | +181% |
| Q2_K | 226.0 GB | 80.7 GB | +180% |
Diff is DeepSeek V3 671B's requirement relative to Qwen3 235B-A22B (MoE); lower VRAM means the model fits on more GPUs.
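To sanity-check these numbers or extrapolate to other models, here is a minimal sketch. The bits-per-weight values are standard GGUF approximations, and the ~1.12 overhead multiplier (covering KV cache at 8k context plus runtime buffers) is an assumption inferred from the table above, not an official formula:

```python
# Rough VRAM estimate: weight bytes plus a flat overhead factor.
# The ~1.12 multiplier is inferred from the table above (assumption).
BITS_PER_WEIGHT = {
    "FP16": 16.0, "Q8": 8.0, "Q6_K": 6.0, "Q5_K_M": 5.0,
    "Q4_K_M": 4.0, "Q3_K_M": 3.2, "Q2_K": 2.4,
}

def estimate_vram_gb(params_b: float, quant: str, overhead: float = 1.12) -> float:
    """Approximate VRAM in GB for params_b billion parameters."""
    weight_gb = params_b * BITS_PER_WEIGHT[quant] / 8  # 1B params at 8 bits ~ 1 GB
    return weight_gb * overhead

for name, params in [("DeepSeek V3 671B", 671), ("Qwen3 235B-A22B", 235)]:
    print(f"{name}: ~{estimate_vram_gb(params, 'Q4_K_M'):.0f} GB at Q4_K_M")
# DeepSeek V3 671B: ~376 GB, Qwen3 235B-A22B: ~132 GB
# -- close to the table's 376.3 GB and 133.4 GB
```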
Model specifications
| Spec | DeepSeek V3 671B | Qwen3 235B-A22B (MoE) |
|---|---|---|
| Org | DeepSeek | Alibaba |
| Parameters | 671B | 235B |
| Architecture | MoE (37B active) | MoE (22B active) |
| Context | 125k tokens | 128k tokens |
| Modalities | text | text |
| License | MIT | Apache 2.0 |
| Commercial | Yes | Yes |
| Released | 2024-12-27 | 2025-04-29 |
| GPUs (native fit) | 2 of 67 | 14 of 67 |
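Both are MoE models, so every expert must be resident in memory even though only a fraction of the weights fires per token; that is why the 671B model needs 376 GB at Q4_K_M despite activating only 37B parameters. A quick back-of-the-envelope comparison from the specs above:

```python
# MoE: memory scales with total parameters, per-token compute with active ones.
models = {
    "DeepSeek V3 671B": {"total_b": 671, "active_b": 37},
    "Qwen3 235B-A22B":  {"total_b": 235, "active_b": 22},
}
for name, m in models.items():
    frac = m["active_b"] / m["total_b"]
    print(f"{name}: {frac:.1%} of weights active per token")
# DeepSeek V3 671B: 5.5%   Qwen3 235B-A22B: 9.4%
```

So the memory cost tracks the total-parameter ratio (~2.9x), while per-token compute tracks the active-parameter ratio (~1.7x, 37B vs 22B).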
Benchmark scores
(Score table not reproduced here; some scores were marked not yet available.)
GPUs that run only DeepSeek V3 671B (0)
Every GPU that runs DeepSeek V3 671B also runs Qwen3 235B-A22B (MoE).
GPUs that run only Qwen3 235B-A22B (MoE) (12)
- NVIDIA DGX Spark (128 GB)
- AMD Instinct MI300X (192 GB)
- AMD Strix Halo (128 GB)
- AMD Strix Halo (96 GB)
- Apple M4 Ultra (192 GB)
- Apple M4 Max (128 GB)
- Apple M4 Max (96 GB)
- Apple M3 Max (128 GB)
- Apple M3 Max (96 GB)
- Apple M2 Ultra (192 GB)
- +2 more
GPUs that run both natively (2)
- Apple M4 Ultra (384 GB)
- Apple M2 Ultra (384 GB)
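The native-fit counts above follow from a simple containment check: a GPU (or unified-memory machine) runs a model natively when the model's Q4_K_M footprint fits entirely in its memory. A sketch with a small hypothetical catalog (the four entries and the fit rule are illustrative assumptions, not the site's full 67-GPU list):

```python
# Hypothetical catalog; memory in GB. The fit rule (entire Q4_K_M footprint
# must fit in GPU/unified memory) is an assumption about how the counts
# above are derived.
GPUS = {
    "Apple M4 Ultra (384GB)": 384,
    "Apple M2 Ultra (384GB)": 384,
    "AMD Instinct MI300X": 192,
    "Apple M4 Max (128GB)": 128,
}
REQUIREMENTS_GB = {"DeepSeek V3 671B": 376.3, "Qwen3 235B-A22B (MoE)": 133.4}

for model, need in REQUIREMENTS_GB.items():
    fits = [gpu for gpu, mem in GPUS.items() if mem >= need]
    print(f"{model} ({need} GB) fits on: {fits}")
# Only the 384 GB machines clear DeepSeek V3's 376.3 GB bar; all but the
# 128 GB M4 Max also clear Qwen3's 133.4 GB.
```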
Which should you use?
Choose DeepSeek V3 671B if:
- You want maximum capability and have 377 GB or more of GPU or unified memory
Choose Qwen3 235B-A22B (MoE) if:
- You have limited VRAM: at Q4_K_M it needs 133.4 GB vs 376.3 GB
- Long context matters: it supports 128k tokens vs 125k
- You need chain-of-thought reasoning
Frequently asked questions
- Which is better, DeepSeek V3 671B or Qwen3 235B-A22B (MoE)?
- DeepSeek V3 671B has 671B parameters vs 235B for Qwen3 235B-A22B (MoE), so DeepSeek V3 671B is the larger model. Qwen3 235B-A22B (MoE) is more hardware-efficient, needing 133.4 GB at Q4_K_M vs 376.3 GB. Qwen3 235B-A22B (MoE) runs on more GPUs natively (14 vs 2).
- How much VRAM does DeepSeek V3 671B need vs Qwen3 235B-A22B (MoE)?
- At Q4_K_M quantization with 8k context, DeepSeek V3 671B needs approximately 376.3 GB of VRAM, while Qwen3 235B-A22B (MoE) needs 133.4 GB. At FP16, DeepSeek V3 671B requires 1503.6 GB vs 528.2 GB for Qwen3 235B-A22B (MoE).
- Can you run DeepSeek V3 671B on the same GPUs as Qwen3 235B-A22B (MoE)?
- Yes, 2 GPUs can run both natively in VRAM: the Apple M4 Ultra (384 GB) and Apple M2 Ultra (384 GB). Every GPU that fits DeepSeek V3 671B also fits Qwen3 235B-A22B (MoE), while 12 GPUs can run Qwen3 235B-A22B (MoE) but not DeepSeek V3 671B.
- What is the difference between DeepSeek V3 671B and Qwen3 235B-A22B (MoE)?
- DeepSeek V3 671B has 671B parameters (37B active, MoE) with a 125k context window. Qwen3 235B-A22B (MoE) has 235B parameters (22B active, MoE) with a 128k context window. Licensing differs: DeepSeek V3 671B is MIT while Qwen3 235B-A22B (MoE) is Apache 2.0.
- Which model fits in 24 GB of VRAM, DeepSeek V3 671B or Qwen3 235B-A22B (MoE)?
- Neither fits in 24 GB at Q4_K_M: DeepSeek V3 671B needs 376.3 GB and Qwen3 235B-A22B (MoE) needs 133.4 GB. Both are far beyond any single consumer GPU and call for multi-GPU rigs or large unified-memory machines.