
DeepSeek R1 671B vs Qwen3 235B-A22B (MoE)

Side-by-side VRAM requirements, benchmark scores, and GPU compatibility for local AI inference.

Quick verdict

Qwen3 235B-A22B (MoE) is more hardware-efficient: it needs 133.4 GB at Q4_K_M vs 376.3 GB for DeepSeek R1 671B, and fits natively on 14 of the 67 tracked GPUs vs 2.

VRAM at each quantization (8k context)

| Quant | DeepSeek R1 671B | Qwen3 235B-A22B (MoE) | Diff |
|---|---|---|---|
| FP16 | 1503.6 GB | 528.2 GB | +185% |
| Q8 | 752.1 GB | 265.0 GB | +184% |
| Q6_K | 564.2 GB | 199.2 GB | +183% |
| Q5_K_M | 470.3 GB | 166.3 GB | +183% |
| Q4_K_M | 376.3 GB | 133.4 GB | +182% |
| Q3_K_M | 301.2 GB | 107.0 GB | +181% |
| Q2_K | 226.0 GB | 80.7 GB | +180% |

Diff is the VRAM of DeepSeek R1 671B relative to Qwen3 235B-A22B (MoE); lower VRAM fits more GPUs.
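The figures above can be roughly reproduced with a back-of-the-envelope estimate: weight memory is parameter count × bits per weight / 8, plus overhead for KV cache and activations at 8k context. The bits-per-weight values and the ~12% overhead factor below are assumptions fitted to the table, not the site's actual calculator:

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float,
                     overhead: float = 0.12) -> float:
    """Rough VRAM estimate in GB for a model with params_b billion parameters.

    Weight memory plus an assumed ~12% overhead for KV cache and
    activations at 8k context.
    """
    weight_gb = params_b * bits_per_weight / 8  # billions of params -> GB
    return round(weight_gb * (1 + overhead), 1)

# DeepSeek R1 671B at Q4_K_M (~4 bits/weight assumed): close to the 376.3 GB above
print(estimate_vram_gb(671, 4))
# Qwen3 235B-A22B at FP16 (16 bits/weight): close to the 528.2 GB above
print(estimate_vram_gb(235, 16))
```

The estimate lands within a couple of GB of the table for most rows; the lower k-quants (Q3_K_M, Q2_K) use fractional effective bits per weight, so their nominal bit counts under-estimate slightly.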

Model specifications

| Spec | DeepSeek R1 671B | Qwen3 235B-A22B (MoE) |
|---|---|---|
| Org | DeepSeek | Alibaba |
| Parameters | 671B | 235B |
| Architecture | MoE (37B active) | MoE (22B active) |
| Context | 125k tokens | 128k tokens |
| Modalities | text | text |
| License | MIT | Apache 2.0 |
| Commercial | Yes | Yes |
| Released | 2025-01-20 | 2025-04-29 |
| GPUs (native) | 2 / 67 | 14 / 67 |

Benchmark scores

| Benchmark | DeepSeek R1 671B | Qwen3 235B-A22B (MoE) |
|---|---|---|
| GPQA | 71.5 | — |
| MATH | 97.3 | — |

Higher score is better. — = not yet available.

GPUs that run only DeepSeek R1 671B (0)

Every GPU that runs DeepSeek R1 671B also runs Qwen3 235B-A22B (MoE).

GPUs that run only Qwen3 235B-A22B (MoE) (12)

GPUs that run both natively (2)

Apple M4 Ultra (384GB), Apple M2 Ultra (384GB).

Which should you use?

Choose DeepSeek R1 671B if:
  • You want maximum capability and have 377 GB+ of VRAM or unified memory
Choose Qwen3 235B-A22B (MoE) if:
  • You have limited VRAM: it's a smaller model needing 133.4 GB vs 376.3 GB
  • Long context matters: it supports 128k tokens vs 125k

Frequently asked questions

Which is better, DeepSeek R1 671B or Qwen3 235B-A22B (MoE)?
DeepSeek R1 671B has 671B parameters vs 235B for Qwen3 235B-A22B (MoE), so DeepSeek R1 671B is the larger model. Qwen3 235B-A22B (MoE) is more hardware-efficient, needing 133.4 GB at Q4_K_M vs 376.3 GB. Qwen3 235B-A22B (MoE) runs on more GPUs natively (14 vs 2).
How much VRAM does DeepSeek R1 671B need vs Qwen3 235B-A22B (MoE)?
At Q4_K_M quantization with 8k context, DeepSeek R1 671B needs approximately 376.3 GB of VRAM, while Qwen3 235B-A22B (MoE) needs 133.4 GB. At FP16, DeepSeek R1 671B requires 1503.6 GB vs 528.2 GB for Qwen3 235B-A22B (MoE).
Can you run DeepSeek R1 671B on the same GPUs as Qwen3 235B-A22B (MoE)?
Yes. 2 GPUs can run both natively in VRAM: Apple M4 Ultra (384GB) and Apple M2 Ultra (384GB). Every GPU that runs DeepSeek R1 671B also fits Qwen3 235B-A22B (MoE), and 12 GPUs can run Qwen3 235B-A22B (MoE) but not DeepSeek R1 671B.
What is the difference between DeepSeek R1 671B and Qwen3 235B-A22B (MoE)?
DeepSeek R1 671B has 671B parameters (37B active, MoE) with a 125k context window. Qwen3 235B-A22B (MoE) has 235B parameters (22B active, MoE) with a 128k context window. Licensing differs: DeepSeek R1 671B is MIT while Qwen3 235B-A22B (MoE) is Apache 2.0.
Which model fits in 24 GB of VRAM, DeepSeek R1 671B or Qwen3 235B-A22B (MoE)?
Neither fits in 24 GB at Q4_K_M: DeepSeek R1 671B needs 376.3 GB and Qwen3 235B-A22B (MoE) needs 133.4 GB. Even at Q2_K, the smaller Qwen3 235B-A22B (MoE) needs 80.7 GB, well beyond a single 24 GB or 48 GB GPU.
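The VRAM table can double as a quick compatibility check. The helper below is a hypothetical sketch using the figures from the table above (GB at 8k context), listing which quantizations of each model fit a given VRAM budget:

```python
# VRAM requirements in GB at 8k context, copied from the table above.
VRAM_GB = {
    "DeepSeek R1 671B": {
        "FP16": 1503.6, "Q8": 752.1, "Q6_K": 564.2, "Q5_K_M": 470.3,
        "Q4_K_M": 376.3, "Q3_K_M": 301.2, "Q2_K": 226.0,
    },
    "Qwen3 235B-A22B (MoE)": {
        "FP16": 528.2, "Q8": 265.0, "Q6_K": 199.2, "Q5_K_M": 166.3,
        "Q4_K_M": 133.4, "Q3_K_M": 107.0, "Q2_K": 80.7,
    },
}

def quants_that_fit(model: str, budget_gb: float) -> list[str]:
    """Quantization levels of `model` whose VRAM need fits within budget_gb."""
    return [q for q, gb in VRAM_GB[model].items() if gb <= budget_gb]

# A 24 GB card runs neither model at any listed quantization.
print(quants_that_fit("Qwen3 235B-A22B (MoE)", 24))  # -> []
# 384 GB of unified memory (e.g. Apple M4 Ultra) runs DeepSeek R1 at Q4_K_M and below.
print(quants_that_fit("DeepSeek R1 671B", 384))      # -> ['Q4_K_M', 'Q3_K_M', 'Q2_K']
```

This matches the verdict above: a 384 GB machine clears DeepSeek R1 671B's 376.3 GB Q4_K_M requirement, which is why both Apple Ultra systems appear in the "runs both natively" list.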