Mixtral 8x22B Instruct v0.1 vs Qwen 2.5 72B Instruct
Side-by-side VRAM requirements, benchmark scores, and GPU compatibility for local AI inference.
Quick verdict
Qwen 2.5 72B Instruct is the more hardware-efficient model: it needs 43.3 GB at Q4_K_M versus 81.1 GB for Mixtral 8x22B Instruct v0.1, and it fits natively on 38 of the 67 tracked GPUs versus 22. Mixtral 8x22B Instruct v0.1 is a Mixture of Experts (MoE) model: it has 141B total parameters but only 39B are active per token, so inference is faster than its total size suggests.
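To make the active-vs-total distinction concrete, here is a rough back-of-the-envelope sketch in Python. The "2 * parameters" FLOPs-per-token rule of thumb and the assumption that decode cost tracks active parameters are simplifications, not measurements.

```python
# Rough sketch: per-token decode cost scales with ACTIVE parameters,
# while weight memory scales with TOTAL parameters. The "2 * params"
# FLOPs-per-token rule is a common approximation, not a measured figure.

def decode_flops_per_token(active_params_billions: float) -> float:
    return 2 * active_params_billions * 1e9

mixtral_flops = decode_flops_per_token(39)  # MoE: 39B of 141B params active per token
qwen_flops = decode_flops_per_token(72)     # dense: all 72B params active per token

print(f"Mixtral 8x22B: ~{mixtral_flops / 1e9:.0f} GFLOPs per token")
print(f"Qwen 2.5 72B:  ~{qwen_flops / 1e9:.0f} GFLOPs per token")

# Single-stream decoding is usually bandwidth-bound rather than FLOP-bound,
# but the comparison is similar: only the routed experts' weights are read
# for each token, so Mixtral streams much less than its full 141B.
```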
VRAM at each quantization (8k context)
| Quant | Mixtral 8x22B Instruct v0.1 | Qwen 2.5 72B Instruct | Diff |
|---|---|---|---|
| FP16 | 317.9 GB | 164.3 GB | +94% |
| Q8 | 160.0 GB | 83.6 GB | +91% |
| Q6_K | 120.5 GB | 63.5 GB | +90% |
| Q5_K_M | 100.8 GB | 53.4 GB | +89% |
| Q4_K_M | 81.1 GB | 43.3 GB | +87% |
| Q3_K_M | 65.3 GB | 35.3 GB | +85% |
| Q2_K | 49.5 GB | 27.2 GB | +82% |
Diff is the VRAM of Mixtral 8x22B Instruct v0.1 relative to Qwen 2.5 72B Instruct at the same quantization; a lower requirement fits on more GPUs.
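If you want to sanity-check these figures, a minimal estimator sketch follows. The bytes-per-weight value for Q4_K_M and the layer/KV-head/head-dim numbers are assumptions taken approximately from the public model configs, and real runtimes add activation buffers and framework overhead, so expect results a little below the table.

```python
# Minimal VRAM estimator: quantized weights + FP16 KV cache at a given context.
# Bytes-per-weight and model dimensions below are assumptions, so the results
# only roughly track the table (which also includes runtime overhead).

GIB = 1024**3

def weights_gib(total_params: float, bytes_per_weight: float) -> float:
    return total_params * bytes_per_weight / GIB

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context: int, bytes_per_elem: float = 2.0) -> float:
    # 2x for keys and values, one cache entry per layer per token position.
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem / GIB

Q4_K_M_BYTES = 0.60  # ~4.8 bits per weight on average (assumed)

# Assumed model dimensions (approximate, from the public configs):
models = {
    "Mixtral 8x22B Instruct v0.1": dict(params=141e9, n_layers=56, n_kv_heads=8, head_dim=128),
    "Qwen 2.5 72B Instruct":       dict(params=72e9,  n_layers=80, n_kv_heads=8, head_dim=128),
}

for name, m in models.items():
    w = weights_gib(m["params"], Q4_K_M_BYTES)
    kv = kv_cache_gib(m["n_layers"], m["n_kv_heads"], m["head_dim"], context=8192)
    print(f"{name}: ~{w:.1f} GiB weights + ~{kv:.1f} GiB KV cache at 8k context")
```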
Model specifications
| Spec | Mixtral 8x22B Instruct v0.1 | Qwen 2.5 72B Instruct |
|---|---|---|
| Org | Mistral AI | Alibaba |
| Parameters | 141B | 72B |
| Architecture | MoE (39B active) | Dense |
| Context | 64k tokens | 125k tokens |
| Modalities | text | text |
| License | Apache 2.0 | Qwen |
| Commercial | Yes | Yes |
| Released | 2024-04-17 | 2024-09-19 |
| GPUs (native) | 22 / 67 | 38 / 67 |
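For reference, a typical way to load either model locally is the Hugging Face transformers pattern sketched below. The repository IDs are the official ones, but the dtype, sharding behaviour, and whether the model fits at all depend on your hardware; at BF16 you need roughly the FP16 figures from the table above, so treat this as an illustration rather than a recommended configuration.

```python
# Loading sketch with Hugging Face transformers (BF16, automatic sharding).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-72B-Instruct"
# model_id = "mistralai/Mixtral-8x22B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # full-precision weights; quantized GGUF builds are far smaller
    device_map="auto",           # shard across available GPUs, spill to CPU if needed
)

messages = [{"role": "user", "content": "Explain MoE vs dense models in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```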
Benchmark scores
| Benchmark | Mixtral 8x22B Instruct v0.1 | Qwen 2.5 72B Instruct |
|---|---|---|
| MMLU-Pro | 40.0 | 58.1 |
| IFEval | 71.8 | 86.4 |
| MATH | 41.8 | 83.1 |
| HumanEval | 76.2 | 86.6 |
| Arena ELO | 1147.0 | 1259.0 |
Higher scores are better on every benchmark listed.
GPUs that run only Mixtral 8x22B Instruct v0.1 (0)
Every GPU that runs Mixtral 8x22B Instruct v0.1 also runs Qwen 2.5 72B Instruct.
GPUs that run only Qwen 2.5 72B Instruct (16)
- NVIDIA RTX 5090, 32 GB
- NVIDIA A100 40GB, 40 GB
- NVIDIA L40S, 48 GB
- NVIDIA RTX A6000, 48 GB
- NVIDIA RTX 6000 Ada, 48 GB
- Apple M5 (32GB), 32 GB
- Apple M4 Max (48GB), 48 GB
- Apple M4 Pro (48GB), 48 GB
- Apple M4 (32GB), 32 GB
- Apple M3 Max (48GB), 48 GB
- plus 6 more
GPUs that run both natively (22)
- NVIDIA H100 80GB, 80 GB
- NVIDIA A100 80GB, 80 GB
- NVIDIA DGX Spark (128GB), 128 GB
- AMD Instinct MI300X, 192 GB
- AMD Strix Halo (128GB), 128 GB
- AMD Strix Halo (96GB), 96 GB
- AMD Strix Halo (64GB), 64 GB
- Apple M4 Ultra (384GB), 384 GB
- Apple M4 Ultra (192GB), 192 GB
- Apple M4 Max (128GB), 128 GB
- Apple M4 Max (96GB), 96 GB
- Apple M4 Max (64GB), 64 GB
- plus 10 more GPUs run both
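A simple way to reason about these lists is to compare each GPU's VRAM against a model's footprint at your chosen quantization, as in the toy check below. It uses the Q4_K_M figures from the table and a few GPUs from the lists above; the page's own "native" criterion is not spelled out (it may assume a lower quant or multi-GPU setups), so this will not reproduce the exact counts.

```python
# Toy compatibility check: which single GPUs fit each model's Q4_K_M footprint.

GPUS_GB = {
    "NVIDIA RTX 5090": 32,
    "NVIDIA L40S": 48,
    "NVIDIA A100 80GB": 80,
    "NVIDIA H100 80GB": 80,
    "Apple M4 Max (128GB)": 128,
    "AMD Instinct MI300X": 192,
}

REQUIRED_GB = {  # Q4_K_M at 8k context, from the VRAM table above
    "Mixtral 8x22B Instruct v0.1": 81.1,
    "Qwen 2.5 72B Instruct": 43.3,
}

for model, needed in REQUIRED_GB.items():
    fits = [gpu for gpu, vram in GPUS_GB.items() if vram >= needed]
    print(f"{model} ({needed} GB): fits on {', '.join(fits)}")
```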
Which should you use?
Choose Mixtral 8x22B Instruct v0.1 if:
- You want maximum capability and have an 82 GB+ GPU
- You want fast inference: the MoE design activates only 39B params per token
Choose Qwen 2.5 72B Instruct if:
- You have limited VRAM: it's the smaller model, needing 43.3 GB vs 81.1 GB at Q4_K_M
- Long context matters: it supports 125k tokens vs 64k
- Benchmark quality matters: it scores 58.1 vs 40.0 on MMLU-Pro
Frequently asked questions
- Which is better, Mixtral 8x22B Instruct v0.1 or Qwen 2.5 72B Instruct?
- Mixtral 8x22B Instruct v0.1 has 141B parameters vs 72B for Qwen 2.5 72B Instruct, so Mixtral 8x22B Instruct v0.1 is the larger model. Qwen 2.5 72B Instruct is more hardware-efficient, needing 43.3 GB at Q4_K_M vs 81.1 GB. Qwen 2.5 72B Instruct runs on more GPUs natively (38 vs 22). On MMLU-Pro, Qwen 2.5 72B Instruct scores higher (58.1 vs 40.0).
- How much VRAM does Mixtral 8x22B Instruct v0.1 need vs Qwen 2.5 72B Instruct?
- At Q4_K_M quantization with 8k context, Mixtral 8x22B Instruct v0.1 needs approximately 81.1 GB of VRAM, while Qwen 2.5 72B Instruct needs 43.3 GB. At FP16, Mixtral 8x22B Instruct v0.1 requires 317.9 GB vs 164.3 GB for Qwen 2.5 72B Instruct.
- Can you run Mixtral 8x22B Instruct v0.1 on the same GPUs as Qwen 2.5 72B Instruct?
- Yes, 22 GPUs can run both natively in VRAM, including NVIDIA H100 80GB, NVIDIA A100 80GB, NVIDIA DGX Spark (128GB). However, no GPU can run Mixtral 8x22B Instruct v0.1 without also fitting Qwen 2.5 72B Instruct, and 16 GPUs can run Qwen 2.5 72B Instruct but not Mixtral 8x22B Instruct v0.1.
- What is the difference between Mixtral 8x22B Instruct v0.1 and Qwen 2.5 72B Instruct?
- Mixtral 8x22B Instruct v0.1 has 141B parameters (39B active, MoE) with a 64k context window. Qwen 2.5 72B Instruct has 72B parameters (dense) with a 125k context window. Licensing differs: Mixtral 8x22B Instruct v0.1 is Apache 2.0 while Qwen 2.5 72B Instruct is Qwen.
- Which model fits in 24 GB of VRAM, Mixtral 8x22B Instruct v0.1 or Qwen 2.5 72B Instruct?
- Neither fits in 24 GB at Q4_K_M: Mixtral 8x22B Instruct v0.1 needs 81.1 GB and Qwen 2.5 72B Instruct needs 43.3 GB. Qwen 2.5 72B Instruct fits on a 48 GB GPU at Q4_K_M, while Mixtral 8x22B Instruct v0.1 needs an 80 GB-class GPU or more.