Llama 3.3 70B Instruct vs Qwen 2.5 72B Instruct
Side-by-side VRAM requirements, benchmark scores, and GPU compatibility for local AI inference.
Quick verdict
Llama 3.3 70B Instruct is slightly more hardware-efficient: it needs 42.2 GB at Q4_K_M vs 43.3 GB for Qwen 2.5 72B Instruct, though both fit the same 38 of 67 tracked GPUs natively.
VRAM at each quantization (8k context)
| Quant | Llama 3.3 70B Instruct | Qwen 2.5 72B Instruct | Diff |
|---|---|---|---|
| FP16 | 159.8 GB | 164.3 GB | -3% |
| Q8 | 81.4 GB | 83.6 GB | -3% |
| Q6_K | 61.8 GB | 63.5 GB | -3% |
| Q5_K_M | 52.0 GB | 53.4 GB | -3% |
| Q4_K_M | 42.2 GB | 43.3 GB | -3% |
| Q3_K_M | 34.4 GB | 35.3 GB | -3% |
| Q2_K | 26.5 GB | 27.2 GB | -2% |
Diff is Llama 3.3 70B Instruct's requirement relative to Qwen 2.5 72B Instruct's; negative means Llama needs less VRAM (and fits more GPUs).
Model specifications
| Spec | Llama 3.3 70B Instruct | Qwen 2.5 72B Instruct |
|---|---|---|
| Org | Meta | Alibaba |
| Parameters | 70B | 72B |
| Architecture | Dense | Dense |
| Context | 125k tokens | 125k tokens |
| Modalities | text | text |
| License | Llama 3.3 Community | Qwen |
| Commercial | Yes | Yes |
| Released | 2024-12-06 | 2024-09-19 |
| GPUs (native) | 38 / 67 | 38 / 67 |
Benchmark scores
| Benchmark | Llama 3.3 70B Instruct | Qwen 2.5 72B Instruct |
|---|---|---|
| MMLU-Pro | 68.9 | 58.1 |
| GPQA | 50.5 | 49.0 |
| IFEval | 92.1 | 86.4 |
| MATH | 77.0 | 83.1 |
| HumanEval | 88.4 | 86.6 |
| Arena ELO | 1256.0 | 1259.0 |
Higher scores are better on every benchmark.
GPUs that run only Llama 3.3 70B Instruct (0)
Every GPU that runs Llama 3.3 70B Instruct also runs Qwen 2.5 72B Instruct.
GPUs that run only Qwen 2.5 72B Instruct (0)
Every GPU that runs Qwen 2.5 72B Instruct also runs Llama 3.3 70B Instruct.
GPUs that run both natively (38)
- NVIDIA RTX 5090: 32 GB
- NVIDIA H100 80GB: 80 GB
- NVIDIA A100 80GB: 80 GB
- NVIDIA A100 40GB: 40 GB
- NVIDIA L40S: 48 GB
- NVIDIA RTX A6000: 48 GB
- NVIDIA RTX 6000 Ada: 48 GB
- NVIDIA DGX Spark (128GB): 128 GB
- AMD Instinct MI300X: 192 GB
- AMD Strix Halo (128GB): 128 GB
- AMD Strix Halo (96GB): 96 GB
- AMD Strix Halo (64GB): 64 GB
- +26 more GPUs run both
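The perfect overlap above follows from the near-identical footprints: any VRAM check that passes for one model at a given quant passes for the other. A hypothetical filter over a few of the listed GPUs (the `GPUS` dict and the 1 GB headroom are illustrative assumptions):

```python
# Which GPUs hold a model's whole Q4_K_M footprint in VRAM?
# VRAM sizes are taken from the GPU list above; headroom is a guess.
GPUS = {
    "NVIDIA RTX 5090": 32,
    "NVIDIA A100 40GB": 40,
    "NVIDIA L40S": 48,
    "NVIDIA H100 80GB": 80,
    "AMD Instinct MI300X": 192,
}

def fits(model_gb: float, headroom_gb: float = 1.0) -> list[str]:
    return [g for g, vram in GPUS.items() if vram >= model_gb + headroom_gb]

print(fits(42.2))  # Llama 3.3 70B Instruct at Q4_K_M
print(fits(43.3))  # Qwen 2.5 72B Instruct at Q4_K_M: same set
```

Smaller cards in the list (such as the 32 GB RTX 5090) still run both models natively, just at lower quants: Q2_K needs only 26.5 and 27.2 GB respectively.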
Which should you use?
Choose Llama 3.3 70B Instruct if:
- You have limited VRAM: it's the smaller model, needing 42.2 GB vs 43.3 GB at Q4_K_M
- Benchmark quality matters: it scores 68.9 vs 58.1 on MMLU-Pro
Choose Qwen 2.5 72B Instruct if:
- You want the stronger math model (83.1 vs 77.0 on MATH) and have a 44 GB+ GPU
Frequently asked questions
- Which is better, Llama 3.3 70B Instruct or Qwen 2.5 72B Instruct?
- Llama 3.3 70B Instruct has 70B parameters vs 72B for Qwen 2.5 72B Instruct, so Qwen 2.5 72B Instruct is the larger model. Llama 3.3 70B Instruct is more hardware-efficient, needing 42.2 GB at Q4_K_M vs 43.3 GB. On MMLU-Pro, Llama 3.3 70B Instruct scores higher (68.9 vs 58.1).
- How much VRAM does Llama 3.3 70B Instruct need vs Qwen 2.5 72B Instruct?
- At Q4_K_M quantization with 8k context, Llama 3.3 70B Instruct needs approximately 42.2 GB of VRAM, while Qwen 2.5 72B Instruct needs 43.3 GB. At FP16, Llama 3.3 70B Instruct requires 159.8 GB vs 164.3 GB for Qwen 2.5 72B Instruct.
- Can you run Llama 3.3 70B Instruct on the same GPUs as Qwen 2.5 72B Instruct?
- Yes, 38 GPUs can run both natively in VRAM, including the NVIDIA RTX 5090, NVIDIA H100 80GB, and NVIDIA A100 80GB. The two footprints are close enough that every GPU that fits one also fits the other; neither model has exclusive GPU support.
- What is the difference between Llama 3.3 70B Instruct and Qwen 2.5 72B Instruct?
- Llama 3.3 70B Instruct has 70B parameters (dense) with a 125k context window. Qwen 2.5 72B Instruct has 72B parameters (dense) with a 125k context window. Licensing differs: Llama 3.3 70B Instruct is Llama 3.3 Community while Qwen 2.5 72B Instruct is Qwen.
- Which model fits in 24 GB of VRAM, Llama 3.3 70B Instruct or Qwen 2.5 72B Instruct?
- Neither fits in 24 GB at Q4_K_M: Llama 3.3 70B Instruct needs 42.2 GB and Qwen 2.5 72B Instruct needs 43.3 GB, so both require at least a 48 GB GPU at that quantization. Even Q2_K (26.5 GB and 27.2 GB) exceeds a 24 GB card.