Llama 3.1 70B Instruct vs Llama 3.3 70B Instruct
Side-by-side VRAM requirements, benchmark scores, and GPU compatibility for local AI inference.
Quick verdict
Both models need identical VRAM at Q4_K_M (42.2 GB each), and both use the same dense 70B architecture. The choice comes down to benchmark scores.
VRAM at each quantization (8k context)
| Quant | Llama 3.1 70B Instruct | Llama 3.3 70B Instruct | Diff |
|---|---|---|---|
| FP16 | 159.8 GB | 159.8 GB | +0% |
| Q8 | 81.4 GB | 81.4 GB | +0% |
| Q6_K | 61.8 GB | 61.8 GB | +0% |
| Q5_K_M | 52.0 GB | 52.0 GB | +0% |
| Q4_K_M | 42.2 GB | 42.2 GB | +0% |
| Q3_K_M | 34.4 GB | 34.4 GB | +0% |
| Q2_K | 26.5 GB | 26.5 GB | +0% |
Diff is Llama 3.1 70B Instruct relative to Llama 3.3 70B Instruct; the two models match at every quantization.
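If you want to sanity-check figures like these, a back-of-envelope estimate works: weight memory is parameter count times an effective bits-per-weight for the quant, plus an FP16 KV cache for the context window. The sketch below is an assumption-laden approximation, not the calculator behind this table; the bits-per-weight constants are approximate llama.cpp figures and the overhead term is a guess.

```python
# Back-of-envelope VRAM estimate: weights at an effective bits-per-weight,
# plus an FP16 KV cache for the context, plus a fixed runtime overhead.
# The bits-per-weight values below are approximate llama.cpp figures and
# the overhead constant is an assumption; treat the output as a ballpark.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8": 8.5,       # Q8_0 stores per-block scales next to the 8-bit weights
    "Q6_K": 6.56,
    "Q5_K_M": 5.5,
    "Q4_K_M": 4.85,
    "Q3_K_M": 3.9,
    "Q2_K": 2.6,
}

def estimate_vram_gb(params_b: float, quant: str, context: int = 8192,
                     layers: int = 80, kv_heads: int = 8, head_dim: int = 128,
                     overhead_gb: float = 1.5) -> float:
    """Estimated VRAM in GB for a dense model (defaults match Llama 70B)."""
    weight_bytes = params_b * 1e9 * BITS_PER_WEIGHT[quant] / 8
    # One K and one V tensor per layer, 2 bytes per element at FP16.
    kv_bytes = 2 * layers * kv_heads * head_dim * 2 * context
    return (weight_bytes + kv_bytes) / 1e9 + overhead_gb

# 70B at Q4_K_M with 8k context -> ~46.6 GB, a few GB above the table's
# 42.2 GB; the table's own calculator evidently uses tighter constants.
print(f"{estimate_vram_gb(70, 'Q4_K_M'):.1f} GB")
```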
Model specifications
| Spec | Llama 3.1 70B Instruct | Llama 3.3 70B Instruct |
|---|---|---|
| Org | Meta | Meta |
| Parameters | 70B | 70B |
| Architecture | Dense | Dense |
| Context | 128k tokens | 128k tokens |
| Modalities | text | text |
| License | Llama 3.1 Community | Llama 3.3 Community |
| Commercial | Yes | Yes |
| Released | 2024-07-23 | 2024-12-06 |
| GPUs that fit it natively | 38 of 67 | 38 of 67 |
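You can verify the spec rows against Meta's published configs directly. Both repos are gated, so you must accept the license on huggingface.co and authenticate (e.g. `huggingface-cli login`) first; the repo IDs below are the official ones at the time of writing.

```python
# Pull the published config.json for each model and read the spec fields.
import json
from huggingface_hub import hf_hub_download

for repo in ("meta-llama/Llama-3.1-70B-Instruct",
             "meta-llama/Llama-3.3-70B-Instruct"):
    path = hf_hub_download(repo_id=repo, filename="config.json")
    with open(path) as f:
        cfg = json.load(f)
    print(repo,
          "| layers:", cfg["num_hidden_layers"],
          "| context:", cfg["max_position_embeddings"])  # 131072 = 128k
```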
Benchmark scores
| Benchmark | Llama 3.1 70B Instruct | Llama 3.3 70B Instruct |
|---|---|---|
| MMLU-Pro | 66.4 | 68.9 |
| GPQA | 46.7 | 50.5 |
| IFEval | 87.5 | 92.1 |
| MATH | 68.0 | 77.0 |
| HumanEval | 80.5 | 88.4 |
| Arena ELO | 1247.0 | 1256.0 |
Higher is better. Llama 3.3 70B Instruct leads on every benchmark listed.
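To put the gaps in perspective, the deltas can be computed straight from the table (values copied from above):

```python
# Score deltas of Llama 3.3 over Llama 3.1, copied from the table above.
scores = {                      # (Llama 3.1, Llama 3.3)
    "MMLU-Pro":  (66.4, 68.9),
    "GPQA":      (46.7, 50.5),
    "IFEval":    (87.5, 92.1),
    "MATH":      (68.0, 77.0),
    "HumanEval": (80.5, 88.4),
}
for name, (old, new) in scores.items():
    print(f"{name:<10} +{new - old:.1f} pts ({(new - old) / old:+.1%})")
# MATH shows the largest jump: +9.0 points (+13.2%).
```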
GPUs that run only Llama 3.1 70B Instruct (0)
Every GPU that runs Llama 3.1 70B Instruct also runs Llama 3.3 70B Instruct.
GPUs that run only Llama 3.3 70B Instruct (0)
Every GPU that runs Llama 3.3 70B Instruct also runs Llama 3.1 70B Instruct.
GPUs that run both natively (38)
- NVIDIA RTX 5090 - 32 GB
- NVIDIA H100 80GB - 80 GB
- NVIDIA A100 80GB - 80 GB
- NVIDIA A100 40GB - 40 GB
- NVIDIA L40S - 48 GB
- NVIDIA RTX A6000 - 48 GB
- NVIDIA RTX 6000 Ada - 48 GB
- NVIDIA DGX Spark (128GB) - 128 GB
- AMD Instinct MI300X - 192 GB
- AMD Strix Halo (128GB) - 128 GB
- AMD Strix Halo (96GB) - 96 GB
- AMD Strix Halo (64GB) - 64 GB
- +26 more GPUs run both
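Note that "runs natively" here means the model fits at some quantization, not necessarily Q4_K_M: the 32 GB RTX 5090 qualifies via Q2_K. Below is a minimal fit-check sketch using the per-quant totals from the VRAM table above (8k context); it simply returns the highest-quality quant whose footprint fits.

```python
# Highest-quality quantization of a 70B model that fits a given GPU,
# using the per-quant totals from the VRAM table above (8k context).
VRAM_GB = {  # ordered from highest quality to smallest footprint
    "FP16": 159.8, "Q8": 81.4, "Q6_K": 61.8, "Q5_K_M": 52.0,
    "Q4_K_M": 42.2, "Q3_K_M": 34.4, "Q2_K": 26.5,
}

def best_quant(gpu_vram_gb: float) -> str | None:
    """First (highest-quality) quant whose total footprint fits in VRAM."""
    for quant, need in VRAM_GB.items():
        if need <= gpu_vram_gb:
            return quant
    return None  # even Q2_K does not fit

print(best_quant(32))  # RTX 5090            -> Q2_K
print(best_quant(48))  # L40S / RTX 6000 Ada -> Q4_K_M
print(best_quant(80))  # H100 / A100 80GB    -> Q6_K
```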
Which should you use?
Choose Llama 3.1 70B Instruct if:
- Your stack is already validated against it (existing prompts, fine-tunes, or Llama 3.1 Community license terms); VRAM needs are identical, so there is no hardware reason to prefer it.
Choose Llama 3.3 70B Instruct if:
- Benchmark quality matters: it scores 68.9 vs 66.4 on MMLU-Pro and leads on every other benchmark above.
Frequently asked questions
- Which is better, Llama 3.1 70B Instruct or Llama 3.3 70B Instruct?
- Llama 3.3 70B Instruct scores higher on every benchmark listed above, e.g. 68.9 vs 66.4 on MMLU-Pro.
- How much VRAM does Llama 3.1 70B Instruct need vs Llama 3.3 70B Instruct?
- At Q4_K_M quantization with 8k context, both models need approximately 42.2 GB of VRAM; at FP16, both require 159.8 GB. Their footprints match at every quantization because the parameter count and architecture are the same.
- Can you run Llama 3.1 70B Instruct on the same GPUs as Llama 3.3 70B Instruct?
- Yes. The same 38 GPUs can run both natively in VRAM, including the NVIDIA RTX 5090, NVIDIA H100 80GB, and NVIDIA A100 80GB. Because the two models' VRAM requirements are identical, any GPU that fits one also fits the other.
- What is the difference between Llama 3.1 70B Instruct and Llama 3.3 70B Instruct?
- Both are dense 70B-parameter models with a 128k context window. The practical differences are release date (December 2024 for Llama 3.3 vs July 2024 for Llama 3.1), benchmark scores (Llama 3.3 leads across the board), and licensing: Llama 3.1 70B Instruct uses the Llama 3.1 Community license while Llama 3.3 70B Instruct uses the Llama 3.3 Community license.
- Which model fits in 24 GB of VRAM, Llama 3.1 70B Instruct or Llama 3.3 70B Instruct?
- Neither fits in 24 GB at Q4_K_M — Llama 3.1 70B Instruct needs 42.2 GB and Llama 3.3 70B Instruct needs 42.2 GB. Both require at least a 48 GB GPU.