Llama 3.1 8B Instruct vs Gemma 2 9B Instruct
Side-by-side VRAM requirements, benchmark scores, and GPU compatibility for local AI inference.
Quick verdict
Llama 3.1 8B Instruct is more hardware-efficient: it needs 5.7 GB at Q4_K_M vs 8.3 GB for Gemma 2 9B Instruct, and fits natively on 66 of the 67 GPUs tracked here.
VRAM at each quantization (8k context)
| Quant | Llama 3.1 8B Instruct | Gemma 2 9B Instruct | Diff |
|---|---|---|---|
| FP16 | 19.1 GB | 23.8 GB | -20% |
| Q8 | 10.2 GB | 13.5 GB | -25% |
| Q6_K | 7.9 GB | 10.9 GB | -27% |
| Q5_K_M | 6.8 GB | 9.6 GB | -29% |
| Q4_K_M | 5.7 GB | 8.3 GB | -32% |
| Q3_K_M | 4.8 GB | 7.3 GB | -34% |
| Q2_K | 3.9 GB | 6.2 GB | -38% |
Diff is Llama 3.1 8B Instruct's VRAM relative to Gemma 2 9B Instruct's; lower VRAM means the model fits on more GPUs.
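The figures above can be approximated from first principles: weight memory is parameter count times bits per weight, plus an allowance for the KV cache and runtime buffers. The sketch below is a rough back-of-envelope estimate, not the exact method behind the table; the 4.8 bits-per-weight figure for Q4_K_M and the flat 1.5 GB overhead are assumptions.

```python
def estimate_vram_gb(params_b, bits_per_weight, overhead_gb=1.5):
    """Rough VRAM estimate for a quantized model.

    params_b: parameter count in billions.
    bits_per_weight: effective bits per weight of the quant
        (Q4_K_M averages roughly 4.8 in llama.cpp GGUF files).
    overhead_gb: flat allowance for KV cache at 8k context plus
        runtime buffers (a hypothetical constant, not the exact
        accounting behind the table above).
    """
    weight_gb = params_b * bits_per_weight / 8  # billions of params * bytes/param = GB
    return weight_gb + overhead_gb

# Llama 3.1 8B at ~Q4_K_M bit width vs Gemma 2 9B (9.2B params)
llama_est = estimate_vram_gb(8.0, 4.8)
gemma_est = estimate_vram_gb(9.2, 4.8)
```

The estimates land in the same ballpark as the table (about 6.3 GB vs 7.0 GB here, against 5.7 GB vs 8.3 GB measured); real footprints also depend on per-model KV cache size and vocabulary, which this sketch ignores.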
Model specifications
| Spec | Llama 3.1 8B Instruct | Gemma 2 9B Instruct |
|---|---|---|
| Org | Meta | Google |
| Parameters | 8B | 9.2B |
| Architecture | Dense | Dense |
| Context | 128k tokens | 8k tokens |
| Modalities | text | text |
| License | Llama 3.1 Community | Gemma |
| Commercial | Yes | Yes |
| Released | 2024-07-23 | 2024-06-27 |
| GPUs (native) | 66 / 67 | 63 / 67 |
Benchmark scores
| Benchmark | Llama 3.1 8B Instruct | Gemma 2 9B Instruct |
|---|---|---|
| MMLU-Pro | 37.5 | 32.0 |
| GPQA | 30.4 | 31.5 |
| IFEval | 77.4 | 74.4 |
| MATH | 48.0 | 44.3 |
| HumanEval | 72.6 | 60.4 |
| Arena ELO | 1176.0 | 1190.0 |
Higher score is better.
GPUs that run only Llama 3.1 8B Instruct (3)
- Apple M3 (8 GB)
- Apple M2 (8 GB)
- Apple M1 (8 GB)
GPUs that run only Gemma 2 9B Instruct (0)
Every GPU that runs Gemma 2 9B Instruct also runs Llama 3.1 8B Instruct.
GPUs that run both natively (63)
- NVIDIA RTX 5090 (32 GB)
- NVIDIA RTX 4090 (24 GB)
- NVIDIA RTX 4080 (16 GB)
- NVIDIA RTX 4070 Ti (12 GB)
- NVIDIA RTX 4070 (12 GB)
- NVIDIA RTX 4060 Ti 16GB (16 GB)
- NVIDIA RTX 4060 (8 GB)
- NVIDIA RTX 3090 (24 GB)
- NVIDIA RTX 3090 Ti (24 GB)
- NVIDIA RTX 3080 10GB (10 GB)
- NVIDIA RTX 3060 12GB (12 GB)
- NVIDIA H100 80GB (80 GB)
- +51 more GPUs run both
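Compatibility here reduces to a simple comparison: a GPU runs a model natively if the model's footprint plus some headroom fits in VRAM. A minimal sketch, using the Q4_K_M figures from the table; the 0.5 GB headroom margin is a hypothetical allowance for display and driver allocations, not the site's actual threshold.

```python
# Q4_K_M footprints at 8k context, from the table above (GB)
VRAM_REQ_GB = {
    "llama-3.1-8b-instruct": 5.7,
    "gemma-2-9b-instruct": 8.3,
}

def fits(gpu_vram_gb, model, headroom_gb=0.5):
    """True if the model's Q4_K_M footprint plus a small headroom
    (hypothetical margin for display/driver allocations) fits in VRAM."""
    return VRAM_REQ_GB[model] + headroom_gb <= gpu_vram_gb

# An 8 GB card (e.g. RTX 4060 or Apple M-series 8 GB) fits Llama
# at Q4_K_M but not Gemma, matching the lists above:
llama_ok = fits(8, "llama-3.1-8b-instruct")   # True
gemma_ok = fits(8, "gemma-2-9b-instruct")     # False
```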
Which should you use?
Choose Llama 3.1 8B Instruct if:
- You have limited VRAM: it's a smaller model needing 5.7 GB vs 8.3 GB
- Long context matters: it supports 128k tokens vs 8k
- Benchmark quality matters: it scores 37.5 vs 32.0 on MMLU-Pro
Choose Gemma 2 9B Instruct if:
- You want maximum capability and have a 9 GB+ GPU
Frequently asked questions
- Which is better, Llama 3.1 8B Instruct or Gemma 2 9B Instruct?
- Llama 3.1 8B Instruct has 8B parameters vs 9.2B for Gemma 2 9B Instruct, so Gemma 2 9B Instruct is the larger model. Llama 3.1 8B Instruct is more hardware-efficient, needing 5.7 GB at Q4_K_M vs 8.3 GB, and runs on more GPUs natively (66 vs 63). On MMLU-Pro it scores higher (37.5 vs 32.0), though Gemma 2 9B Instruct leads on GPQA (31.5 vs 30.4) and Arena ELO (1190 vs 1176).
- How much VRAM does Llama 3.1 8B Instruct need vs Gemma 2 9B Instruct?
- At Q4_K_M quantization with 8k context, Llama 3.1 8B Instruct needs approximately 5.7 GB of VRAM, while Gemma 2 9B Instruct needs 8.3 GB. At FP16, Llama 3.1 8B Instruct requires 19.1 GB vs 23.8 GB for Gemma 2 9B Instruct.
- Can you run Llama 3.1 8B Instruct on the same GPUs as Gemma 2 9B Instruct?
- Yes, 63 GPUs can run both natively in VRAM, including NVIDIA RTX 5090, NVIDIA RTX 4090, NVIDIA RTX 4080. However, 3 GPUs can run Llama 3.1 8B Instruct but not Gemma 2 9B Instruct, and no GPU can run Gemma 2 9B Instruct without also fitting Llama 3.1 8B Instruct.
- What is the difference between Llama 3.1 8B Instruct and Gemma 2 9B Instruct?
- Llama 3.1 8B Instruct has 8B parameters (dense) with a 128k context window. Gemma 2 9B Instruct has 9.2B parameters (dense) with an 8k context window. Licensing differs: Llama 3.1 8B Instruct uses the Llama 3.1 Community license while Gemma 2 9B Instruct uses the Gemma license.
- Which model fits in 24 GB of VRAM, Llama 3.1 8B Instruct or Gemma 2 9B Instruct?
- Both fit in 24 GB of VRAM at Q4_K_M — Llama 3.1 8B Instruct needs 5.7 GB and Gemma 2 9B Instruct needs 8.3 GB.
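Given a VRAM budget, a natural follow-up is which quantization to pick: the highest-precision quant whose footprint still fits. A hypothetical helper built directly on the VRAM table above (the `best_quant` name and exact-fit rule are this sketch's assumptions):

```python
# VRAM (GB) at 8k context, copied from the quantization table above,
# ordered from highest precision to lowest.
QUANTS = [
    ("FP16",   19.1, 23.8),
    ("Q8",     10.2, 13.5),
    ("Q6_K",    7.9, 10.9),
    ("Q5_K_M",  6.8,  9.6),
    ("Q4_K_M",  5.7,  8.3),
    ("Q3_K_M",  4.8,  7.3),
    ("Q2_K",    3.9,  6.2),
]

def best_quant(vram_gb, model="llama"):
    """Return the highest-precision quant that fits the budget,
    or None if even Q2_K does not fit."""
    col = 1 if model == "llama" else 2
    for name, *reqs in QUANTS:
        if reqs[col - 1] <= vram_gb:
            return name
    return None

# A 12 GB card (e.g. RTX 4070) runs Llama at Q8 but Gemma only at Q6_K:
best_quant(12, "llama")  # "Q8"
best_quant(12, "gemma")  # "Q6_K"
```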