Mixtral 8x7B Instruct v0.1 vs Llama 3.1 8B Instruct
Side-by-side VRAM requirements, benchmark scores, and GPU compatibility for local AI inference.
Quick verdict
Llama 3.1 8B Instruct is the more hardware-efficient model: it needs 5.7 GB at Q4_K_M vs 27.4 GB for Mixtral 8x7B Instruct v0.1, and fits natively on 66 of the 67 GPUs tracked here. Mixtral 8x7B Instruct v0.1 is a Mixture of Experts (MoE) model: it has 46.7B total parameters but only 12.9B are active per token, so inference is faster than its total size suggests.
VRAM at each quantization (8k context)
| Quant | Mixtral 8x7B Instruct v0.1 | Llama 3.1 8B Instruct | Diff |
|---|---|---|---|
| FP16 | 105.8 GB | 19.1 GB | +453% |
| Q8 | 53.5 GB | 10.2 GB | +427% |
| Q6_K | 40.4 GB | 7.9 GB | +410% |
| Q5_K_M | 33.9 GB | 6.8 GB | +398% |
| Q4_K_M | 27.4 GB | 5.7 GB | +381% |
| Q3_K_M | 22.1 GB | 4.8 GB | +362% |
| Q2_K | 16.9 GB | 3.9 GB | +334% |
Diff is Mixtral 8x7B Instruct v0.1's footprint relative to Llama 3.1 8B Instruct; lower VRAM fits more GPUs.
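The footprints above roughly follow from parameter count times bits per weight, plus a KV-cache allowance for the 8k context. Here is a minimal estimator in Python, assuming approximate effective bits-per-weight values for llama.cpp-style quants (real GGUF files mix tensor precisions, so expect a couple of GB of drift):

```python
# Rough VRAM estimate: weight bytes (params x bits/8) plus a KV-cache
# allowance for 8k context. Bits-per-weight values are approximations
# for llama.cpp-style quants, not exact GGUF sizes.
BITS_PER_WEIGHT = {
    "FP16": 16.0, "Q8": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7,
    "Q4_K_M": 4.8, "Q3_K_M": 3.9, "Q2_K": 2.8,
}

def estimate_vram_gb(params_b: float, quant: str, kv_cache_gb: float = 1.0) -> float:
    """Ballpark VRAM in GB for a params_b-billion-parameter model at quant."""
    weights_gb = params_b * BITS_PER_WEIGHT[quant] / 8
    return weights_gb + kv_cache_gb

for name, params in [("Mixtral 8x7B", 46.7), ("Llama 3.1 8B", 8.0)]:
    print(f"{name} @ Q4_K_M: ~{estimate_vram_gb(params, 'Q4_K_M'):.1f} GB")
# Prints ~29.0 and ~5.8 GB -- within a couple of GB of the table's
# 27.4 and 5.7 GB; runtime overhead and file layout account for the rest.
```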
Model specifications
| Spec | Mixtral 8x7B Instruct v0.1 | Llama 3.1 8B Instruct |
|---|---|---|
| Org | Mistral AI | Meta |
| Parameters | 46.7B | 8B |
| Architecture | MoE (12.9B active) | Dense |
| Context | 32k tokens | 128k tokens |
| Modalities | text | text |
| License | Apache 2.0 | Llama 3.1 Community |
| Commercial | Yes | Yes |
| Released | 2023-12-11 | 2024-07-23 |
| GPUs (native) | 46 / 67 | 66 / 67 |
Benchmark scores
| Benchmark | Mixtral 8x7B Instruct v0.1 | Llama 3.1 8B Instruct |
|---|---|---|
| MMLU-Pro | 29.7 | 37.5 |
| IFEval | 54.8 | 77.4 |
| HumanEval | 45.1 | 72.6 |
| Arena ELO | 1114 | 1176 |
Higher is better on all benchmarks.
GPUs that run only Mixtral 8x7B Instruct v0.1 (0)
Every GPU that runs Mixtral 8x7B Instruct v0.1 also runs Llama 3.1 8B Instruct.
GPUs that run only Llama 3.1 8B Instruct (20)
- NVIDIA RTX 4080 (16 GB)
- NVIDIA RTX 4070 Ti (12 GB)
- NVIDIA RTX 4070 (12 GB)
- NVIDIA RTX 4060 Ti (16 GB)
- NVIDIA RTX 4060 (8 GB)
- NVIDIA RTX 3080 (10 GB)
- NVIDIA RTX 3060 (12 GB)
- AMD Radeon RX 6800 XT (16 GB)
- Apple M5 (16 GB)
- Apple M4 (16 GB)
- +10 more
GPUs that run both natively (46)
- NVIDIA RTX 5090 (32 GB)
- NVIDIA RTX 4090 (24 GB)
- NVIDIA RTX 3090 (24 GB)
- NVIDIA RTX 3090 Ti (24 GB)
- NVIDIA H100 (80 GB)
- NVIDIA A100 (80 GB)
- NVIDIA A100 (40 GB)
- NVIDIA L40S (48 GB)
- NVIDIA RTX A6000 (48 GB)
- NVIDIA RTX 6000 Ada (48 GB)
- NVIDIA DGX Spark (128 GB)
- AMD Radeon RX 7900 XTX (24 GB)
- +34 more GPUs run both
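These buckets can be reproduced from plain VRAM capacities. The page doesn't state its exact fit criterion, so the sketch below assumes a GPU "runs" a model natively if the model's smallest listed quant (Q2_K at 8k context) fits in VRAM; with the figures from the quantization table, that reproduces the groupings above (GPU names and capacities are taken from the lists):

```python
# Bucket GPUs by which models fit. Assumption: "runs natively" means the
# model's smallest listed quant (Q2_K, 8k context) fits in VRAM.
MIXTRAL_Q2K_GB = 16.9   # from the quantization table above
LLAMA_Q2K_GB = 3.9

gpus_gb = {
    "NVIDIA RTX 5090": 32, "NVIDIA RTX 4090": 24, "NVIDIA RTX 3090": 24,
    "NVIDIA RTX 4080": 16, "NVIDIA RTX 4060": 8, "NVIDIA H100": 80,
}

for gpu, vram in gpus_gb.items():
    if vram >= MIXTRAL_Q2K_GB:
        bucket = "both"                       # Mixtral fits, so Llama does too
    elif vram >= LLAMA_Q2K_GB:
        bucket = "only Llama 3.1 8B Instruct"
    else:
        bucket = "neither"
    print(f"{gpu} ({vram} GB): {bucket}")
# The 24 GB cards land in "both" even though Q4_K_M Mixtral (27.4 GB)
# doesn't fit -- they run it at Q3_K_M or Q2_K instead.
```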
Which should you use?
Choose Mixtral 8x7B Instruct v0.1 if:
- You want maximum capability and have a 28 GB+ GPU
- You want fast inference: MoE activates only 12.9B params per token (see the routing sketch after this list)
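The active-parameter point is easiest to see in code. Below is a toy Python sketch of Mixtral-style top-2 routing, not the real implementation (dimensions are shrunk; Mixtral uses 8 experts per layer at a model width of 4096):

```python
import numpy as np

# Toy sketch of Mixtral-style top-2 expert routing. Per token, a router
# scores all 8 expert FFNs but only the top 2 run, which is why only
# ~12.9B of the 46.7B parameters are active for any given token.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 8, 2   # toy sizes; the real d_model is 4096

router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]  # stand-ins for expert FFNs

def moe_layer(x: np.ndarray) -> np.ndarray:
    logits = x @ router_w                  # one routing score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the top-2 experts
    probs = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the chosen two
    # Only the chosen experts' weights are read; the other six stay idle.
    return sum(p * (x @ experts[i]) for p, i in zip(probs, top))

token = rng.standard_normal(d_model)
print("output shape:", moe_layer(token).shape)  # (8,) -- same width as the input
```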
Choose Llama 3.1 8B Instruct if:
- You have limited VRAM: it's a smaller model needing 5.7 GB vs 27.4 GB
- Long context matters: it supports 128k tokens vs 32k
- Benchmark quality matters: it scores 37.5 vs 29.7 on MMLU-Pro
Frequently asked questions
- Which is better, Mixtral 8x7B Instruct v0.1 or Llama 3.1 8B Instruct?
- Mixtral 8x7B Instruct v0.1 has 46.7B parameters vs 8B for Llama 3.1 8B Instruct, so Mixtral 8x7B Instruct v0.1 is the larger model. Llama 3.1 8B Instruct is more hardware-efficient, needing 5.7 GB at Q4_K_M vs 27.4 GB. Llama 3.1 8B Instruct runs on more GPUs natively (66 vs 46). On MMLU-Pro, Llama 3.1 8B Instruct scores higher (37.5 vs 29.7).
- How much VRAM does Mixtral 8x7B Instruct v0.1 need vs Llama 3.1 8B Instruct?
- At Q4_K_M quantization with 8k context, Mixtral 8x7B Instruct v0.1 needs approximately 27.4 GB of VRAM, while Llama 3.1 8B Instruct needs 5.7 GB. At FP16, Mixtral 8x7B Instruct v0.1 requires 105.8 GB vs 19.1 GB for Llama 3.1 8B Instruct.
- Can you run Mixtral 8x7B Instruct v0.1 on the same GPUs as Llama 3.1 8B Instruct?
- Yes, 46 GPUs can run both natively in VRAM, including the NVIDIA RTX 5090, RTX 4090, and RTX 3090. Every GPU that fits Mixtral 8x7B Instruct v0.1 also fits Llama 3.1 8B Instruct, while 20 GPUs can run Llama 3.1 8B Instruct but not Mixtral 8x7B Instruct v0.1.
- What is the difference between Mixtral 8x7B Instruct v0.1 and Llama 3.1 8B Instruct?
- Mixtral 8x7B Instruct v0.1 has 46.7B parameters (12.9B active, MoE) with a 32k context window. Llama 3.1 8B Instruct has 8B parameters (dense) with a 128k context window. Licensing differs: Mixtral 8x7B Instruct v0.1 is Apache 2.0 while Llama 3.1 8B Instruct is Llama 3.1 Community.
- Which model fits in 24 GB of VRAM, Mixtral 8x7B Instruct v0.1 or Llama 3.1 8B Instruct?
- Only Llama 3.1 8B Instruct fits in 24 GB at Q4_K_M (5.7 GB). Mixtral 8x7B Instruct v0.1 needs 27.4 GB at Q4_K_M; to fit in 24 GB it has to drop to Q3_K_M (22.1 GB) or lower (see the sketch below).
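For budget questions like this, a tiny helper makes the quant math explicit: pick the largest quant whose 8k-context footprint (from the table above) fits a given VRAM budget. A sketch only; in practice leave a GB or two of headroom for the runtime and desktop:

```python
# Largest Mixtral quant that fits a VRAM budget, per the table above.
MIXTRAL_VRAM_GB = {
    "FP16": 105.8, "Q8": 53.5, "Q6_K": 40.4, "Q5_K_M": 33.9,
    "Q4_K_M": 27.4, "Q3_K_M": 22.1, "Q2_K": 16.9,
}

def best_quant(budget_gb: float) -> str | None:
    """Return the highest-footprint (least lossy) quant under budget_gb."""
    fits = {q: gb for q, gb in MIXTRAL_VRAM_GB.items() if gb <= budget_gb}
    return max(fits, key=fits.get) if fits else None

print(best_quant(24))   # Q3_K_M -> a 24 GB card runs Mixtral, just quantized harder
print(best_quant(16))   # None -> below even Q2_K's 16.9 GB
```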