
Mixtral 8x7B Instruct v0.1

Mixtral 8x7B Instruct v0.1 needs roughly 23.4 GB of VRAM for weights at Q4 quantization (93.4 GB at FP16); with KV cache and runtime overhead at 8k context, the total comes to about 27.4 GB. 41 of the GPUs we track can run it fully in VRAM at 8k context.

Mistral AI · 46.7B params · 12.9B active (MoE) · 32k context · Apache 2.0 · Commercial use OK

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length: the 1.07 GB shown here becomes roughly 4.3 GB at the full 32k window. Totals include a further ~12% on top of weights + KV cache to cover runtime overhead (activations and buffers).

Quant     Weights    KV cache   Total
FP16      93.4 GB    1.07 GB    105.8 GB
Q8        46.7 GB    1.07 GB    53.5 GB
Q6_K      35.0 GB    1.07 GB    40.4 GB
Q5_K_M    29.2 GB    1.07 GB    33.9 GB
Q4_K_M    23.4 GB    1.07 GB    27.4 GB
Q3_K_M    18.7 GB    1.07 GB    22.1 GB
Q2_K      14.0 GB    1.07 GB    16.9 GB
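As a sanity check, here is a minimal Python sketch that reproduces the table. The bits-per-weight values and the 1.12 overhead factor are assumptions back-solved from the table, not published constants; the KV-cache formula uses Mixtral's actual dimensions (32 layers, 8 KV heads, head dim 128, FP16 cache).

```python
# Rough VRAM estimator for Mixtral 8x7B Instruct v0.1.
# Bits-per-weight and the 1.12 overhead factor are assumptions
# back-solved from the table above, not official figures.

PARAMS = 46.7e9   # total parameters
LAYERS = 32       # num_hidden_layers
KV_HEADS = 8      # num_key_value_heads (grouped-query attention)
HEAD_DIM = 128    # hidden_size / num_attention_heads = 4096 / 32
KV_BYTES = 2      # FP16 KV cache
OVERHEAD = 1.12   # assumed allowance for activations and buffers

# Effective bits per weight for each quant (inferred from the table)
BPW = {"FP16": 16, "Q8": 8, "Q6_K": 6, "Q5_K_M": 5,
       "Q4_K_M": 4, "Q3_K_M": 3.2, "Q2_K": 2.4}

def kv_cache_gb(context: int) -> float:
    # K and V tensors for every layer: 2 * layers * kv_heads * head_dim * ctx * bytes
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * context * KV_BYTES / 1e9

def total_vram_gb(quant: str, context: int = 8192) -> float:
    weights_gb = PARAMS * BPW[quant] / 8 / 1e9
    return (weights_gb + kv_cache_gb(context)) * OVERHEAD

print(f"KV cache at 8k: {kv_cache_gb(8192):.2f} GB")        # -> 1.07 GB
print(f"Q4_K_M total:   {total_vram_gb('Q4_K_M'):.1f} GB")  # -> 27.4 GB
```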


GPUs that run Mixtral 8x7B Instruct v0.1 natively (41)

Plus 10 GPUs that run it with CPU offload (slower)
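When the model doesn't fit entirely in VRAM, runtimes such as llama.cpp split the layers: as many as fit stay on the GPU and the remainder runs on CPU. Here is a rough sketch of that budgeting at Q4_K_M, reusing the table's numbers; the 12 GB card is a hypothetical example.

```python
import math

LAYERS = 32          # Mixtral's transformer layers
WEIGHTS_GB = 23.4    # Q4_K_M weights, from the table
NON_WEIGHT_GB = 4.0  # KV cache + overhead at 8k (27.4 total - 23.4 weights)

def layers_on_gpu(vram_gb: float) -> int:
    # Keep as many layers on the GPU as fit after reserving
    # room for the KV cache and runtime overhead.
    per_layer_gb = WEIGHTS_GB / LAYERS  # ~0.73 GB per layer
    budget = vram_gb - NON_WEIGHT_GB
    return max(0, min(LAYERS, math.floor(budget / per_layer_gb)))

# Hypothetical 12 GB card: about 10 of 32 layers fit on the GPU;
# the other 22 run on CPU, so generation is correspondingly slower.
print(layers_on_gpu(12.0))  # -> 10
```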

Notes

MoE: 47B total params, only ~13B active per token — fast if it fits.
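The 12.9B active figure follows from the routing: each layer has 8 expert FFNs and the router activates 2 of them per token, while attention, embeddings, and norms always run. A quick check using the dimensions from Mixtral's published config (hidden size 4096, expert FFN size 14336, 32 layers):

```python
HIDDEN, FFN, LAYERS = 4096, 14336, 32
EXPERTS, TOP_K = 8, 2
TOTAL = 46.7e9

expert = 3 * HIDDEN * FFN                # w1, w2, w3 of one expert FFN
all_experts = expert * EXPERTS * LAYERS  # ~45.1B params live in experts
shared = TOTAL - all_experts             # attention, embeddings, norms, routers

active = shared + expert * TOP_K * LAYERS
print(f"{active / 1e9:.1f}B active of {TOTAL / 1e9:.1f}B total")  # -> 12.9B of 46.7B
```

Per-token compute therefore scales with the ~13B active parameters, while VRAM must still hold all 46.7B, which is why it's fast only if it fits.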

Hugging Face ↗ · Ollama ↗ · Released 2023-12-11