CanItRun

Qwen 2.5 72B Instruct

Qwen 2.5 72B Instruct needs roughly 36.0 GB of VRAM at Q4 quantization (144.0 GB at FP16). Of the GPUs we track, 33 can run it fully in VRAM at 8k context.

Alibaba · 72B params · 128k context · Qwen license · Commercial use OK

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.
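The 2.68 GB KV-cache figure can be reproduced from the model's architecture. A minimal sketch, assuming Qwen 2.5 72B's published configuration (80 layers, 8 KV heads via grouped-query attention, head dimension 128) and an FP16 cache; the helper name is illustrative:

```python
def kv_cache_gb(context_len, n_layers=80, n_kv_heads=8,
                head_dim=128, bytes_per_elem=2):
    """Estimate KV cache size in decimal GB for a GQA transformer.

    Two cached tensors (K and V) per layer, per token; defaults
    assume Qwen 2.5 72B's config and an FP16 cache.
    """
    elems_per_token = 2 * n_layers * n_kv_heads * head_dim  # K + V
    return elems_per_token * bytes_per_elem * context_len / 1e9

print(f"{kv_cache_gb(8192):.2f} GB")    # ~2.68 GB at 8k context
print(f"{kv_cache_gb(131072):.2f} GB")  # linear growth: ~42.9 GB at 128k
```

Because the cache is linear in context length, doubling the context doubles this term while the weight footprint stays fixed.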

| Quant  | Weights  | KV cache | Total    |
|--------|----------|----------|----------|
| FP16   | 144.0 GB | 2.68 GB  | 164.3 GB |
| Q8     | 72.0 GB  | 2.68 GB  | 83.7 GB  |
| Q6_K   | 54.0 GB  | 2.68 GB  | 63.5 GB  |
| Q5_K_M | 45.0 GB  | 2.68 GB  | 53.4 GB  |
| Q4_K_M | 36.0 GB  | 2.68 GB  | 43.3 GB  |
| Q3_K_M | 28.8 GB  | 2.68 GB  | 35.3 GB  |
| Q2_K   | 21.6 GB  | 2.68 GB  | 27.2 GB  |
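Note that each Total is slightly more than Weights plus KV cache: the numbers imply roughly 12% of extra runtime overhead (activations, buffers). A rough sketch that reproduces the rows, assuming the simplified bits-per-weight the Weights column implies (e.g. 4.0 for Q4_K_M, 3.2 for Q3_K_M) and a flat 1.12 overhead factor; both values are inferred from the table above, not an exact runtime accounting:

```python
PARAMS = 72e9        # Qwen 2.5 72B
KV_CACHE_GB = 2.68   # KV cache at 8k context (from the table)
OVERHEAD = 1.12      # assumed ~12% runtime overhead (activations, buffers)

# Effective bits per weight implied by the table's Weights column.
BITS = {"FP16": 16, "Q8": 8, "Q6_K": 6, "Q5_K_M": 5,
        "Q4_K_M": 4, "Q3_K_M": 3.2, "Q2_K": 2.4}

def total_vram_gb(quant):
    """Estimate total VRAM in decimal GB for a given quantization."""
    weights_gb = PARAMS * BITS[quant] / 8 / 1e9
    return (weights_gb + KV_CACHE_GB) * OVERHEAD

print(f"{total_vram_gb('Q4_K_M'):.1f} GB")  # ~43.3 GB, matching the table
```

Real GGUF files deviate a little from these idealized bit widths (Q4_K_M is closer to 4.8 bits per weight in practice), so treat the output as a planning estimate rather than an exact requirement.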

GPUs that run Qwen 2.5 72B Instruct natively (33)

A further 15 GPUs we track can run it with CPU offload, at reduced speed.
Hugging Face ↗ · Ollama ↗ · Released 2024-09-19