
SmolLM2 1.7B Instruct

SmolLM2 1.7B Instruct needs roughly 2.8 GB of total VRAM at Q4_K_M quantization (5.6 GB at FP16), counting weights plus KV cache at 8k context; the weights alone are 0.8 GB at Q4_K_M (3.4 GB at FP16). 60 GPUs we track can run it fully in VRAM at 8k context.

Hugging Face · 1.7B params · 8k context · Apache 2.0 · Commercial use OK

VRAM at each quantization

Assumes 8k context. The KV cache grows linearly with context length; totals run slightly above weights + KV cache to account for runtime overhead.

Quant     Weights   KV cache   Total
FP16      3.4 GB    1.61 GB    5.6 GB
Q8        1.7 GB    1.61 GB    3.7 GB
Q6_K      1.3 GB    1.61 GB    3.2 GB
Q5_K_M    1.1 GB    1.61 GB    3.0 GB
Q4_K_M    0.8 GB    1.61 GB    2.8 GB
Q3_K_M    0.7 GB    1.61 GB    2.6 GB
Q2_K      0.5 GB    1.61 GB    2.4 GB
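The 1.61 GB KV-cache figure can be reproduced from the model's architecture. A minimal sketch in Python, assuming the SmolLM2-1.7B configuration published on its Hugging Face model card (24 layers, hidden size 2048, standard multi-head attention, FP16 cache entries) — those parameter values are assumptions taken from the model card, not stated on this page:

```python
def kv_cache_gb(context_len: int,
                n_layers: int = 24,       # assumed from the SmolLM2-1.7B config
                hidden_size: int = 2048,  # assumed from the SmolLM2-1.7B config
                bytes_per_elem: int = 2   # FP16 cache entries
                ) -> float:
    """KV-cache size for a dense multi-head-attention transformer, in decimal GB."""
    # Each layer stores one K vector and one V vector of width hidden_size per token.
    return 2 * n_layers * hidden_size * context_len * bytes_per_elem / 1e9

print(f"{kv_cache_gb(8192):.2f} GB")   # → 1.61 GB at 8k context
print(f"{kv_cache_gb(16384):.2f} GB")  # → 3.22 GB: doubling context doubles the cache
```

With these assumed values the formula lands exactly on the table's 1.61 GB, and it makes the linear scaling with context length concrete: halving the context to 4k would free about 0.8 GB of VRAM at any quantization level.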

Benchmarks

GPUs that run SmolLM2 1.7B Instruct natively (60)

Plus 1 GPU that runs it with CPU offload (slower)
Hugging Face ↗ · Ollama ↗ · Released 2024-11-01