
SmolLM2 360M Instruct

SmolLM2 360M Instruct needs roughly 0.2 GB of VRAM at Q4 quantization (0.7 GB at FP16). Of the GPUs we track, 60 can run it fully in VRAM at 8k context.

Hugging Face · 0.36B params · 8k context · Apache 2.0 · Commercial use ok

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

| Quant  | Weights | KV cache | Total  |
|--------|---------|----------|--------|
| FP16   | 0.7 GB  | 0.34 GB  | 1.2 GB |
| Q8     | 0.4 GB  | 0.34 GB  | 0.8 GB |
| Q6_K   | 0.3 GB  | 0.34 GB  | 0.7 GB |
| Q5_K_M | 0.2 GB  | 0.34 GB  | 0.6 GB |
| Q4_K_M | 0.2 GB  | 0.34 GB  | 0.6 GB |
| Q3_K_M | 0.1 GB  | 0.34 GB  | 0.5 GB |
| Q2_K   | 0.1 GB  | 0.34 GB  | 0.5 GB |
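The table values can be reproduced with a back-of-the-envelope estimate: weights scale with bits per weight, and the KV cache scales linearly with context length. A minimal sketch, assuming SmolLM2-360M's architecture is 32 layers, 5 KV heads (GQA), and head dimension 64 (check the model's config.json before relying on these numbers):

```python
# Rough VRAM estimator for a transformer LLM.
# Assumed architecture for SmolLM2-360M: 32 layers, 5 KV heads,
# head_dim 64 — verify against the model's config.json.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context: int, bytes_per_elem: int = 2) -> int:
    # K and V each store layers * kv_heads * head_dim values per token,
    # hence the factor of 2; bytes_per_elem=2 assumes an FP16 cache.
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem

def weights_bytes(params: float, bits_per_weight: float) -> float:
    return params * bits_per_weight / 8

GB = 1e9  # decimal gigabytes, as in the table above
params = 0.36e9

kv = kv_cache_bytes(layers=32, kv_heads=5, head_dim=64, context=8192)
print(f"KV cache: {kv / GB:.2f} GB")  # ~0.34 GB at 8k context

# Effective bits per weight for Q4_K_M (~4.8) is an approximation;
# k-quants mix block sizes, so real files vary slightly.
for name, bits in [("FP16", 16), ("Q8", 8), ("Q4_K_M", 4.8)]:
    w = weights_bytes(params, bits)
    print(f"{name}: weights {w / GB:.2f} GB, total {(w + kv) / GB:.2f} GB")
```

Doubling the context to 16k would double only the KV-cache term (to roughly 0.67 GB), which is why the cache dominates the weights for this small model at long contexts.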


GPUs that run SmolLM2 360M Instruct natively (60)

Plus 1 GPU that runs it with CPU offload (slower)
Hugging Face ↗ · Ollama ↗ · Released 2024-11-01