CanItRun

SmolLM2 360M Instruct

SmolLM2 360M Instruct needs roughly 0.6 GB VRAM at Q4_K_M quantization (1.2 GB at FP16). 106 GPUs we track can run it fully in VRAM at 8k context.

106 GPUs run this natively · 1 with CPU offload

Hugging Face · 0.36B params · 8k context · Apache 2.0 · Commercial use OK

SmolLM2 360M Instruct is a 0.36B parameter dense model developed by Hugging Face. It is an ultra-compact model aimed at mobile and embedded devices.

To run SmolLM2 360M Instruct locally, Q8_0 is the recommended quantization: its weights take about 0.4 GB (0.8 GB total at 8k context), so it runs on virtually any device.

At 360M parameters, it is among the smallest practical LLMs for on-device use.

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

Quant                 Weights    KV cache    Total
FP32                  1.4 GB     0.34 GB     2.0 GB
BF16                  0.7 GB     0.34 GB     1.2 GB
FP16                  0.7 GB     0.34 GB     1.2 GB
Q8_0 (recommended)    0.4 GB     0.34 GB     0.8 GB
Q6_K                  0.3 GB     0.34 GB     0.7 GB
Q5_K_M                0.2 GB     0.34 GB     0.6 GB
Q4_K_M                0.2 GB     0.34 GB     0.6 GB
Q3_K_M                0.1 GB     0.34 GB     0.6 GB
Q2_K                  0.1 GB     0.34 GB     0.5 GB
NVFP4 (CUDA)          0.2 GB     0.34 GB     0.6 GB

KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
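
Because the KV cache grows linearly with context length, the 8k figure can be rescaled to shorter contexts. The following is a minimal sketch, assuming "8k" means 8,192 tokens and taking the 0.34 GB FP16 figure from the table as the reference point. The name `kv_cache_gb` is illustrative and is not part of any calculator API.

```python
# Reference point from the table: 0.34 GB of FP16 KV cache at 8k context.
REF_CONTEXT_TOKENS = 8192  # assumption: "8k" means 8,192 tokens
REF_KV_CACHE_GB = 0.34


def kv_cache_gb(context_tokens: int) -> float:
    """Scale the 8k KV cache figure linearly to another context length."""
    if context_tokens < 0:
        raise ValueError("context_tokens must be non-negative")
    return REF_KV_CACHE_GB * context_tokens / REF_CONTEXT_TOKENS


if __name__ == "__main__":
    for tokens in (2048, 4096, 8192):
        print(f"{tokens:>5} tokens: ~{kv_cache_gb(tokens):.3f} GB KV cache")
```

Halving the context to 4,096 tokens halves the cache estimate to about 0.17 GB; the weights are unaffected by context length.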

GPUs that run SmolLM2 360M Instruct natively (106)

Plus 1 GPU that runs it with CPU offload (slower)
Available on Hugging Face and Ollama · Released 2024-11-01

Frequently asked questions

What are the VRAM requirements for SmolLM2 360M Instruct?
SmolLM2 360M Instruct requires approximately 0.6 GB of VRAM at Q4_K_M quantization, 0.8 GB at Q8_0, and 1.2 GB at FP16. These numbers assume an 8k context window; the KV cache portion of the total scales linearly with context length.
How many parameters does SmolLM2 360M Instruct have?
SmolLM2 360M Instruct has 0.36 billion parameters.
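
The parameter count maps directly onto the unquantized weight sizes: each parameter takes 4 bytes at FP32 and 2 bytes at FP16 or BF16. Below is a minimal sketch of that arithmetic, assuming decimal gigabytes (1 GB = 10^9 bytes). It leaves out the quantized formats, whose effective bits per weight vary by scheme. `weights_gb` is an illustrative helper name.

```python
PARAMS = 0.36e9  # parameter count stated on this page

# Bytes per parameter for the unquantized formats (standard sizes).
BYTES_PER_PARAM = {"FP32": 4, "BF16": 2, "FP16": 2}


def weights_gb(params: float, fmt: str) -> float:
    """Weight memory in decimal GB (assumption: 1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[fmt] / 1e9


if __name__ == "__main__":
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt}: ~{weights_gb(PARAMS, fmt):.2f} GB")
```

This gives about 1.44 GB at FP32 and 0.72 GB at FP16 or BF16, in line with the 1.4 GB and 0.7 GB weight figures listed for those formats.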
How capable is SmolLM2 360M Instruct?
SmolLM2 360M Instruct has an MMLU-Pro score of 8. That is a low score, so the model is best suited to lightweight tasks, prototyping, and resource-constrained environments.
Can SmolLM2 360M Instruct run on a 16 GB GPU?
Yes, with ample headroom. SmolLM2 360M Instruct needs 0.6 GB at Q4_K_M, which fits easily in a 16 GB GPU such as the RTX 4080 or RTX 4070 Ti Super.
What is the highest-quality precision for SmolLM2 360M Instruct that fits in 24 GB of VRAM?
FP32. At full precision, SmolLM2 360M Instruct needs 2.0 GB, so every option in the table fits in 24 GB of VRAM.
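
Choosing a format for a given VRAM budget can be sketched as a lookup over the total VRAM figures for each quantization. This is a minimal sketch, assuming the 8k-context totals published on this page. `best_fit` is a hypothetical helper, and NVFP4 is left out because it requires a CUDA GPU.

```python
from typing import Optional

# Total VRAM at 8k context, copied from this page's quantization table,
# ordered from highest to lowest precision. NVFP4 is omitted (CUDA only).
TOTAL_GB = [
    ("FP32", 2.0),
    ("BF16", 1.2),
    ("FP16", 1.2),
    ("Q8_0", 0.8),
    ("Q6_K", 0.7),
    ("Q5_K_M", 0.6),
    ("Q4_K_M", 0.6),
    ("Q3_K_M", 0.6),
    ("Q2_K", 0.5),
]


def best_fit(vram_gb: float) -> Optional[str]:
    """Return the highest-precision format whose total fits, or None."""
    for name, total in TOTAL_GB:
        if total <= vram_gb:
            return name
    return None


if __name__ == "__main__":
    for budget in (24, 1.0, 0.4):
        print(f"{budget} GB budget -> {best_fit(budget)}")
```

With these figures, `best_fit(24)` returns `"FP32"`, and a budget below 0.5 GB returns `None` because even Q2_K does not fit.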
What GPU do I need to run SmolLM2 360M Instruct locally?
Almost any GPU will do. At Q4_K_M, SmolLM2 360M Instruct needs only 0.6 GB of VRAM, so a 16 GB card such as the RTX 4080 (16 GB) or RTX 4070 Ti Super (16 GB) is far more than enough.