
Small LLMs

9 models · local AI VRAM requirements & GPU compatibility

Small models (under ~10B parameters) are the sweet spot for consumer hardware. Many run at full FP16 precision on an 8 GB GPU, eliminating quantization artifacts entirely. They're fast enough for real-time chat and can run on laptops and integrated graphics. Ideal if you want always-on local AI without high power draw.
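To see why small models fit so easily, here is a minimal back-of-the-envelope sketch of the VRAM math, not the site's actual calculator. It assumes weights dominate memory (about 2 bytes per parameter at FP16, roughly 0.55 bytes at 4-bit once scales are included) plus a flat overhead for the KV cache and runtime buffers; the function names and the 1.5 GB overhead figure are illustrative assumptions.

```python
# Rough VRAM estimate for running an LLM locally (illustrative sketch).
# Assumptions: weights dominate memory; bytes per parameter is 2 for FP16,
# 1 for 8-bit, ~0.55 for 4-bit (including quantization scales); a flat
# ~1.5 GB overhead covers the KV cache and runtime buffers for short chat
# contexts. Real usage varies with context length and backend.

BYTES_PER_PARAM = {
    "fp16": 2.0,
    "q8": 1.0,
    "q4": 0.55,
}

def estimated_vram_gb(params_billion: float, precision: str,
                      overhead_gb: float = 1.5) -> float:
    """Approximate VRAM (GB) needed to load and run the model."""
    # 1B parameters at FP16 is roughly 2 GB of weights.
    weights_gb = params_billion * BYTES_PER_PARAM[precision]
    return weights_gb + overhead_gb

def fits(params_billion: float, precision: str, gpu_vram_gb: float) -> bool:
    """Does the model fit in the given GPU's VRAM under these assumptions?"""
    return estimated_vram_gb(params_billion, precision) <= gpu_vram_gb

if __name__ == "__main__":
    # Example: small models on a typical 8 GB consumer GPU.
    for size in (3, 7, 9):
        for prec in ("fp16", "q4"):
            need = estimated_vram_gb(size, prec)
            verdict = "fits" if fits(size, prec, 8) else "does not fit"
            print(f"{size}B @ {prec}: ~{need:.1f} GB needed -> {verdict} in 8 GB")
```

Under these assumptions a ~3B model runs at FP16 within 8 GB, while 7B-class models need 4-bit quantization to fit, which matches the general pattern described above.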

Want to check your specific GPU? Use the homepage calculator to see which of these models fit your hardware with estimated tokens per second.