Small LLMs
9 models · local AI VRAM requirements & GPU compatibility
Small models (under ~10B parameters) are the sweet spot for consumer hardware. The smallest of them (roughly 3B parameters and below) run at full FP16 precision on an 8 GB GPU, avoiding quantization artifacts entirely, and every model in the table below fits in 8 GB at Q4_K_M (a rough sizing formula follows the table). They're fast enough for real-time chat and light enough for laptops and integrated graphics, making them ideal if you want always-on local AI without high power draw.
| Model | Developer | Parameters | Size (Q4_K_M) | VRAM |
|---|---|---|---|---|
| Gemma 3n E4B | Google | 4B | 3.4 GB | fits 8 GB |
| Phi-3.5 Mini Instruct | Microsoft | 3.8B | 5.7 GB | fits 8 GB |
| Llama 3.2 3B Instruct | Meta | 3.2B | 2.8 GB | fits 8 GB |
| Qwen 2.5 3B Instruct | Alibaba | 3.1B | 2.1 GB | fits 8 GB |
| Gemma 2 2B Instruct | Google | 2.6B | 2.4 GB | fits 8 GB |
| SmolLM2 1.7B Instruct | Hugging Face | 1.7B | 2.8 GB | fits 8 GB |
| Qwen 2.5 1.5B Instruct | Alibaba | 1.5B | 1.1 GB | fits 8 GB |
| Llama 3.2 1B Instruct | Meta | 1.24B | 1.0 GB | fits 8 GB |
| Gemma 3 1B Instruct | Google | 1B | 0.9 GB | fits 8 GB |
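
The sizing arithmetic behind these numbers is simple: a model's weight footprint is its parameter count times the bits per weight of the chosen quantization, plus headroom for the KV cache and runtime buffers. Here is a minimal Python sketch of that estimate; the effective bits-per-weight values, the flat 1 GB overhead, and the `estimate_vram_gb` helper are illustrative assumptions, not the site's actual calculator.

```python
# Rough VRAM estimate: weights footprint plus a flat allowance
# for KV cache and runtime buffers. Back-of-the-envelope only.

BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,     # approximate effective bits/weight (assumption)
    "Q4_K_M": 4.85,  # approximate effective bits/weight (assumption)
}

def estimate_vram_gb(params_billions: float, quant: str,
                     overhead_gb: float = 1.0) -> float:
    """Weight bytes = params * bits_per_weight / 8; overhead is a guess."""
    weight_gb = params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9
    return weight_gb + overhead_gb

# Example: Llama 3.2 3B at Q4_K_M vs. full FP16 on an 8 GB card.
for quant in ("Q4_K_M", "FP16"):
    need = estimate_vram_gb(3.2, quant)
    print(f"{quant}: ~{need:.1f} GB -> {'fits' if need <= 8 else 'too big'} on 8 GB")
```

Note that FP16 for a ~3B model lands around 7.4 GB with this estimate, which is why only the smallest models in the table can skip quantization on an 8 GB GPU.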
Want to check your specific GPU? Use the homepage calculator to see which of these models fit your hardware, along with estimated tokens per second.
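
Decode speed on consumer GPUs is typically memory-bandwidth-bound: generating each token reads roughly the entire weight file, so tokens per second is on the order of memory bandwidth divided by model size. Below is a hedged sketch of that heuristic; the 60% bandwidth-efficiency factor and the `estimate_tokens_per_sec` helper are assumptions for illustration, not the homepage calculator's actual method (the bandwidth figures are published GPU specs).

```python
# First-order decode-speed estimate: single-token generation is usually
# memory-bandwidth-bound, so tokens/s ~= achieved bandwidth / model bytes.

GPU_BANDWIDTH_GBPS = {
    "RTX 3060 12GB": 360,
    "RTX 4060": 272,
    "RTX 4090": 1008,
}

def estimate_tokens_per_sec(model_size_gb: float, gpu: str,
                            efficiency: float = 0.6) -> float:
    """Assume ~60% of peak bandwidth is achieved in practice (assumption)."""
    return GPU_BANDWIDTH_GBPS[gpu] * efficiency / model_size_gb

# Example: Qwen 2.5 3B (2.1 GB at Q4_K_M) on an RTX 4060.
print(f"~{estimate_tokens_per_sec(2.1, 'RTX 4060'):.0f} tok/s")
```

With these assumptions, a 2.1 GB model on a 272 GB/s card works out to roughly 78 tok/s, which is why the Q4_K_M sizes in the table matter as much for speed as they do for fit.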