Reasoning LLMs
21 models · local AI VRAM requirements & GPU compatibility
Reasoning models produce long chains-of-thought before answering, which means higher quality on math, science, and multi-step tasks — but also longer outputs and higher KV-cache VRAM at long contexts. If you're running these locally, prioritize GPUs with more VRAM and high memory bandwidth to sustain token generation through lengthy reasoning traces.
| Model | Maker | Params (active) | VRAM at Q4_K_M | Fits |
| --- | --- | --- | --- | --- |
| GLM-5.1 754B | Z.ai | 754B (44B active) | 434.0 GB | |
| GLM-5 744B | Z.ai | 744B (40B active) | 428.4 GB | |
| DeepSeek R1 671B | DeepSeek | 671B (37B active) | 376.3 GB | |
| MiniMax M1 456B | MiniMax | 456B (46B active) | 258.4 GB | |
| GLM-4.7 358B | Z.ai | 358B (32B active) | 203.9 GB | |
| GLM-4.5 355B | Z.ai | 355B (32B active) | 202.3 GB | |
| Qwen3 235B-A22B (MoE) | Alibaba | 235B (22B active) | 133.4 GB | |
| Qwen 3.5 122B-A10B (MoE) | Alibaba | 122B (10B active) | 70.7 GB | 80 GB |
| GPT-OSS 120B | OpenAI | 117B (5B active) | 66.2 GB | 80 GB |
| GLM-4.5 Air 106B | Z.ai | 106B (12B active) | 61.1 GB | 80 GB |
| DeepSeek R1 Distill Llama 70B | DeepSeek | 70B | 42.2 GB | 48 GB |
| Qwen 3.5 35B-A3B (MoE) | Alibaba | 35B (3B active) | 20.5 GB | 24 GB |
| Qwen 3.6 35B | Alibaba | 35B | 22.0 GB | 24 GB |
| Qwen3 32B | Alibaba | 32.8B | 19.9 GB | 24 GB |
| DeepSeek R1 Distill Qwen 32B | DeepSeek | 32.5B | 20.6 GB | 24 GB |
| Qwen3 30B-A3B (MoE) | Alibaba | 30B (3B active) | 17.7 GB | 24 GB |
| Qwen 3.6 27B | Alibaba | 27B | 16.9 GB | 24 GB |
| GPT-OSS 20B | OpenAI | 21B (4B active) | 12.2 GB | 16 GB |
| Qwen3 14B | Alibaba | 14.8B | 9.8 GB | 12 GB |
| DeepSeek R1 Distill Llama 8B | DeepSeek | 8B | 5.7 GB | 8 GB |
| Qwen3 8B | Alibaba | 8B | 5.8 GB | 8 GB |
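The sizing logic behind the table can be sketched with a back-of-envelope estimate: quantized weight size scales with total parameter count times bits per weight, and the KV cache grows linearly with context length, which is why long reasoning traces push VRAM up. The following is a minimal sketch, not the site's actual calculator; the example model shape (64 layers, 8 KV heads, head dim 128) and the ~4.5 bits-per-weight figure for Q4_K_M are illustrative assumptions, not values taken from the table.

```python
# Rough VRAM estimate: quantized weights + KV cache.
# Assumptions: Q4_K_M averages roughly 4.5 bits per weight, and a
# grouped-query-attention model stores K and V (2 tensors) per layer,
# per KV head, per token, at fp16 (2 bytes per element).

def weights_gb(total_params_b: float, bits_per_weight: float = 4.5) -> float:
    """Approximate VRAM for quantized weights, in GB."""
    return total_params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_elem: int = 2) -> float:
    """KV cache grows linearly with context length."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * context_tokens / 1e9

# Hypothetical 32.8B dense model reasoning through a 32k-token trace.
total = weights_gb(32.8) + kv_cache_gb(64, 8, 128, 32_768)
print(f"~{total:.1f} GB needed")
print("fits 24 GB" if total <= 24 else "exceeds 24 GB at this context")
```

Note how a model whose weights alone fit a 24 GB card can still overflow it once a long chain-of-thought fills the KV cache, which is the trade-off the intro paragraph describes.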
Want to check your specific GPU? Use the homepage calculator to see which of these models fit your hardware with estimated tokens per second.