Coding LLMs
7 models · local AI VRAM requirements & GPU compatibility
Coding-focused open-weight models excel at code generation, debugging, refactoring, and technical documentation. They are optimized for instruction-following on programming tasks and often score high on HumanEval and similar benchmarks. Because coding workflows demand quick iteration, tokens/sec matters: pair each model with a GPU that holds it entirely in VRAM for the fastest generation.
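Decode speed is roughly memory-bandwidth bound: each generated token streams the active weights from VRAM once, so tokens/sec is capped near bandwidth ÷ active-weight bytes. A minimal sketch under that assumption (the helper name, the ~4.85 bits-per-weight average for Q4_K_M, and the example bandwidth figure are illustrative, not from this page):

```python
def est_tokens_per_sec(active_params_b: float, bpw: float, bandwidth_gbps: float) -> float:
    """Upper-bound decode speed assuming generation is memory-bandwidth bound.

    active_params_b: active parameters in billions (for MoE models, only the
                     active experts are streamed per token)
    bpw: average bits per weight of the quantization (~4.85 for Q4_K_M)
    bandwidth_gbps: GPU memory bandwidth in GB/s (e.g. ~1008 for an RTX 4090)
    """
    bytes_per_token_gb = active_params_b * bpw / 8  # GB read per generated token
    return bandwidth_gbps / bytes_per_token_gb

# A dense 32.5B model at Q4_K_M on a ~1008 GB/s GPU caps out around 51 tok/s;
# real throughput is lower once KV cache reads and compute overhead are included.
print(round(est_tokens_per_sec(32.5, 4.85, 1008)))
```

This is why the sparse MoE models above (e.g. 10B active out of 229B) can decode quickly despite their large total size, provided the full weights still fit in VRAM.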
| Model | Vendor | Params (active) | Size @ Q4_K_M | Fits |
|---|---|---|---|---|
| GLM-5.1 754B | Z.ai | 754B (44B active) | 434.0 GB | |
| GLM-4.5 355B | Z.ai | 355B (32B active) | 202.3 GB | |
| GLM-4.6 355B | Z.ai | 355B (32B active) | 202.3 GB | |
| MiniMax M2.5 229B | MiniMax | 229B (10B active) | 130.6 GB | |
| MiniMax M2.7 229B | MiniMax | 229B (10B active) | 130.6 GB | |
| GLM-4.5 Air 106B | Z.ai | 106B (12B active) | 61.1 GB | 80 GB |
| Qwen 2.5 Coder 32B Instruct | Alibaba | 32.5B | 20.6 GB | 24 GB |
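The size figures above roughly follow total parameters × bits-per-weight. A minimal sketch of that estimate, assuming Q4_K_M averages about 4.85 bits per weight (the helper name and the small overhead factor are illustrative assumptions):

```python
def est_weight_gb(params_b: float, bpw: float = 4.85, overhead: float = 1.04) -> float:
    """Rough on-disk/VRAM footprint of quantized weights in GB.

    params_b: total parameter count in billions
    bpw: average bits per weight (Q4_K_M mixes quant types, averaging ~4.85)
    overhead: small factor for metadata/embeddings kept at higher precision

    Note: excludes KV cache and runtime buffers, which need extra VRAM.
    """
    return params_b * bpw / 8 * overhead

# Qwen 2.5 Coder 32B: ~20.5 GB, consistent with the 20.6 GB figure above.
print(round(est_weight_gb(32.5), 1))
```

The same arithmetic explains the "fits" column: the 61.1 GB GLM-4.5 Air quant leaves headroom on an 80 GB card, while the 20.6 GB Qwen quant fits a 24 GB consumer GPU with room for context.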
Want to check your specific GPU? Use the homepage calculator to see which of these models fit your hardware with estimated tokens per second.