Coding LLMs
7 models · local AI VRAM requirements & GPU compatibility
Coding-focused open-weight models excel at code generation, debugging, refactoring, and technical documentation. They are optimized for instruction-following on programming tasks and often score high on HumanEval and similar benchmarks. Because coding workflows demand quick iteration, tokens/sec matters: pair each model with a GPU that holds it entirely in VRAM for the fastest generation.
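Decode speed is roughly memory-bandwidth bound: each generated token streams the active weights from VRAM once, so tokens/sec is capped near bandwidth ÷ active-weight bytes. A minimal sketch under that assumption (the helper name, the ~4.85 bits-per-weight average for Q4_K_M, and the example bandwidth figure are illustrative, not from this page):

```python
def est_tokens_per_sec(active_params_b: float, bpw: float, bandwidth_gbps: float) -> float:
    """Upper-bound decode speed assuming generation is memory-bandwidth bound.

    active_params_b: active parameters in billions (for MoE models, only the
                     active experts are streamed per token)
    bpw: average bits per weight of the quantization (~4.85 for Q4_K_M)
    bandwidth_gbps: GPU memory bandwidth in GB/s (e.g. ~1008 for an RTX 4090)
    """
    bytes_per_token_gb = active_params_b * bpw / 8  # GB read per generated token
    return bandwidth_gbps / bytes_per_token_gb

# A dense 32.5B model at Q4_K_M on a ~1008 GB/s GPU caps out around 51 tok/s;
# real throughput is lower once KV cache reads and compute overhead are included.
print(round(est_tokens_per_sec(32.5, 4.85, 1008)))
```

This is why the sparse MoE models above (e.g. 10B active out of 229B) can decode quickly despite their large total size, provided the full weights still fit in VRAM.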
| Model | Vendor | Params (active) | Size @ Q4_K_M | Fits |
|---|---|---|---|---|
| GLM-5.1 754B | Z.ai | 754B (44B active) | 434.0 GB | |
| GLM-4.5 355B | Z.ai | 355B (32B active) | 202.3 GB | |
| GLM-4.6 355B | Z.ai | 355B (32B active) | 202.3 GB | |
| MiniMax M2.5 229B | MiniMax | 229B (10B active) | 130.6 GB | |
| MiniMax M2.7 229B | MiniMax | 229B (10B active) | 130.6 GB | |
| GLM-4.5 Air 106B | Z.ai | 106B (12B active) | 61.1 GB | 80 GB |
| Qwen 2.5 Coder 32B Instruct | Alibaba | 32.5B | 20.6 GB | 24 GB |
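The size figures above roughly follow total parameters × bits-per-weight. A minimal sketch of that estimate, assuming Q4_K_M averages about 4.85 bits per weight (the helper name and the small overhead factor are illustrative assumptions):

```python
def est_weight_gb(params_b: float, bpw: float = 4.85, overhead: float = 1.04) -> float:
    """Rough on-disk/VRAM footprint of quantized weights in GB.

    params_b: total parameter count in billions
    bpw: average bits per weight (Q4_K_M mixes quant types, averaging ~4.85)
    overhead: small factor for metadata/embeddings kept at higher precision

    Note: excludes KV cache and runtime buffers, which need extra VRAM.
    """
    return params_b * bpw / 8 * overhead

# Qwen 2.5 Coder 32B: ~20.5 GB, consistent with the 20.6 GB figure above.
print(round(est_weight_gb(32.5), 1))
```

The same arithmetic explains the "fits" column: the 61.1 GB GLM-4.5 Air quant leaves headroom on an 80 GB card, while the 20.6 GB Qwen quant fits a 24 GB consumer GPU with room for context.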
Want to check your specific GPU? Use the homepage calculator to see which of these models fit your hardware with estimated tokens per second.