Claude Code
Anthropic's official agentic coding CLI. It reads your entire codebase, plans and executes changes across files, runs tests, and iterates on failures.
Coding Agent, Developer Tool
Yes
Yes
Yes
Yes — for local inference
Autonomous multi-file coding with Claude models
Easy
CLI, macOS, Linux
Paid
Claude Code is Anthropic's official agentic coding CLI and a proprietary developer tool. It reads your entire codebase, plans and executes changes across files, runs tests, and iterates on failures.
Claude Code works with both local models and cloud APIs. It supports OpenRouter for unified access to 300+ models from a single API key, and its Ollama integration lets you run models locally on your own GPU. Claude Code itself is a paid product. For local inference, the practical minimum is 16 GB of VRAM for MoE models (Gemma 4 26B), with 20-24 GB recommended for dense models. Community advice: 'Don't go below q6 if watching it, q8 if letting it run autonomously.' Plan on at least 32K of context, and 64K+ for reliable agentic use.
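Because Claude Code talks to an Anthropic-style endpoint, one common pattern for using OpenRouter or a local server is to run a translating proxy (for example LiteLLM or claude-code-router) and point Claude Code at it through environment variables. The following is a minimal sketch under that assumption: the `ANTHROPIC_*` variables are the ones Claude Code's settings documentation describes for overriding the endpoint, token, and model, while the URL, key, and model alias are placeholders for whatever your proxy serves.

```bash
# Sketch (assumption): a translating proxy such as LiteLLM or claude-code-router
# is already listening on localhost:4000 and forwards requests to your chosen
# local or OpenRouter-hosted model.
export ANTHROPIC_BASE_URL="http://localhost:4000"   # where Claude Code sends requests
export ANTHROPIC_AUTH_TOKEN="local-proxy-key"       # whatever token the proxy expects
export ANTHROPIC_MODEL="qwen2.5-coder-32b"          # model alias defined in the proxy config
claude                                              # launch Claude Code as usual
```

Unsetting these variables returns Claude Code to the standard Anthropic endpoint.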
Can it run on my hardware?
Minimum
Practical minimum: 16 GB VRAM for MoE models (Gemma 4 26B); 20-24 GB recommended for dense models. Community advice: 'Don't go below q6 if watching it, q8 if letting it run autonomously.' Context: 32K minimum, 64K+ for reliable agentic use.
Recommended
24 GB of VRAM (RTX 3090/4090) runs Qwen 2.5 Coder 32B at Q4_K_M with a 64K context. For MoE models such as Gemma 4 26B, 16 GB of VRAM can work at Q4. Use llama.cpp directly instead of Ollama for finer control over quantization and context; see the sketch after the table below.
Approximate VRAM needed for recommended local models at Q4 with 8K context:
| Model | Params | Q4 VRAM | Min GPU |
|---|---|---|---|
| Gemma 4 26B (MoE) | 26B | ~18.0 GB | 24 GB |
| Qwen3 30B-A3B (MoE) | 30B | ~19.8 GB | 24 GB |
| Qwen 3.5 35B-A3B (MoE) | 35B | ~23.0 GB | 24 GB |
| GPT-OSS 20B | 21B | ~13.7 GB | 16 GB |
| Qwen 2.5 Coder 32B Instruct | 32.5B | ~22.9 GB | 24 GB |
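If you take the advice above and serve the model with llama.cpp directly, a minimal launch could look like the sketch below. The GGUF path is a placeholder, and `-c` (context length) and `-ngl` (GPU-offloaded layers) should be tuned to your VRAM.

```bash
# Sketch: serve Qwen 2.5 Coder 32B at Q4_K_M with a 64K context using llama.cpp's
# llama-server. The model path is a placeholder; reduce -c or -ngl if VRAM runs out.
./llama-server \
  -m ./models/qwen2.5-coder-32b-instruct-q4_k_m.gguf \
  -c 65536 \
  -ngl 99 \
  --port 8080
```

Claude Code then reaches this server through whichever proxy or gateway you configured, as in the environment-variable sketch earlier on this page.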
App compatibility
| Feature | Supported |
|---|---|
| Local models | Yes |
| OpenRouter | Yes |
| OpenAI-compatible API | Yes |
| Ollama | Yes |
| LM Studio | Yes |
| Anthropic API | Yes |
| Google API | No |
| Mistral API | No |
| Docker | No |
| Works offline | No |
| Needs GPU | Yes |
Recommended models
Best local models
Local vs cloud: which should you use?
Use local models if
- You want privacy — data never leaves your machine
- You already have a GPU with sufficient VRAM
- You want zero per-token API costs
- You need offline access
- You have at least 16-24 GB VRAM for recommended models
Use cloud/API if
- Your GPU has insufficient VRAM for the models you need
- You want access to frontier model quality
- You need maximum coding/reasoning performance
- You don't want to manage local model downloads and updates
- OpenRouter lets you switch between 300+ models with one API key
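Before wiring OpenRouter into any tool, it can help to sanity-check your key against OpenRouter's OpenAI-compatible endpoint. The model slug below is just an example from their catalog.

```bash
# Sketch: verify an OpenRouter API key with a one-off chat completion.
# The model slug is an example; any slug listed on openrouter.ai/models works.
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen/qwen-2.5-coder-32b-instruct", "messages": [{"role": "user", "content": "Say hello"}]}'
```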
Setup overview
Setting up Claude Code is straightforward: it is a CLI tool that runs on macOS and Linux. Full documentation is available at https://docs.anthropic.com/en/docs/claude-code.
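A minimal first run, assuming the npm-based install that the documentation describes (check the docs above for the current command):

```bash
# Sketch: install Claude Code globally via npm, then start it inside a project.
npm install -g @anthropic-ai/claude-code
cd /path/to/your-project
claude
```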
Limitations
- Small local models — 20K system prompt overwhelms anything below 16 GB VRAM
- Non-Anthropic models — significant quality drop in tool calling
- Free-tier OpenRouter models — can be unreliable
Related
Recommended GPUs
Compatible models
Frequently asked questions
- What is Claude Code?
- Claude Code is Anthropic's official agentic coding CLI and a proprietary developer tool. It reads your entire codebase, plans and executes changes across files, runs tests, and iterates on failures.
- Does Claude Code need a GPU?
- Only for local inference. If you run models locally, the practical minimum is 16 GB of VRAM for MoE models (Gemma 4 26B), with 20-24 GB recommended for dense models. Community advice: 'Don't go below q6 if watching it, q8 if letting it run autonomously.' Plan on 32K of context at minimum, and 64K+ for reliable agentic use.
- Can Claude Code use OpenRouter?
- Yes. Claude Code supports OpenRouter for accessing 300+ models through a single API. See the official setup guide for details.
- Can Claude Code use local models via Ollama?
- Yes. Claude Code works with Ollama for running models locally. Install Ollama, pull your model (e.g., `ollama pull qwen2.5:7b`), and connect Claude Code to the local Ollama server. GPU requirements depend on the model you choose, not on Claude Code itself. A sketch of the local side of this setup appears at the end of this FAQ.
- What is the best local model for Claude Code?
- The community-verified pick for Claude Code is Gemma 4 26B (MoE), which can work at Q4 on 16 GB of VRAM. With 24 GB of VRAM (RTX 3090/4090), Qwen 2.5 Coder 32B at Q4_K_M with a 64K context is a solid dense-model choice. Use llama.cpp directly instead of Ollama for finer control over quantization and context.
- Can I run Claude Code on 12 GB VRAM?
- 12 GB VRAM is generally not sufficient for serious agentic coding with Claude Code. You can run smaller models (7B-14B at Q4) but tool-calling reliability and context handling will be limited. For the best experience, 24 GB VRAM (RTX 3090/4090) is the community-recommended minimum for local agentic coding.
- Is Claude Code free?
- Claude Code is a paid product: it requires a paid Claude subscription or pay-as-you-go Anthropic API usage.
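Following up on the Ollama question above, here is a rough sketch of the local side of that setup. The model tag is only an example, and the curl call simply confirms that Ollama's OpenAI-compatible endpoint is responding before you point a proxy or Claude Code at it.

```bash
# Sketch: with the Ollama server running (the desktop app or `ollama serve`),
# pull an example model and confirm the local OpenAI-compatible endpoint responds.
ollama pull qwen2.5-coder:7b
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder:7b", "messages": [{"role": "user", "content": "hello"}]}'
```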