Continue
Open-source AI code assistant for VS Code and JetBrains. Tab autocomplete, chat, and agent mode with separate models per role — like a local Copilot.
Coding Agent, Developer Tool
Yes
Yes
Yes
No — runs in the cloud
Copilot-like autocomplete with local models for privacy
Medium
VS Code, JetBrains
Open source — free
Continue (31K+ GitHub stars) is an open-source AI code assistant that integrates deeply into VS Code and JetBrains, offering tab autocomplete, chat, and an agent mode with separate models per role, like a local Copilot.
Continue works with both local models and cloud APIs. It supports OpenRouter for unified access to 300+ models from a single API, and its Ollama integration lets you run models locally on your own GPU. Continue is open source (https://github.com/continuedev/continue), so you can inspect the code and self-host.
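A minimal sketch of what that looks like in practice. Newer Continue releases read a YAML config (`config.yaml`); older ones use `config.json` with equivalent fields. The model tags and API key below are placeholders, and the `openrouter` provider name follows Continue's docs at the time of writing; if your version doesn't recognize it, OpenRouter also works through any OpenAI-compatible provider pointed at https://openrouter.ai/api/v1:

```yaml
# Sketch of ~/.continue/config.yaml, not a drop-in file; adjust to your setup.
name: my-assistant    # illustrative metadata
version: 0.0.1
models:
  # Local model served by Ollama (assumes `ollama pull qwen2.5:7b` was run)
  - name: Qwen 2.5 7B (local)
    provider: ollama
    model: qwen2.5:7b
  # Cloud model routed through OpenRouter: one API key, many models
  - name: Claude Sonnet (OpenRouter)
    provider: openrouter
    model: anthropic/claude-3.5-sonnet
    apiKey: YOUR_OPENROUTER_API_KEY   # placeholder, use your own key
```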
Can it run on my hardware?
Minimum
8 GB VRAM for 7B autocomplete/chat models; 16 GB for 14B agent mode. Agent mode with local models requires the tool_use capability to be declared explicitly in the model config.
Recommended
16 GB VRAM for Qwen3-14B at Q4 as agent model. 24 GB+ for Qwen3.5-35B-A3B MoE. Consider using a small local model for autocomplete and a cloud model via OpenRouter for agent tasks.
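One hedged way to wire up that split, using the `roles` and `capabilities` fields from Continue's current YAML config format (field names may differ in older `config.json` versions, and the model choices are illustrative):

```yaml
models:
  # Small, fast local model dedicated to tab autocomplete
  - name: Qwen 2.5 Coder 1.5B (autocomplete)
    provider: ollama
    model: qwen2.5-coder:1.5b
    roles:
      - autocomplete
  # Larger model for chat and agent work; per the note above, local agent
  # mode needs tool_use declared explicitly
  - name: Qwen3 14B (agent)
    provider: ollama
    model: qwen3:14b
    roles:
      - chat
      - edit
    capabilities:
      - tool_use
```

Swapping the agent entry for an OpenRouter model keeps autocomplete local (and private) while agent tasks get frontier-model quality.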
Approximate VRAM needed for recommended local models at Q4 with 8K context:
| Model | Params | Q4 VRAM | Min GPU |
|---|---|---|---|
| Qwen 3.5 35B-A3B (MoE) | 35B | ~23.0 GB | 24 GB |
| Qwen3 32B | 32.8B | ~22.2 GB | 24 GB |
| Gemma 4 26B (MoE) | 26B | ~18.0 GB | 24 GB |
| Qwen3 14B | 14.8B | ~10.8 GB | 12 GB |
| Qwen3 8B | 8B | ~6.4 GB | 8 GB |
App compatibility
| Feature | Supported |
|---|---|
| Local models | Yes |
| OpenRouter | Yes |
| OpenAI-compatible API | Yes |
| Ollama | Yes |
| LM Studio | Yes |
| Anthropic API | Yes |
| Google API | Yes |
| Mistral API | No |
| Docker | No |
| Works offline | No |
| Needs GPU | No |
Recommended models
Best local models
Local vs cloud: which should you use?
Use local models if
- You want privacy — data never leaves your machine
- You already have a GPU with sufficient VRAM
- You want zero per-token API costs
- You need offline access
Use cloud/API if
- Your GPU has insufficient VRAM for the models you need
- You want access to frontier model quality
- You need maximum coding/reasoning performance
- You don't want to manage local model downloads and updates
- You want the flexibility of OpenRouter, which lets you switch between 300+ models with one API key
Setup overview
Setting up Continue is moderately complex. It runs in VS Code and JetBrains. Full documentation is available at https://docs.continue.dev.
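In outline: install the extension from the IDE marketplace, start a model server such as Ollama, then point Continue at it in its config file. A smallest-useful config, with one local model covering both chat and autocomplete (model tag illustrative), might look like:

```yaml
name: starter
version: 0.0.1
models:
  - name: Qwen3 8B
    provider: ollama
    model: qwen3:8b   # assumes `ollama pull qwen3:8b` has been run
    roles:
      - chat
      - autocomplete
```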
Limitations
- Autonomous multi-step agentic coding is not its strength (use Cline or Roo Code)
- Agent mode with local models is still immature (use Aider instead)
- Not beginner-friendly: config.json editing is required
Related
Compatible models
Frequently asked questions
- What is Continue?
- Continue is an open-source AI code assistant for VS Code and JetBrains offering tab autocomplete, chat, and an agent mode with separate models per role, like a local Copilot. With 31K+ GitHub stars, it integrates deeply into both IDEs.
- Does Continue need a GPU?
- Continue itself does not require a GPU, but the models you connect to it do: plan on 8 GB of VRAM for 7B autocomplete/chat models and 16 GB for 14B agent mode. Agent mode with local models also requires the tool_use capability to be declared explicitly.
- Can I run Continue on CPU only?
- Yes — Continue supports CPU-only operation, but performance will be significantly slower (5-10x) compared to GPU inference. CPU-only works best for models under 7B parameters with at least 16 GB of system RAM.
- Can Continue use OpenRouter?
- Yes. Continue supports OpenRouter for accessing 300+ models through a single API. Configure OpenRouter as a provider in Continue's settings with your API key.
- Can Continue use local models via Ollama?
- Yes. Continue works with Ollama for running models locally. Install Ollama, pull your model (e.g., `ollama pull qwen2.5:7b`), and connect Continue to the local Ollama server. GPU requirements depend on the model you choose, not Continue itself.
- What is the best local model for Continue?
- For Continue, the community-verified best local model is Qwen 3.5 35B-A3B (MoE), which needs 24 GB+ of VRAM at Q4. With 16 GB of VRAM, Qwen3-14B at Q4 works well as the agent model. Consider using a small local model for autocomplete and a cloud model via OpenRouter for agent tasks.
- Can I run Continue on 12 GB VRAM?
- 12 GB VRAM is generally not sufficient for serious agentic coding with Continue. You can run smaller models (7B-14B at Q4) but tool-calling reliability and context handling will be limited. For the best experience, 24 GB VRAM (RTX 3090/4090) is the community-recommended minimum for local agentic coding.
- Is Continue free and open source?
- Yes. Continue is open source and completely free. You can find the source code on GitHub at https://github.com/continuedev/continue.