KoboldCPP
Single-binary local inference for roleplay and storytelling. GGUF models, zero install, bundled KoboldAI Lite UI. The community go-to for AI storytelling.
Category: Local LLM tool, roleplay
Best for: AI storytelling and interactive fiction
Difficulty: Easy
Platforms: Windows, macOS, Linux
Pricing: Open source, free
KoboldCPP (10.5K GitHub stars) is a single-binary, zero-install local inference engine for GGUF models, optimized for storytelling and roleplay. It bundles the KoboldAI Lite web UI and is the community go-to for AI storytelling and interactive fiction.
KoboldCPP runs entirely on your local hardware, and it is open source (https://github.com/LostRuins/koboldcpp), so you can inspect the code and self-host. Hardware needs are modest: 12 GB of VRAM is enough for good roleplay models at 4K context, and CPU-only inference works for 7-8B models with 16 GB of system RAM.
Can it run on my hardware?
Minimum
12 GB of VRAM is sufficient for good roleplay models at 4K context. CPU-only inference works for 7-8B models with 16 GB of system RAM. For example, Fimbulvetr-Kuro-Lotus-10.7B runs well on an RTX 3060 (12 GB) at 4K context with 48 GPU layers offloaded.
Recommended
12 GB VRAM (RTX 3060): 10-13B roleplay models at Q4 with 4-8K context. 24 GB VRAM (RTX 3090): 20-35B models at Q4. Pair KoboldCPP with SillyTavern for the best roleplay experience: KoboldCPP handles inference while SillyTavern handles character cards and chat management.
Approximate VRAM needed for recommended local models at Q4 with 8K context:
| Model | Params | Q4 VRAM | Min GPU |
|---|---|---|---|
| Qwen3 14B | 14.8B | ~10.8 GB | 12 GB |
| Qwen3 8B | 8B | ~6.4 GB | 8 GB |
| Gemma 3 12B Instruct | 12.2B | ~8.9 GB | 12 GB |
| Mistral Nemo 12B Instruct | 12.2B | ~9.2 GB | 12 GB |
| Command-R 35B | 35B | ~34.1 GB | 48 GB+ |
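The table figures follow a simple rule of thumb: quantized weights take roughly params × bits ÷ 8 bytes, plus a KV cache that grows linearly with context length. The sketch below is a rough estimator under stated assumptions, not KoboldCPP's own memory accounting; the layer, head, and dimension defaults are illustrative values, not read from any specific model.

```python
# Rough Q4 VRAM estimate: quantized weights + KV cache + runtime overhead.
# A back-of-the-envelope sketch, not KoboldCPP's internal memory accounting.

def estimate_vram_gb(
    params_b: float,           # model size in billions of parameters
    bits: float = 4.5,         # effective bits/weight for Q4_K_M (approximate)
    context: int = 8192,       # context window in tokens
    layers: int = 40,          # transformer layers (model-specific assumption)
    kv_heads: int = 8,         # KV heads under grouped-query attention (assumption)
    head_dim: int = 128,       # dimension per attention head (assumption)
    overhead_gb: float = 0.8,  # compute buffers, scratch space (assumption)
) -> float:
    weights = params_b * 1e9 * bits / 8
    # KV cache: 2 tensors (K and V) * layers * tokens * kv_heads * head_dim,
    # stored as fp16 (2 bytes per value) by default.
    kv_cache = 2 * layers * context * kv_heads * head_dim * 2
    return (weights + kv_cache) / 1e9 + overhead_gb

# Example: a 14.8B model at 8K context lands near the ~10.8 GB in the table.
print(f"{estimate_vram_gb(14.8):.1f} GB")
```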
App compatibility
| Feature | Supported |
|---|---|
| Local models | Yes |
| OpenRouter | No |
| OpenAI-compatible API | Yes |
| Ollama | No |
| LM Studio | No |
| Anthropic API | No |
| Google API | No |
| Mistral API | No |
| Docker | No |
| Works offline | Yes |
| Needs GPU | No |
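Because KoboldCPP exposes an OpenAI-compatible endpoint, any client that speaks that API can use it. Below is a minimal sketch using the `requests` library, assuming KoboldCPP is running on its default port 5001 with a model already loaded; the `model` field is a placeholder, since the server responds with whichever GGUF was loaded at launch.

```python
# Query KoboldCPP's OpenAI-compatible endpoint (default port 5001).
# Assumes a model is already loaded; "model" is a placeholder name.
import requests

resp = requests.post(
    "http://localhost:5001/v1/chat/completions",
    json={
        "model": "koboldcpp",
        "messages": [
            {"role": "system", "content": "You are a fantasy storyteller."},
            {"role": "user", "content": "Begin a tale in a rain-soaked port city."},
        ],
        "max_tokens": 300,
        "temperature": 0.8,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```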
Local vs cloud: which should you use?
Use local models if
- You want privacy — data never leaves your machine
- You already have a GPU with sufficient VRAM
- You want zero per-token API costs
- You need offline access
Use cloud/API if
- Your GPU has insufficient VRAM for the models you need
- You want access to frontier model quality
- You need maximum coding/reasoning performance
- You don't want to manage local model downloads and updates
Setup overview
Setting up KoboldCPP is straightforward: download the single binary for Windows, macOS, or Linux from the GitHub releases page, run it, and pick a GGUF model in the launcher. You can also skip the launcher and pass everything on the command line, e.g. `./koboldcpp --model model.gguf --contextsize 4096 --gpulayers 48`. Once launched, the bundled KoboldAI Lite UI is served at http://localhost:5001.
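After launch you can verify the server from a script. A small sketch using the KoboldAI-style native API that KoboldCPP implements; the `/api/v1/model` route returning the loaded model name is an assumption based on that API.

```python
# Quick health check against a running KoboldCPP instance.
import requests

BASE = "http://localhost:5001"  # KoboldCPP's default port

try:
    # KoboldAI-style endpoint reporting the currently loaded model.
    model = requests.get(f"{BASE}/api/v1/model", timeout=5).json()
    print("KoboldCPP is up, serving:", model.get("result"))
except requests.ConnectionError:
    print("No server on port 5001 -- is KoboldCPP running?")
```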
Limitations
- Coding agents (not designed for tool calling)
- General chat (use Open WebUI or LM Studio)
- Cloud API access (local inference only)
Frequently asked questions
- What is KoboldCPP?
- KoboldCPP is a single-binary, zero-install local inference engine for GGUF models, optimized for storytelling and roleplay. It bundles the KoboldAI Lite UI, has around 10.5K GitHub stars, and is the community go-to for AI storytelling and interactive fiction.
- Does KoboldCPP need a GPU?
- KoboldCPP itself does not require a GPU, but the models it runs are much faster with one. 12 GB of VRAM is sufficient for good roleplay models at 4K context, and CPU-only inference works for 7-8B models with 16 GB of system RAM. For example, Fimbulvetr-Kuro-Lotus-10.7B runs well on an RTX 3060 (12 GB) at 4K context with 48 GPU layers.
- Can I run KoboldCPP on CPU only?
- Yes — KoboldCPP supports CPU-only operation, but performance will be significantly slower (5-10x) compared to GPU inference. CPU-only works best for models under 7B parameters with at least 16 GB of system RAM.
- What models work best with KoboldCPP?
- Models that work well with KoboldCPP include: Qwen3 14B, Qwen3 8B, Gemma 3 12B Instruct, Mistral Nemo 12B Instruct. The best model depends on your GPU's VRAM and your use case.
- Is KoboldCPP free and open source?
- Yes. KoboldCPP is open source and completely free. You can find the source code on GitHub at https://github.com/LostRuins/koboldcpp.