
KoboldCPP

Single-binary local inference for roleplay and storytelling. GGUF models, zero install, bundled KoboldAI Lite UI. The community go-to for AI storytelling.

App type

Local LLM Tool, Roleplay

Local models

Yes

OpenRouter

No

Ollama

No

GPU required

Only for local models

Best for

AI storytelling and interactive fiction

Setup difficulty

Easy

Platforms

Windows, macOS, Linux

Pricing

Open source — free

KoboldCPP (10.5K GitHub stars) is a single-binary, zero-install local inference engine optimized for storytelling and roleplay. It runs GGUF models and bundles the KoboldAI Lite UI, which has made it the community go-to for AI storytelling.

KoboldCPP runs entirely on your local hardware. It is open source (https://github.com/LostRuins/koboldcpp), so you can inspect the code and self-host. Hardware requirements depend on the model you load; see the guidance below.

Can it run on my hardware?

Minimum

12 GB of VRAM is sufficient for good roleplay models at 4K context. CPU-only inference works for 7-8B models with 16 GB of system RAM. For example, Fimbulvetr-Kuro-Lotus-10.7B runs well on an RTX 3060 12 GB at 4K context with 48 GPU layers offloaded.

Recommended

12 GB VRAM (RTX 3060): 10-13B roleplay models at Q4 with 4-8K context. 24 GB VRAM (RTX 3090): 20-35B models at Q4. Pair with SillyTavern for the best roleplay experience — KoboldCPP handles inference, SillyTavern handles character management.
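Under the hood, all a frontend like SillyTavern does is POST prompts to KoboldCPP's HTTP API. Here is a minimal sketch of that call, assuming the default port 5001 and the KoboldAI-style /api/v1/generate endpoint; adjust the URL and sampler settings for your setup:

```python
import requests

# Assumes KoboldCPP is running locally on its usual default port (5001).
KOBOLD_URL = "http://localhost:5001/api/v1/generate"

payload = {
    "prompt": "You are a storyteller. Continue the tale:\nThe lantern flickered once, then",
    "max_length": 200,    # tokens to generate
    "temperature": 0.8,   # higher = more creative prose
    "rep_pen": 1.1,       # repetition penalty, commonly raised for roleplay
}

resp = requests.post(KOBOLD_URL, json=payload, timeout=120)
resp.raise_for_status()

# The KoboldAI-style API returns generated text under results[0].text.
print(resp.json()["results"][0]["text"])
```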

Approximate VRAM needed for recommended local models at Q4 with 8K context:

Model                       Params   Q4 VRAM    Min GPU
Qwen3 14B                   14.8B    ~10.8 GB   12 GB
Qwen3 8B                    8B       ~6.4 GB    8 GB
Gemma 3 12B Instruct        12.2B    ~8.9 GB    12 GB
Mistral Nemo 12B Instruct   12.2B    ~9.2 GB    12 GB
Command-R 35B               35B      ~34.1 GB   48 GB+
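These figures follow the usual back-of-the-envelope arithmetic: Q4 weights take roughly 0.6 bytes per parameter, plus an fp16 KV cache that grows with context length, plus runtime overhead. The sketch below shows that arithmetic; the byte-per-parameter figure, the default layer/head counts, and the overhead constant are approximations, not KoboldCPP internals:

```python
def estimate_q4_vram_gb(
    params_b: float,      # model size in billions of parameters
    context: int = 8192,  # context window in tokens
    layers: int = 40,     # transformer layers (model-specific; defaults fit a 14B-class model)
    kv_heads: int = 8,    # KV heads (GQA models use fewer than attention heads)
    head_dim: int = 128,  # per-head dimension
) -> float:
    """Rough Q4 VRAM estimate: weights + fp16 KV cache + fixed overhead.

    All constants are approximations, not exact GGUF accounting.
    """
    weights_gb = params_b * 0.6  # Q4_K_M averages roughly 4.8 bits per weight
    kv_gb = (2 * layers * kv_heads * head_dim * 2 * context) / 1e9  # K and V, 2 bytes each
    overhead_gb = 0.7            # scratch buffers, CUDA context, etc.
    return weights_gb + kv_gb + overhead_gb

# Example: a 14.8B model at 8K context lands near the ~10.8 GB figure in the table above.
print(f"{estimate_q4_vram_gb(14.8):.1f} GB")
```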

Check your GPU against these models in the calculator →

App compatibility

Feature                  Supported
Local models             Yes
OpenRouter               No
OpenAI-compatible API    Yes
Ollama                   No
LM Studio                No
Anthropic API            No
Google API               No
Mistral API              No
Docker                   No
Works offline            Yes
Needs GPU                No
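Because KoboldCPP exposes an OpenAI-compatible API, any OpenAI client library can talk to it by overriding the base URL. A minimal sketch with the official openai Python package, assuming the default port 5001; local servers typically ignore or echo the model name:

```python
from openai import OpenAI

# Point the OpenAI client at the local KoboldCPP server instead of api.openai.com.
# No real API key is needed locally, but the client requires some value.
client = OpenAI(base_url="http://localhost:5001/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="koboldcpp",  # usually ignored by local servers
    messages=[
        {"role": "system", "content": "You are a collaborative fiction writer."},
        {"role": "user", "content": "Open a mystery set in a lighthouse."},
    ],
    temperature=0.8,
    max_tokens=200,
)

print(response.choices[0].message.content)
```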

Recommended models


Local vs cloud: which should you use?

Use local models if

  • You want privacy — data never leaves your machine
  • You already have a GPU with sufficient VRAM
  • You want zero per-token API costs
  • You need offline access

Use cloud/API if

  • Your GPU has insufficient VRAM for the models you need
  • You want access to frontier model quality
  • You need maximum coding/reasoning performance
  • You don't want to manage local model downloads and updates

Setup overview

Setting up KoboldCPP is straightforward: download the single binary for your platform, launch it, and point it at a GGUF model. It runs on Windows, macOS, and Linux.
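For a scripted launch, one option is to start the binary from Python and poll until the API answers. This is a sketch under assumptions: the binary name and model file are placeholders, and the --model/--gpulayers/--port flags and the /api/v1/model readiness endpoint reflect common KoboldCPP usage but should be checked against your version's --help:

```python
import subprocess
import time
import requests

PORT = 5001  # KoboldCPP's usual default

# Flag names are assumptions; verify with `koboldcpp --help` for your build.
proc = subprocess.Popen([
    "./koboldcpp",                                    # binary name varies by platform
    "--model", "Fimbulvetr-Kuro-Lotus-10.7B.Q4_K_M.gguf",  # example file; use your own path
    "--gpulayers", "48",
    "--port", str(PORT),
])

# Poll until the server reports a loaded model, or give up after ~2 minutes.
for _ in range(60):
    try:
        r = requests.get(f"http://localhost:{PORT}/api/v1/model", timeout=2)
        if r.ok:
            print("Server ready, model:", r.json())
            break
    except requests.ConnectionError:
        pass
    time.sleep(2)
else:
    proc.terminate()
    raise RuntimeError("KoboldCPP did not become ready in time")
```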

Limitations

  • Coding agents: KoboldCPP is not designed for tool calling
  • General chat: Open WebUI or LM Studio are better fits
  • Cloud API access: KoboldCPP does local inference only


Frequently asked questions

What is KoboldCPP?
KoboldCPP (10.5K GitHub stars) is a single-binary, zero-install local inference engine for GGUF models, optimized for storytelling and roleplay. It bundles the KoboldAI Lite UI and has become the community go-to for AI storytelling.
Does KoboldCPP need a GPU?
KoboldCPP itself does not require a GPU; it can run models entirely on CPU, just much more slowly. For comfortable speeds, 12 GB of VRAM handles good roleplay models at 4K context, while CPU-only inference works for 7-8B models with 16 GB of system RAM.
Can I run KoboldCPP on CPU only?
Yes — KoboldCPP supports CPU-only operation, but performance will be significantly slower (5-10x) compared to GPU inference. CPU-only works best for models under 7B parameters with at least 16 GB of system RAM.
What models work best with KoboldCPP?
Models that work well with KoboldCPP include: Qwen3 14B, Qwen3 8B, Gemma 3 12B Instruct, Mistral Nemo 12B Instruct. The best model depends on your GPU's VRAM and your use case.
Is KoboldCPP free and open source?
Yes. KoboldCPP is open source and completely free. You can find the source code on GitHub at https://github.com/LostRuins/koboldcpp.