KoboldCPP
Single-binary local inference for roleplay and storytelling. GGUF models, zero install, bundled KoboldAI Lite UI. The community go-to for AI storytelling.
Category: Local LLM tool, roleplay
Best for: AI storytelling and interactive fiction
Difficulty: Easy
Platforms: Windows, macOS, Linux
Pricing: Open source, free
KoboldCPP (10.5K GitHub stars) is a single-binary, zero-install local inference engine for GGUF models, optimized for storytelling and roleplay. It bundles the KoboldAI Lite web UI and is the community go-to for AI storytelling and interactive fiction.
KoboldCPP runs entirely on your local hardware, and it is open source (https://github.com/LostRuins/koboldcpp), so you can inspect the code and self-host. Hardware needs are modest: 12 GB of VRAM is enough for good roleplay models at 4K context, and CPU-only inference works for 7-8B models with 16 GB of system RAM.
Can it run on my hardware?
Minimum
12 GB of VRAM is sufficient for good roleplay models at 4K context. CPU-only inference works for 7-8B models with 16 GB of system RAM. For example, Fimbulvetr-Kuro-Lotus-10.7B runs well on an RTX 3060 (12 GB) at 4K context with 48 GPU layers offloaded.
Recommended
12 GB VRAM (RTX 3060): 10-13B roleplay models at Q4 with 4-8K context. 24 GB VRAM (RTX 3090): 20-35B models at Q4. Pair KoboldCPP with SillyTavern for the best roleplay experience: KoboldCPP handles inference while SillyTavern handles character cards and chat management.
Approximate VRAM needed for recommended local models at Q4 with 8K context:
| Model | Params | Q4 VRAM | Min GPU |
|---|---|---|---|
| Qwen3 14B | 14.8B | ~10.8 GB | 12 GB |
| Qwen3 8B | 8B | ~6.4 GB | 8 GB |
| Gemma 3 12B Instruct | 12.2B | ~8.9 GB | 12 GB |
| Mistral Nemo 12B Instruct | 12.2B | ~9.2 GB | 12 GB |
| Command-R 35B | 35B | ~34.1 GB | 48 GB+ |
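The table figures follow a simple rule of thumb: quantized weights take roughly params × bits ÷ 8 bytes, plus a KV cache that grows linearly with context length. The sketch below is a rough estimator under stated assumptions, not KoboldCPP's own memory accounting; the layer, head, and dimension defaults are illustrative values, not read from any specific model.

```python
# Rough Q4 VRAM estimate: quantized weights + KV cache + runtime overhead.
# A back-of-the-envelope sketch, not KoboldCPP's internal memory accounting.

def estimate_vram_gb(
    params_b: float,           # model size in billions of parameters
    bits: float = 4.5,         # effective bits/weight for Q4_K_M (approximate)
    context: int = 8192,       # context window in tokens
    layers: int = 40,          # transformer layers (model-specific assumption)
    kv_heads: int = 8,         # KV heads under grouped-query attention (assumption)
    head_dim: int = 128,       # dimension per attention head (assumption)
    overhead_gb: float = 0.8,  # compute buffers, scratch space (assumption)
) -> float:
    weights = params_b * 1e9 * bits / 8
    # KV cache: 2 tensors (K and V) * layers * tokens * kv_heads * head_dim,
    # stored as fp16 (2 bytes per value) by default.
    kv_cache = 2 * layers * context * kv_heads * head_dim * 2
    return (weights + kv_cache) / 1e9 + overhead_gb

# Example: a 14.8B model at 8K context lands near the ~10.8 GB in the table.
print(f"{estimate_vram_gb(14.8):.1f} GB")
```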
App compatibility
| Feature | Supported |
|---|---|
| Local models | Yes |
| OpenRouter | No |
| OpenAI-compatible API | Yes |
| Ollama | No |
| LM Studio | No |
| Anthropic API | No |
| Google API | No |
| Mistral API | No |
| Docker | No |
| Works offline | Yes |
| Needs GPU | No |
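Because KoboldCPP exposes an OpenAI-compatible endpoint, any client that speaks that API can use it. Below is a minimal sketch using the `requests` library, assuming KoboldCPP is running on its default port 5001 with a model already loaded; the `model` field is a placeholder, since the server responds with whichever GGUF was loaded at launch.

```python
# Query KoboldCPP's OpenAI-compatible endpoint (default port 5001).
# Assumes a model is already loaded; "model" is a placeholder name.
import requests

resp = requests.post(
    "http://localhost:5001/v1/chat/completions",
    json={
        "model": "koboldcpp",
        "messages": [
            {"role": "system", "content": "You are a fantasy storyteller."},
            {"role": "user", "content": "Begin a tale in a rain-soaked port city."},
        ],
        "max_tokens": 300,
        "temperature": 0.8,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```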
Local vs cloud: which should you use?
Use local models if
- You want privacy — data never leaves your machine
- You already have a GPU with sufficient VRAM
- You want zero per-token API costs
- You need offline access
Use cloud/API if
- Your GPU has insufficient VRAM for the models you need
- You want access to frontier model quality
- You need maximum coding/reasoning performance
- You don't want to manage local model downloads and updates
Setup overview
Setting up KoboldCPP is straightforward: download the single binary for Windows, macOS, or Linux from the GitHub releases page, run it, and pick a GGUF model in the launcher. You can also skip the launcher and pass everything on the command line, e.g. `./koboldcpp --model model.gguf --contextsize 4096 --gpulayers 48`. Once launched, the bundled KoboldAI Lite UI is served at http://localhost:5001.
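After launch you can verify the server from a script. A small sketch using the KoboldAI-style native API that KoboldCPP implements; the `/api/v1/model` route returning the loaded model name is an assumption based on that API.

```python
# Quick health check against a running KoboldCPP instance.
import requests

BASE = "http://localhost:5001"  # KoboldCPP's default port

try:
    # KoboldAI-style endpoint reporting the currently loaded model.
    model = requests.get(f"{BASE}/api/v1/model", timeout=5).json()
    print("KoboldCPP is up, serving:", model.get("result"))
except requests.ConnectionError:
    print("No server on port 5001 -- is KoboldCPP running?")
```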
Limitations
- Coding agents (not designed for tool calling)
- General chat (use Open WebUI or LM Studio)
- Cloud API access (local inference only)
Frequently asked questions
- What is KoboldCPP?
- KoboldCPP is a single-binary, zero-install local inference engine for GGUF models, optimized for storytelling and roleplay. It bundles the KoboldAI Lite UI, has around 10.5K GitHub stars, and is the community go-to for AI storytelling and interactive fiction.
- Does KoboldCPP need a GPU?
- KoboldCPP itself does not require a GPU, but the models it runs are much faster with one. 12 GB of VRAM is sufficient for good roleplay models at 4K context, and CPU-only inference works for 7-8B models with 16 GB of system RAM. For example, Fimbulvetr-Kuro-Lotus-10.7B runs well on an RTX 3060 (12 GB) at 4K context with 48 GPU layers.
- Can I run KoboldCPP on CPU only?
- Yes — KoboldCPP supports CPU-only operation, but performance will be significantly slower (5-10x) compared to GPU inference. CPU-only works best for models under 7B parameters with at least 16 GB of system RAM.
- What models work best with KoboldCPP?
- Models that work well with KoboldCPP include: Qwen3 14B, Qwen3 8B, Gemma 3 12B Instruct, Mistral Nemo 12B Instruct. The best model depends on your GPU's VRAM and your use case.
- Is KoboldCPP free and open source?
- Yes. KoboldCPP is open source and completely free. You can find the source code on GitHub at https://github.com/LostRuins/koboldcpp.