llama.cpp
The engine underneath Ollama — but faster. Full control over quants, context, and grammars. Grammar file support enables GPT-OSS tool calling in Cline.
Local LLM Tool, Developer Tool
Yes
No
No
Only for local models
Maximum local inference performance (10-20% faster than Ollama)
Medium
macOS, Linux, Windows, CLI
Open source — free
llama.cpp is the engine underneath Ollama, but faster, with full control over quants, context, and grammars. Grammar file support enables GPT-OSS tool calling in Cline. It is the open-source inference engine that powers Ollama, LM Studio, and most local LLM tools.
llama.cpp runs entirely on your local hardware and is open source (https://github.com/ggerganov/llama.cpp), so you can inspect the code and self-host. VRAM requirements match Ollama for equivalent models, with roughly 10-20% faster inference. Grammar file support lets you run GPT-OSS-20B with Cline tool calling at 16 GB VRAM (MXFP4 quant).
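As a rough illustration of how grammar constraints are applied, the sketch below posts a completion request to a running llama-server instance with a toy GBNF grammar that restricts output to a yes/no answer. It assumes llama-server is already listening on its default port 8080 and that the `requests` package is installed; real tool-calling grammars for GPT-OSS and Cline are far larger, but the request shape is the same.

```python
# Minimal sketch: grammar-constrained completion against a local llama-server.
# Assumes llama-server is already running on localhost:8080 with a model loaded.
import requests

# Toy GBNF grammar that only allows "yes" or "no" as the generated text.
# Real tool-calling grammars (e.g. for GPT-OSS in Cline) are much larger.
GRAMMAR = r'''
root ::= "yes" | "no"
'''

response = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "Is llama.cpp written mostly in C++? Answer yes or no: ",
        "grammar": GRAMMAR,   # GBNF grammar string; constrains token sampling
        "n_predict": 4,       # yes/no only needs a few tokens
        "temperature": 0,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["content"])
```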
Can it run on my hardware?
Minimum
16 GB VRAM runs GPT-OSS-20B (MXFP4 quant) with Cline tool calling via grammar files. VRAM requirements match Ollama for equivalent models, with roughly 10-20% faster inference.
Recommended
24 GB VRAM (RTX 3090/4090) for Qwen 2.5 Coder 32B at Q4 with 64K context. For GPU-poor setups, CPU-only with 32 GB+ system RAM works for 7-14B models at acceptable speeds. Use llama-bench to find optimal thread/batch settings for your hardware.
Approximate VRAM needed for recommended local models at Q4 with 8K context:
| Model | Params | Q4 VRAM | Min GPU |
|---|---|---|---|
| Qwen3 32B | 32.8B | ~22.2 GB | 24 GB |
| Qwen3 30B-A3B (MoE) | 30B | ~19.8 GB | 24 GB |
| Qwen 2.5 Coder 32B Instruct | 32.5B | ~22.9 GB | 24 GB |
| GPT-OSS 20B | 21B | ~13.7 GB | 16 GB |
| Llama 3.1 70B Instruct | 70B | ~47.1 GB | 48 GB+ |
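The figures above follow a rule of thumb rather than an exact formula. The sketch below uses ballpark constants that are assumptions, not values taken from llama.cpp: roughly 0.6 GB per billion parameters for Q4 weights plus an allowance for the KV cache and compute buffers at 8K context. It will not reproduce the table exactly, since quant flavor and architecture both matter, but it is a reasonable first-pass check against your GPU.

```python
# Rough Q4 VRAM estimate at ~8K context. Constants are ballpark assumptions,
# not values taken from llama.cpp; actual usage varies by a couple of GB.
def estimate_q4_vram_gb(params_billion: float, kv_cache_gb: float = 2.0) -> float:
    bytes_per_param = 0.6   # ~4.8 bits/weight for Q4_K_M-style quants
    overhead_gb = 0.5       # compute buffers, scratch space, etc.
    return params_billion * bytes_per_param + kv_cache_gb + overhead_gb

for name, params in [("Qwen 2.5 Coder 32B", 32.5), ("GPT-OSS 20B", 21.0), ("Llama 3.1 70B", 70.0)]:
    print(f"{name}: ~{estimate_q4_vram_gb(params):.1f} GB")
```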
App compatibility
| Feature | Supported |
|---|---|
| Local models | Yes |
| OpenRouter | No |
| OpenAI-compatible API | Yes |
| Ollama | No |
| LM Studio | No |
| Anthropic API | No |
| Google API | No |
| Mistral API | No |
| Docker | No |
| Works offline | Yes |
| Needs GPU | No |
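Because llama-server exposes an OpenAI-compatible endpoint (the API row in the table above), most editors and SDKs that speak the OpenAI API can point at it directly. A minimal sketch, assuming the official `openai` Python package and a llama-server instance on localhost:8080; the model name and API key are placeholders, since llama-server typically serves whichever model it loaded and does not require a key by default.

```python
# Minimal sketch: use llama-server's OpenAI-compatible API from Python.
# Assumes llama-server is already running on localhost:8080 with a model loaded.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llama-server's OpenAI-compatible route
    api_key="sk-no-key-required",         # placeholder; no key is needed by default
)

reply = client.chat.completions.create(
    model="local-model",  # placeholder; the server serves the model it loaded
    messages=[{"role": "user", "content": "Write a one-line hello world in Python."}],
    max_tokens=64,
)
print(reply.choices[0].message.content)
```

The same base URL is what you would paste into Cline or any other OpenAI-compatible client.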
Recommended models
Local vs cloud: which should you use?
Use local models if
- You want privacy — data never leaves your machine
- You already have a GPU with sufficient VRAM
- You want zero per-token API costs
- You need offline access
Use cloud/API if
- Your GPU has insufficient VRAM for the models you need
- You want access to frontier model quality
- You need maximum coding/reasoning performance
- You don't want to manage local model downloads and updates
Setup overview
Setting up llama.cpp is moderate in complexity: you download or build the binaries, pull a GGUF model, and launch the server or CLI yourself. It runs on macOS, Linux, and Windows as a command-line tool.
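Once the server is up, a quick way to confirm it is reachable before wiring it into an editor is to hit its health endpoint. A minimal sketch, assuming llama-server's default port of 8080 and the `requests` package:

```python
# Quick sanity check that a local llama-server instance is reachable.
# Assumes the default port (8080); adjust if the server was started with --port.
import requests

try:
    r = requests.get("http://localhost:8080/health", timeout=5)
    print("llama-server status:", r.status_code, r.text.strip())
except requests.ConnectionError:
    print("llama-server is not reachable; is it running?")
```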
Limitations
- Not beginner-friendly (use Ollama or LM Studio instead)
- No built-in model management, so quick model swapping requires llama-swap
- No GUI (CLI tool; pair with a frontend)
Frequently asked questions
- What is llama.cpp?
- llama.cpp is the engine underneath Ollama, but faster, with full control over quants, context, and grammars. Grammar file support enables GPT-OSS tool calling in Cline. It is the open-source inference engine that powers Ollama, LM Studio, and most local LLM tools.
- Does llama.cpp need a GPU?
- llama.cpp itself does not require a GPU, but inference is far faster with one. VRAM requirements match Ollama for equivalent models, with roughly 10-20% faster inference, and grammar file support lets you run GPT-OSS-20B with Cline tool calling at 16 GB VRAM (MXFP4 quant).
- Can I run llama.cpp on CPU only?
- Yes — llama.cpp supports CPU-only operation, but performance will be significantly slower (5-10x) compared to GPU inference. CPU-only works best for models under 7B parameters with at least 16 GB of system RAM.
- What models work best with llama.cpp?
- Models that work well with llama.cpp include: Qwen3 32B, Qwen3 30B-A3B (MoE), Qwen 2.5 Coder 32B Instruct, GPT-OSS 20B, Llama 3.1 70B Instruct. The best model depends on your GPU's VRAM and your use case.
- Is llama.cpp free and open source?
- Yes. llama.cpp is open source and completely free. You can find the source code on GitHub at https://github.com/ggerganov/llama.cpp.