Local LLM Tools
6 apps · local AI compatibility & hardware requirements
Local LLM tools are the engines that run models on your own GPU or CPU. They handle quantization, GPU acceleration, and context management, and serve models via API. Choosing between them comes down to whether you want simplicity (Ollama, LM Studio), maximum control and performance (llama.cpp), or production-grade serving (vLLM).
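Most of these tools expose an OpenAI-compatible HTTP endpoint, so the same client code works no matter which engine is running underneath. A minimal sketch, assuming the openai Python package, a server already started locally, and placeholder port and model values (the ports shown are the common defaults):

```python
from openai import OpenAI

# Point the client at whichever local server you run (common default ports):
#   Ollama: 11434   LM Studio: 1234   llama.cpp llama-server: 8080   vLLM: 8000
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="llama3.1:8b",  # placeholder: use whatever model the server has loaded
    messages=[{"role": "user", "content": "Summarize what a GGUF file is in one sentence."}],
)
print(resp.choices[0].message.content)
```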
- KoboldCPP: Single-binary local inference for roleplay and storytelling. GGUF models, zero install, bundled KoboldAI Lite UI. The community go-to for AI storytelling.
- llama.cpp: The engine underneath Ollama, but faster. Full control over quants, context, and grammars. Grammar file support enables GPT-OSS tool calling in Cline (see the grammar sketch after this list).
- LM Studio: Desktop app for running local LLMs with zero setup. In-app model browser, visual GPU fit indicator, and one-click GGUF downloads from Hugging Face.
- Ollama: The industry standard for running LLMs locally. Simple CLI, massive model library (100K+), OpenAI-compatible API on port 11434. Powers Open WebUI, Continue, and more.
- text-generation-webui: Power-user local LLM frontend with maximum backend flexibility. Transformers, ExLlamaV2/V3, llama.cpp, GPTQ, AWQ, all in one web UI.
- vLLM: Production-grade LLM serving engine. PagedAttention for efficient KV cache, high throughput, multi-user API serving. Built for deployments, not single-user chat (see the batch-generation sketch after this list).
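The grammar support called out for llama.cpp constrains generation to a formal grammar (GBNF), which is what makes reliable tool-call output possible. A minimal sketch, assuming the llama-cpp-python bindings and a local GGUF file at a placeholder path; the grammar here is a toy that only allows "yes" or "no":

```python
from llama_cpp import Llama, LlamaGrammar

# Placeholder model path; n_gpu_layers=-1 offloads every layer to the GPU if it fits.
llm = Llama(model_path="models/model.gguf", n_ctx=4096, n_gpu_layers=-1)

# GBNF grammar restricting the output to exactly "yes" or "no".
grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')

out = llm("Is 7 a prime number? Answer yes or no: ", grammar=grammar, max_tokens=4)
print(out["choices"][0]["text"])
```

The same mechanism, with a much larger grammar file, is what enforces structured tool-calling formats.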
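vLLM is usually run as an OpenAI-compatible server (the client snippet above works against it on its default port), but its Python API also supports offline batch generation, which is where PagedAttention's throughput shows. A minimal sketch, assuming vLLM is installed and the model name is a placeholder for something that fits your GPU:

```python
from vllm import LLM, SamplingParams

# Placeholder model; vLLM loads the weights from Hugging Face on first run.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

# Prompts are batched together; PagedAttention keeps per-request KV cache compact.
outputs = llm.generate(
    ["Explain KV caching in one sentence.", "What does quantization trade away?"],
    params,
)
for out in outputs:
    print(out.outputs[0].text.strip())
```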
Want to check whether your GPU can run the models these apps serve? Use the homepage calculator to see which models fit your hardware, with estimated tokens per second.
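For a rough offline sanity check before downloading anything, the usual back-of-envelope is weight size (parameters × bits per weight) plus KV cache and runtime overhead. A hedged sketch with ballpark constants chosen here for illustration, not the calculator's actual formula:

```python
def fits_in_vram(params_b: float, bits_per_weight: float, vram_gb: float,
                 ctx_tokens: int = 8192, kv_gb_per_1k_tokens: float = 0.12,
                 overhead_gb: float = 1.0) -> bool:
    """Rough fit check: weights + KV cache + overhead vs. available VRAM.
    The KV-cache and overhead constants are assumptions, not measurements."""
    weights_gb = params_b * bits_per_weight / 8   # e.g. 8B params at ~4.5 bits ≈ 4.5 GB
    kv_gb = ctx_tokens / 1000 * kv_gb_per_1k_tokens
    return weights_gb + kv_gb + overhead_gb <= vram_gb

# Example: an 8B model at a Q4-class quant (~4.5 bits/weight) on a 12 GB card
print(fits_in_vram(params_b=8, bits_per_weight=4.5, vram_gb=12))  # True
```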