Local LLM Tools
6 apps · local AI compatibility & hardware requirements
Local LLM tools are the engines that run models on your own GPU or CPU. They handle quantization, GPU acceleration, and context management, and serve models via API. Choosing between them comes down to whether you want simplicity (Ollama, LM Studio), maximum control and performance (llama.cpp), or production-grade serving (vLLM).
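Most of these tools expose an OpenAI-compatible HTTP endpoint, so the same client code works no matter which engine is running underneath. A minimal sketch, assuming the openai Python package, a server already started locally, and placeholder port and model values (the ports shown are the common defaults):

```python
from openai import OpenAI

# Point the client at whichever local server you run (common default ports):
#   Ollama: 11434   LM Studio: 1234   llama.cpp llama-server: 8080   vLLM: 8000
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="llama3.1:8b",  # placeholder: use whatever model the server has loaded
    messages=[{"role": "user", "content": "Summarize what a GGUF file is in one sentence."}],
)
print(resp.choices[0].message.content)
```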
- KoboldCPP: Single-binary local inference for roleplay and storytelling. GGUF models, zero install, bundled KoboldAI Lite UI. The community go-to for AI storytelling.
- llama.cpp: The engine underneath Ollama, but faster. Full control over quants, context, and grammars. Grammar file support enables GPT-OSS tool calling in Cline (see the grammar sketch after this list).
- LM Studio: Desktop app for running local LLMs with zero setup. In-app model browser, visual GPU fit indicator, and one-click GGUF downloads from Hugging Face.
- Ollama: The industry standard for running LLMs locally. Simple CLI, massive model library (100K+), OpenAI-compatible API on port 11434. Powers Open WebUI, Continue, and more.
- text-generation-webui: Power-user local LLM frontend with maximum backend flexibility. Transformers, ExLlamaV2/V3, llama.cpp, GPTQ, AWQ, all in one web UI.
- vLLM: Production-grade LLM serving engine. PagedAttention for efficient KV cache, high throughput, multi-user API serving. Built for deployments, not single-user chat (see the batch-generation sketch after this list).
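The grammar support called out for llama.cpp constrains generation to a formal grammar (GBNF), which is what makes reliable tool-call output possible. A minimal sketch, assuming the llama-cpp-python bindings and a local GGUF file at a placeholder path; the grammar here is a toy that only allows "yes" or "no":

```python
from llama_cpp import Llama, LlamaGrammar

# Placeholder model path; n_gpu_layers=-1 offloads every layer to the GPU if it fits.
llm = Llama(model_path="models/model.gguf", n_ctx=4096, n_gpu_layers=-1)

# GBNF grammar restricting the output to exactly "yes" or "no".
grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')

out = llm("Is 7 a prime number? Answer yes or no: ", grammar=grammar, max_tokens=4)
print(out["choices"][0]["text"])
```

The same mechanism, with a much larger grammar file, is what enforces structured tool-calling formats.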
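vLLM is usually run as an OpenAI-compatible server (the client snippet above works against it on its default port), but its Python API also supports offline batch generation, which is where PagedAttention's throughput shows. A minimal sketch, assuming vLLM is installed and the model name is a placeholder for something that fits your GPU:

```python
from vllm import LLM, SamplingParams

# Placeholder model; vLLM loads the weights from Hugging Face on first run.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

# Prompts are batched together; PagedAttention keeps per-request KV cache compact.
outputs = llm.generate(
    ["Explain KV caching in one sentence.", "What does quantization trade away?"],
    params,
)
for out in outputs:
    print(out.outputs[0].text.strip())
```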
Want to check whether your GPU can run the models these apps serve? Use the homepage calculator to see which models fit your hardware, with estimated tokens per second.
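For a rough offline sanity check before downloading anything, the usual back-of-envelope is weight size (parameters × bits per weight) plus KV cache and runtime overhead. A hedged sketch with ballpark constants chosen here for illustration, not the calculator's actual formula:

```python
def fits_in_vram(params_b: float, bits_per_weight: float, vram_gb: float,
                 ctx_tokens: int = 8192, kv_gb_per_1k_tokens: float = 0.12,
                 overhead_gb: float = 1.0) -> bool:
    """Rough fit check: weights + KV cache + overhead vs. available VRAM.
    The KV-cache and overhead constants are assumptions, not measurements."""
    weights_gb = params_b * bits_per_weight / 8   # e.g. 8B params at ~4.5 bits ≈ 4.5 GB
    kv_gb = ctx_tokens / 1000 * kv_gb_per_1k_tokens
    return weights_gb + kv_gb + overhead_gb <= vram_gb

# Example: an 8B model at a Q4-class quant (~4.5 bits/weight) on a 12 GB card
print(fits_in_vram(params_b=8, bits_per_weight=4.5, vram_gb=12))  # True
```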