AnythingLLM
All-in-one local AI workspace with best-in-class RAG. Upload documents, chat with your files, build no-code AI agents, and connect 30+ LLM providers.
Productivity, Self Hosted
Yes
Yes
Yes
Only for local models
Chat with your documents (PDFs, codebases, documentation)
Easy
macOS, Windows, Linux, Docker
Open source — free
AnythingLLM is an all-in-one local AI workspace with best-in-class RAG: upload documents, chat with your files, build no-code AI agents, and connect 30+ LLM providers. With 53K+ GitHub stars, it is the most complete local AI workspace tool.
AnythingLLM runs entirely on your local hardware. It supports OpenRouter for unified access to 300+ models from a single API, and Ollama integration lets you run models locally on your own GPU. AnythingLLM is open source (https://github.com/Mintplex-Labs/anything-llm), so you can inspect the code and self-host. The desktop app runs on standard hardware with no GPU required; for local RAG with 7B models, plan on 16 GB of system RAM. The embedding model (nomic-embed-text) is lightweight.
Can it run on my hardware?
Minimum
The desktop app runs on standard hardware; no GPU is required. For local RAG with 7B models, 16 GB of system RAM is enough. The embedding model (nomic-embed-text) is lightweight.
Recommended
16 GB system RAM + 8 GB VRAM for local RAG with 7B models. 24 GB VRAM for 32B models with RAG. For production RAG with best quality, use OpenRouter with Claude or GPT models — GPU not needed.
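As a quick sanity check before picking a local model, you can query your GPU's available VRAM and pull the lightweight embedding model through Ollama. This is a minimal sketch assuming an NVIDIA GPU and an existing Ollama install; adjust for your setup.

```bash
# Check total and free VRAM on an NVIDIA GPU (values reported in MiB).
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv

# Pull the lightweight embedding model mentioned above via Ollama.
ollama pull nomic-embed-text
```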
Approximate VRAM needed for recommended local models at Q4 with 8K context:
| Model | Params | Q4 VRAM | Min GPU |
|---|---|---|---|
| Qwen3 32B | 32.8B | ~22.2 GB | 24 GB |
| Llama 3.1 8B Instruct | 8B | ~6.3 GB | 8 GB |
| Gemma 3 12B Instruct | 12.2B | ~8.9 GB | 12 GB |
| Qwen3 14B | 14.8B | ~10.8 GB | 12 GB |
| Mistral Nemo 12B Instruct | 12.2B | ~9.2 GB | 12 GB |
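If you run these models through Ollama, the default library tags are typically 4-bit quantizations, which is what the table assumes. The tags below are examples; verify the exact names on ollama.com/library before pulling.

```bash
# Pull Q4-quantized models from the table above (default Ollama tags
# are usually 4-bit; confirm the tag names on ollama.com/library).
ollama pull llama3.1:8b   # fits in ~8 GB VRAM
ollama pull qwen3:14b     # fits in ~12 GB VRAM

# List what is installed locally.
ollama list
```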
App compatibility
| Feature | Supported |
|---|---|
| Local models | Yes |
| OpenRouter | Yes |
| OpenAI-compatible API | Yes |
| Ollama | Yes |
| LM Studio | Yes |
| Anthropic API | Yes |
| Google API | Yes |
| Mistral API | No |
| Docker | Yes |
| Works offline | Yes |
| Needs GPU | No |
Local vs cloud: which should you use?
Use local models if
- You want privacy — data never leaves your machine
- You already have a GPU with sufficient VRAM
- You want zero per-token API costs
- You need offline access
Use cloud/API if
- Your GPU has insufficient VRAM for the models you need
- You want access to frontier model quality
- You need maximum coding/reasoning performance
- You don't want to manage local model downloads and updates
- You want flexibility: OpenRouter lets you switch between 300+ models with one API key (see the sketch below)
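To illustrate that last point, OpenRouter exposes an OpenAI-compatible chat completions endpoint, so switching models is just a matter of changing the `model` field. This is a hedged sketch: the model ID shown is only an example, and `OPENROUTER_API_KEY` is assumed to hold your key.

```bash
# One API key, many models: swap the "model" field to change providers.
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3.5-sonnet",
    "messages": [{"role": "user", "content": "Summarize this document in one sentence."}]
  }'
```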
Setup overview
Setting up AnythingLLM is straightforward. It runs on macOS, Windows, Linux, and Docker. Full documentation is available at https://docs.anythingllm.com.
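For the Docker route, a minimal sketch of a typical self-hosted start is shown below. The image name, port, and storage path follow the project's Docker quickstart as I recall it; confirm the current command against the documentation linked above.

```bash
# Persist workspaces and settings outside the container.
export STORAGE_LOCATION=$HOME/anythingllm
mkdir -p "$STORAGE_LOCATION" && touch "$STORAGE_LOCATION/.env"

# Start AnythingLLM on http://localhost:3001 (assumed defaults; see docs).
docker run -d -p 3001:3001 \
  -v "$STORAGE_LOCATION":/app/server/storage \
  -v "$STORAGE_LOCATION"/.env:/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm
```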
Limitations
- Coding agent workflows (use Cline or Aider instead)
- Simple chat (use Open WebUI if you don't need RAG)
- Relying on the built-in LLM alone (the built-in engine is basic; pair it with Ollama or OpenRouter)
Frequently asked questions
- What is AnythingLLM?
- AnythingLLM is an all-in-one local AI workspace with best-in-class RAG: upload documents, chat with your files, build no-code AI agents, and connect 30+ LLM providers. With 53K+ GitHub stars, it is the most complete local AI workspace tool.
- Does AnythingLLM need a GPU?
- AnythingLLM itself does not require a GPU, though the local models you connect to it may. The desktop app runs on standard hardware; for local RAG with 7B models, 16 GB of system RAM is enough, and the embedding model (nomic-embed-text) is lightweight.
- Can I run AnythingLLM on CPU only?
- Yes — AnythingLLM supports CPU-only operation, but performance will be significantly slower (5-10x) compared to GPU inference. CPU-only works best for models under 7B parameters with at least 16 GB of system RAM.
- Can AnythingLLM use OpenRouter?
- Yes. AnythingLLM supports OpenRouter for accessing 300+ models through a single API. Configure OpenRouter as a provider in AnythingLLM's settings with your API key.
- Can AnythingLLM use local models via Ollama?
- Yes. AnythingLLM works with Ollama for running models locally. Install Ollama, pull your model (e.g., `ollama pull qwen2.5:7b`), and connect AnythingLLM to the local Ollama server. GPU requirements depend on the model you choose, not AnythingLLM itself.
- Is AnythingLLM free and open source?
- Yes. AnythingLLM is open source and completely free. You can find the source code on GitHub at https://github.com/Mintplex-Labs/anything-llm.