text-generation-webui
Power-user local LLM frontend with maximum backend flexibility. Transformers, ExLlamaV2/V3, llama.cpp, GPTQ, AWQ — all in one web UI.
Category: Local LLM tool, chat frontend
Works offline: Yes
Needs GPU: Yes — for local inference
Best for: Power users needing maximum backend/model format flexibility
Difficulty: Hard
Platforms: Web, Windows, Linux
Pricing: Open source — free
text-generation-webui (formerly Oobabooga, now TextGen v4.x; 47K GitHub stars) is a self-hosted, power-user web UI for local LLMs with maximum backend flexibility. It supports the widest range of backends of any local frontend: Transformers, ExLlamaV2/V3, llama.cpp, AutoGPTQ, AutoAWQ, and more, all in one interface.
text-generation-webui runs entirely on your local hardware, and it is open source (https://github.com/oobabooga/text-generation-webui), so you can inspect the code and self-host. Hardware requirements come from the model and backend you choose, not from the UI itself; see below.
Can it run on my hardware?
Minimum
8 GB VRAM minimum for 7B models. The web UI itself is lightweight. GPU requirements come from the model and backend you choose. ExLlamaV2 is the fastest for NVIDIA GPUs.
Recommended
24 GB VRAM (RTX 3090/4090) for 30B models with ExLlamaV2. 48 GB+ for 70B models at Q4 quantization (~47 GB; see the table below). The Python environment needs separate management — use conda or venv to avoid system package conflicts.
Approximate VRAM needed for recommended local models at Q4 with 8K context:
| Model | Params | Q4 VRAM | Min GPU VRAM |
|---|---|---|---|
| Qwen3 32B | 32.8B | ~22.2 GB | 24 GB |
| Llama 3.1 70B Instruct | 70B | ~47.1 GB | 48 GB+ |
| Qwen3 235B-A22B (MoE) | 235B | ~149.9 GB | 48 GB+ |
| Mistral Small 22B | 22.2B | ~16.1 GB | 24 GB |
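The Q4 figures above are roughly predictable from parameter count alone. As a back-of-envelope sketch (this estimator is for illustration and is not the methodology behind the table): Q4_K_M quantization stores about 4.85 bits per weight, and an fp16 KV cache adds 2 × layers × KV heads × head dim × context × 2 bytes on top.

```python
# Rough VRAM estimate for a Q4-quantized dense model: quantized weights plus
# an fp16 KV cache. 4.85 bits/weight approximates llama.cpp's Q4_K_M; actual
# usage varies by backend, quant variant, and runtime overhead.

def q4_vram_gb(params_b: float, n_layers: int, n_kv_heads: int,
               head_dim: int = 128, context: int = 8192,
               bits_per_weight: float = 4.85) -> float:
    weights = params_b * 1e9 * bits_per_weight / 8                 # bytes
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * context * 2  # K+V, fp16
    return (weights + kv_cache) / 1e9

# Qwen3 32B (architecture numbers are illustrative: 64 layers, 8 KV heads)
print(f"{q4_vram_gb(32.8, 64, 8):.1f} GB")  # ≈22 GB, near the 22.2 GB above
```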
App compatibility
| Feature | Supported |
|---|---|
| Local models | Yes |
| OpenRouter | No |
| OpenAI-compatible API | Yes |
| Ollama | No |
| LM Studio | No |
| Anthropic API | No |
| Google API | No |
| Mistral API | No |
| Docker | No |
| Works offline | Yes |
| Needs GPU | Yes |
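Because the server exposes an OpenAI-compatible API, existing OpenAI SDK code can be pointed at it directly. A minimal sketch, assuming the server was launched with the --api flag and the default API port of 5000 (the model name is a placeholder; the server answers with whichever model is currently loaded):

```python
# Query text-generation-webui's OpenAI-compatible endpoint.
# Assumes: server started with --api, default port 5000, no API key configured.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:5000/v1",
    api_key="not-needed",  # local server; no key required unless you set --api-key
)

resp = client.chat.completions.create(
    model="local",  # placeholder; the currently loaded model is used
    messages=[{"role": "user", "content": "Explain GQA in two sentences."}],
    max_tokens=200,
)
print(resp.choices[0].message.content)
```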
Recommended models
Qwen3 32B, Llama 3.1 70B Instruct, Qwen3 235B-A22B (MoE), and Mistral Small 22B all run well in text-generation-webui; the VRAM table above shows what each needs at Q4. The best choice depends on your GPU's VRAM and your use case.
Local vs cloud: which should you use?
Use local models if
- You want privacy — data never leaves your machine
- You already have a GPU with sufficient VRAM
- You want zero per-token API costs
- You need offline access
- You have at least 16-24 GB VRAM for recommended models
Use cloud/API if
- Your GPU has insufficient VRAM for the models you need
- You want access to frontier model quality
- You need maximum coding/reasoning performance
- You don't want to manage local model downloads and updates
Setup overview
Setting up text-generation-webui is complex and requires technical knowledge: clone the GitHub repository, run the bundled start script for your operating system (it creates an isolated Python environment and installs dependencies), then open the locally served web UI in your browser. It runs on Windows and Linux.
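Once installed, models go in the models/ folder of the install directory. As an illustrative sketch using huggingface_hub (the repo ID and filename below are placeholders; substitute the quant you actually want):

```python
# Download a GGUF quant into text-generation-webui's models/ folder so the
# llama.cpp loader can find it. Repo and filename are illustrative examples.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="bartowski/Mistral-Small-Instruct-2409-GGUF",  # placeholder repo
    filename="Mistral-Small-Instruct-2409-Q4_K_M.gguf",    # placeholder quant
    local_dir="text-generation-webui/models",              # adjust to your install path
)
```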
Limitations
text-generation-webui is not a good fit for:
- Beginners — use LM Studio or Ollama instead
- Stable production setups — updates can break things
- OpenRouter or cloud API access — it is strictly a local inference frontend
Frequently asked questions
- What is text-generation-webui?
- text-generation-webui (formerly Oobabooga, now TextGen v4.x; 47K GitHub stars) is a self-hosted, power-user web UI for local LLMs with maximum backend flexibility. It supports the widest range of backends of any local frontend: Transformers, ExLlamaV2/V3, llama.cpp, AutoGPTQ, AutoAWQ, and more.
- Does text-generation-webui need a GPU?
- For comfortable speeds, yes: plan on 8 GB VRAM minimum for 7B models. The web UI itself is lightweight; GPU requirements come from the model and backend you choose, and ExLlamaV2 is the fastest backend for NVIDIA GPUs. CPU-only operation is possible but much slower (see the next question).
- Can I run text-generation-webui on CPU only?
- Yes — text-generation-webui supports CPU-only operation, but performance will be significantly slower (5-10x) compared to GPU inference. CPU-only works best for models under 7B parameters with at least 16 GB of system RAM.
- What models work best with text-generation-webui?
- Models that work well with text-generation-webui include: Qwen3 32B, Llama 3.1 70B Instruct, Qwen3 235B-A22B (MoE), Mistral Small 22B. The best model depends on your GPU's VRAM and your use case.
- Is text-generation-webui free and open source?
- Yes. text-generation-webui is open source and completely free. You can find the source code on GitHub at https://github.com/oobabooga/text-generation-webui.