
text-generation-webui

Power-user local LLM frontend with maximum backend flexibility. Transformers, ExLlamaV2/V3, llama.cpp, GPTQ, AWQ — all in one web UI.

App type

Local LLM tool, chat frontend

Local models

Yes

OpenRouter

No

Ollama

No

GPU required

Yes — for local inference

Best for

Power users needing maximum backend/model format flexibility

Setup difficulty

Hard

Platforms

Web, Windows, Linux

Pricing

Open source — free

text-generation-webui (formerly known as Oobabooga, now TextGen v4.x, 47K GitHub stars) is a self-hosted web UI for local LLMs aimed at power users who want maximum backend flexibility. It supports the widest range of backends and model formats of any local frontend: Transformers, ExLlamaV2/V3, llama.cpp, AutoGPTQ, AutoAWQ, and more, all in one web UI.

text-generation-webui runs entirely on your local hardware. It is open source (https://github.com/oobabooga/text-generation-webui), so you can inspect the code and self-host.

Can it run on my hardware?

Minimum

8 GB VRAM minimum for 7B models. The web UI itself is lightweight. GPU requirements come from the model and backend you choose. ExLlamaV2 is the fastest for NVIDIA GPUs.

Recommended

24 GB VRAM (RTX 3090/4090) for 30B models with ExLlamaV2. 48 GB+ for 70B at full precision. The Python environment needs separate management — use conda or venv to avoid system package conflicts.
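To check whether your card clears these thresholds, you can query free VRAM with nvidia-smi. A minimal sketch: the query flags are standard nvidia-smi options, but the parsing assumes the CSV output format (one MiB value per line, one line per GPU):

```python
import subprocess

def free_vram_gib(smi_output: str) -> list[float]:
    """Parse the output of
    `nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits`
    (one MiB value per line) into free VRAM in GiB, one entry per GPU."""
    return [int(line.strip()) / 1024 for line in smi_output.splitlines() if line.strip()]

def query_gpus() -> list[float]:
    """Run nvidia-smi and return free VRAM per GPU; empty list if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.free",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return free_vram_gib(out)

# Example: a 24 GB card with some memory in use might report 23028 MiB free.
print(free_vram_gib("23028\n"))
```

If `query_gpus()` returns an empty list, either no NVIDIA driver is installed or nvidia-smi is not on your PATH.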

Approximate VRAM needed for recommended local models at Q4 with 8K context:

Model | Params | Q4 VRAM | Min GPU
Qwen3 32B | 32.8B | ~22.2 GB | 24 GB
Llama 3.1 70B Instruct | 70B | ~47.1 GB | 48 GB+
Qwen3 235B-A22B (MoE) | 235B | ~149.9 GB | 48 GB+
Mistral Small 22B | 22.2B | ~16.1 GB | 24 GB

Check your GPU against these models in the calculator →
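The Q4 figures above can be approximated with a rule of thumb: quantized weights take roughly 5 bits per parameter (Q4 quants average somewhat above 4 bits once scales and outlier layers are counted), plus KV cache that grows with context length. Both constants below are ballpark assumptions chosen to land near the table, not measured values:

```python
def q4_vram_gb(params_b: float, context_k: int = 8) -> float:
    """Very rough Q4 VRAM estimate in GB.

    params_b: parameter count in billions.
    context_k: context length in thousands of tokens.
    Assumes ~5 bits/weight for quantized weights and a flat
    ~0.25 GB of KV cache per 1K tokens (both rough assumptions).
    """
    weights = params_b * 5 / 8    # GB: 5 bits per weight / 8 bits per byte
    kv_cache = 0.25 * context_k   # GB: grows linearly with context
    return round(weights + kv_cache, 1)

for name, p in [("Qwen3 32B", 32.8), ("Llama 3.1 70B", 70.0),
                ("Mistral Small 22B", 22.2)]:
    print(f"{name}: ~{q4_vram_gb(p)} GB at 8K context")
```

Treat the output as a sanity check only; actual usage varies by backend, quant variant, and attention implementation (GQA models need far less KV cache than this flat estimate suggests at long contexts).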

App compatibility

Feature | Supported
Local models | Yes
OpenRouter | No
OpenAI-compatible API | Yes
Ollama | No
LM Studio | No
Anthropic API | No
Google API | No
Mistral API | No
Docker | No
Works offline | Yes
Needs GPU | Yes
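Because text-generation-webui exposes an OpenAI-compatible API (enable it with the --api flag), any OpenAI-style client can talk to the local server. A minimal stdlib sketch, assuming the default local port 5000 (check your server's startup log for the actual address, and note that "local" is just a placeholder model name):

```python
import json
import urllib.request

# Default address when started with --api; adjust if you changed the port.
API_URL = "http://127.0.0.1:5000/v1/chat/completions"

def chat_payload(prompt: str, temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": "local",  # placeholder; the server uses whatever model is loaded
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(prompt: str) -> str:
    """Send one chat turn to the local server and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same endpoint shape means tools built for the OpenAI API (SDKs, agent frameworks) can usually be pointed at the local server just by overriding the base URL.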

Recommended models

Best local models

Local vs cloud: which should you use?

Use local models if

  • You want privacy — data never leaves your machine
  • You already have a GPU with sufficient VRAM
  • You want zero per-token API costs
  • You need offline access
  • You have at least 16-24 GB VRAM for recommended models

Use cloud/API if

  • Your GPU has insufficient VRAM for the models you need
  • You want access to frontier model quality
  • You need maximum coding/reasoning performance
  • You don't want to manage local model downloads and updates

Setup overview

Setting up text-generation-webui is complex and requires some technical knowledge: you manage a Python environment, choose a backend, and download model weights yourself. It runs as a locally hosted web UI on Windows and Linux.

Limitations

  • Not beginner-friendly; beginners should use LM Studio or Ollama instead
  • Updates can break stable production setups
  • No OpenRouter or cloud API access (it is a local inference frontend)

Related

Recommended GPUs

Compatible models

Related apps

Frequently asked questions

What is text-generation-webui?
text-generation-webui (formerly known as Oobabooga, now TextGen v4.x, 47K GitHub stars) is a self-hosted, power-user web UI for local LLMs with maximum backend flexibility. It supports the widest range of backends: Transformers, ExLlamaV2/V3, llama.cpp, AutoGPTQ, AutoAWQ, and more, all in one web UI.
Does text-generation-webui need a GPU?
8 GB VRAM minimum for 7B models. The web UI itself is lightweight. GPU requirements come from the model and backend you choose. ExLlamaV2 is the fastest for NVIDIA GPUs.
Can I run text-generation-webui on CPU only?
Yes — text-generation-webui supports CPU-only operation, but performance will be significantly slower (5-10x) compared to GPU inference. CPU-only works best for models under 7B parameters with at least 16 GB of system RAM.
What models work best with text-generation-webui?
Models that work well with text-generation-webui include: Qwen3 32B, Llama 3.1 70B Instruct, Qwen3 235B-A22B (MoE), Mistral Small 22B. The best model depends on your GPU's VRAM and your use case.
Is text-generation-webui free and open source?
Yes. text-generation-webui is open source and completely free. You can find the source code on GitHub at https://github.com/oobabooga/text-generation-webui.