SillyTavern
Self-hosted chat interface for AI roleplay and creative writing. Deep character creation, lorebooks, group chats, and first-class OpenRouter support.
Category: Chat Frontend, Roleplay
Use case: AI character roleplay and interactive fiction
Setup difficulty: Medium
Platforms: Web, macOS, Linux, Windows
License: Open source (free)
Works offline: Only with local models
SillyTavern is a self-hosted chat interface for AI roleplay and creative writing, with deep character creation, lorebooks, group chats, and first-class OpenRouter support. It is the community standard for AI roleplay, with over 10K GitHub stars.
SillyTavern runs entirely on your local hardware. It supports OpenRouter for unified access to 300+ models through a single API, and its Ollama integration lets you run models locally on your own GPU. SillyTavern is open source (https://github.com/SillyTavern/SillyTavern), so you can inspect the code and self-host. The app itself is lightweight; all GPU requirements come from the model backend you connect, as detailed in the hardware section below.
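To make the "one API, 300+ models" point concrete, here is a minimal sketch of a request to OpenRouter's OpenAI-compatible chat completions endpoint, the same endpoint SillyTavern uses when OpenRouter is selected as a provider. The model slug and prompt are just examples; you supply your own API key.

```bash
# Minimal sketch: one key and one request shape cover every model
# OpenRouter hosts. The model slug below is an example.
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/mistral-nemo",
    "messages": [{"role": "user", "content": "Stay in character as a grumpy innkeeper greeting a traveler."}]
  }'
```

Switching models is a one-line change to the `model` field, which is why a single OpenRouter key is enough to drive everything SillyTavern can do.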
Can it run on my hardware?
Minimum
SillyTavern itself has no GPU requirement; it runs on a Raspberry Pi 4 with 2 GB of RAM. All GPU requirements come from the model backend you connect. For local roleplay, 12 GB of VRAM is sufficient for good 12B-13B-class models.
Recommended
12 GB VRAM (e.g., RTX 3060) runs Mistral Nemo 12B or Gemma 3 12B at Q4; 24 GB VRAM (e.g., RTX 3090) runs Qwen3 32B at Q4. Very large MoE models such as Qwen3-235B-A22B require a multi-GPU rig or heavy CPU offload even at aggressive quantizations like IQ4. Pair SillyTavern with KoboldCPP for the best local roleplay experience.
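If you take the KoboldCPP route, serving a quantized GGUF looks roughly like this. This is a sketch, not the official quick-start: the GGUF filename is a placeholder, and flag names can vary between KoboldCPP versions.

```bash
# Sketch: serve a Q4 GGUF with KoboldCPP, offloading all layers to
# the GPU, then point SillyTavern's Text Completion API at
# http://localhost:5001 (KoboldCPP's default port).
python koboldcpp.py \
  --model mistral-nemo-12b-instruct-q4_k_m.gguf \
  --contextsize 8192 \
  --usecublas \
  --gpulayers 99
```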
Approximate VRAM needed for recommended local models at Q4 with 8K context:
| Model | Params | Q4 VRAM | Min GPU |
|---|---|---|---|
| Qwen3 32B | 32.8B | ~22.2 GB | 24 GB |
| Mistral Nemo 12B Instruct | 12.2B | ~9.2 GB | 12 GB |
| Gemma 3 12B Instruct | 12.2B | ~8.9 GB | 12 GB |
| Qwen3 235B-A22B (MoE) | 235B | ~149.9 GB | Multi-GPU (160 GB+) |
| Command-R 35B | 35B | ~34.1 GB | 48 GB+ |
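These figures line up with a rough back-of-envelope rule (our approximation, not a number from the SillyTavern docs): Q4 quantization stores roughly 4.5-5 bits per parameter, so weights take about 0.6 GB per billion parameters, plus a few GB for the KV cache and runtime overhead at 8K context. For Qwen3 32B that gives 32.8 × 0.6 ≈ 19.7 GB of weights plus ~2.5 GB of cache and overhead, close to the ~22.2 GB in the table. Command-R is the outlier; the original 35B architecture lacks grouped-query attention, which likely makes its 8K KV cache several times larger than the others'.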
App compatibility
| Feature | Supported |
|---|---|
| Local models | Yes |
| OpenRouter | Yes |
| OpenAI-compatible API | Yes |
| Ollama | Yes |
| LM Studio | Yes |
| Anthropic API | Yes |
| Google API | Yes |
| Mistral API | Yes |
| Docker | Yes |
| Works offline | Yes (with local models) |
| Needs GPU | No |
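Backends such as Ollama and LM Studio expose local HTTP APIs that SillyTavern connects to. Before configuring a backend in SillyTavern, it is worth a quick sanity check that it responds; the sketch below assumes Ollama is running on its default port 11434 and the model has already been pulled.

```bash
# Sketch: confirm a local Ollama server answers before pointing
# SillyTavern at it. Assumes `ollama pull qwen2.5:7b` has been run.
curl http://localhost:11434/api/chat -d '{
  "model": "qwen2.5:7b",
  "messages": [{"role": "user", "content": "Say hello in character."}],
  "stream": false
}'
```

LM Studio serves an OpenAI-compatible API (on port 1234 by default), so the OpenRouter-style request shown earlier works against it with the base URL swapped.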
Local vs cloud: which should you use?
Use local models if
- You want privacy — data never leaves your machine
- You already have a GPU with sufficient VRAM
- You want zero per-token API costs
- You need offline access
Use cloud/API if
- Your GPU has insufficient VRAM for the models you need
- You want access to frontier model quality
- You need maximum coding/reasoning performance
- You don't want to manage local model downloads and updates
- You want to switch among 300+ models with a single OpenRouter API key
Setup overview
Setting up SillyTavern is moderately complex. It runs on the web, macOS, Linux, and Windows. Full documentation is available at https://docs.sillytavern.app.
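On Linux or macOS, a typical install follows the flow sketched below; this mirrors the documented steps at the time of writing, so check https://docs.sillytavern.app for current instructions (Windows users run Start.bat instead).

```bash
# Sketch: clone the stable release branch and launch.
# Requires Node.js; the launcher installs npm dependencies on
# first run, then serves the UI at http://localhost:8000.
git clone https://github.com/SillyTavern/SillyTavern -b release
cd SillyTavern
./start.sh
```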
Limitations
SillyTavern is not a good fit for:
- General productivity chat (use Open WebUI or LibreChat instead)
- RAG or document Q&A (no built-in RAG)
- Beginners (advanced features have a steep learning curve)
Frequently asked questions
- What is SillyTavern?
- SillyTavern is a self-hosted chat interface for AI roleplay and creative writing, with deep character creation, lorebooks, group chats, and first-class OpenRouter support. It is the community standard for AI roleplay, with over 10K GitHub stars.
- Does SillyTavern need a GPU?
- SillyTavern itself does not require a GPU; it runs on a Raspberry Pi 4 with 2 GB of RAM. All GPU requirements come from the model backend you connect. For local roleplay, 12 GB of VRAM is sufficient for good 12B-13B-class models.
- Can I run SillyTavern on CPU only?
- Yes. SillyTavern itself always runs on the CPU. Running your models on CPU also works, but inference is significantly slower (5-10x) than on a GPU; CPU-only inference is best suited to models under 7B parameters with at least 16 GB of system RAM.
- Can SillyTavern use OpenRouter?
- Yes. SillyTavern supports OpenRouter for accessing 300+ models through a single API. Configure OpenRouter as a provider in SillyTavern's settings with your API key.
- Can SillyTavern use local models via Ollama?
- Yes. SillyTavern works with Ollama for running models locally. Install Ollama, pull your model (e.g., `ollama pull qwen2.5:7b`), and connect SillyTavern to the local Ollama server. GPU requirements depend on the model you choose, not SillyTavern itself.
- What models work best with SillyTavern?
- Models that work well with SillyTavern include Qwen3 32B, Mistral Nemo 12B Instruct, Gemma 3 12B Instruct, Qwen3 235B-A22B (MoE), and Command-R 35B. The best model depends on your GPU's VRAM and your use case.
- Is SillyTavern free and open source?
- Yes. SillyTavern is open source and completely free. You can find the source code on GitHub at https://github.com/SillyTavern/SillyTavern.