LM Studio
Desktop app for running local LLMs with zero setup. In-app model browser, visual GPU fit indicator, and one-click GGUF downloads from Hugging Face.
Category: Local LLM Tool, Chat Frontend
Best for: Easiest way to run LLMs locally without a CLI
Difficulty: Easy
Platforms: macOS, Windows, Linux
Pricing: Free (not open source)
LM Studio is a desktop application (macOS, Windows, Linux beta) for downloading, running, and serving GGUF-quantized LLMs locally, with zero command-line setup. It includes an in-app model browser, a visual GPU fit indicator, and one-click GGUF downloads from Hugging Face.
LM Studio runs entirely on your local hardware and is free for personal use, though it is not open source. Hardware requirements depend on the models you run, as detailed below.
Can it run on my hardware?
Minimum
4 GB+ VRAM minimum. 8 GB VRAM recommended for usable speeds with 7B models. Apple Silicon Macs with 16 GB+ unified memory run very well via Metal acceleration. CPU-only works but is 5-10x slower for 7B+ models.
Recommended
- 8 GB VRAM: Llama 3.1 8B, Mistral 7B, Gemma 3 12B (Q4)
- 12 GB VRAM: Mistral Nemo 12B, Qwen 2.5 14B
- 16 GB VRAM: Qwen 3 32B (Q4, with partial CPU offload), Mistral Small 22B
- 24 GB+: Llama 3.1 70B (Q4, with partial CPU offload), Qwen 3 235B-A22B MoE

LM Studio's in-app GPU fit indicator shows what fits before you download.
Approximate VRAM needed for recommended local models at Q4 with 8K context:
| Model | Params | Q4 VRAM | Min GPU |
|---|---|---|---|
| Qwen3 32B | 32.8B | ~22.2 GB | 24 GB |
| Qwen3 14B | 14.8B | ~10.8 GB | 12 GB |
| Qwen 2.5 7B Instruct | 7.6B | ~5.3 GB | 8 GB |
| Llama 3.1 8B Instruct | 8B | ~6.3 GB | 8 GB |
| Gemma 3 12B Instruct | 12.2B | ~8.9 GB | 12 GB |
| Mistral Nemo 12B Instruct | 12.2B | ~9.2 GB | 12 GB |
| Llama 3.1 70B Instruct | 70B | ~47.1 GB | 48 GB+ |
| Qwen3 235B-A22B (MoE) | 235B | ~149.9 GB | 48 GB+ |
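These figures track a simple back-of-envelope rule: Q4 quantization stores roughly 4.5-5 bits per weight, plus a gigabyte or two of KV cache and runtime overhead at 8K context. The sketch below reproduces the table's numbers to within a few gigabytes; the constants are rough assumptions fitted to the table above, not LM Studio's actual fit calculation.

```python
# Back-of-envelope Q4 VRAM estimate. Illustrative only: the
# bits-per-weight and overhead constants are assumptions, not
# LM Studio's real GPU fit logic.

def estimate_q4_vram_gb(params_b: float,
                        bits_per_weight: float = 4.85,  # ~Q4_K_M average
                        overhead_gb: float = 1.5) -> float:
    """Rough estimate: quantized weights + flat overhead for KV cache
    and runtime buffers at ~8K context."""
    weights_gb = params_b * bits_per_weight / 8  # GB per billion params
    return round(weights_gb + overhead_gb, 1)

for name, params in [("Qwen3 32B", 32.8),
                     ("Llama 3.1 8B", 8.0),
                     ("Llama 3.1 70B", 70.0)]:
    print(f"{name}: ~{estimate_q4_vram_gb(params)} GB")
```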
App compatibility
| Feature | Supported |
|---|---|
| Local models | Yes |
| OpenRouter | No |
| OpenAI-compatible API | Yes |
| Ollama | No |
| LM Studio | Yes |
| Anthropic API | No |
| Google API | No |
| Mistral API | No |
| Docker | No |
| Works offline | Yes |
| Needs GPU | No |
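Because LM Studio exposes an OpenAI-compatible server (default port 1234), any app in the table above that speaks the OpenAI API can point at it. A minimal connectivity check, assuming the server is running with at least one model loaded:

```python
# Quick connectivity check against LM Studio's OpenAI-compatible server.
# Assumes the default port 1234; adjust BASE_URL if you changed it.
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"

with urllib.request.urlopen(f"{BASE_URL}/models") as resp:
    models = json.load(resp)

for m in models.get("data", []):
    # These IDs are what you pass as "model" in chat requests.
    print(m["id"])
```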
Recommended models
Best local models
- Qwen3 32B: 32.8B params · ~22.2 GB at Q4 · Dense
- Qwen3 14B: 14.8B params · ~10.8 GB at Q4 · Dense
- Qwen 2.5 7B Instruct: 7.6B params · ~5.3 GB at Q4 · Dense
- Llama 3.1 8B Instruct: 8B params · ~6.3 GB at Q4 · Dense
- Gemma 3 12B Instruct: 12.2B params · ~8.9 GB at Q4 · Dense
- Mistral Nemo 12B Instruct: 12.2B params · ~9.2 GB at Q4 · Dense
- Llama 3.1 70B Instruct: 70B params · ~47.1 GB at Q4 · Dense
- Qwen3 235B-A22B (MoE): 235B params · ~149.9 GB at Q4 · MoE
Local vs cloud: which should you use?
Use local models if
- You want privacy — data never leaves your machine
- You already have a GPU with sufficient VRAM
- You want zero per-token API costs
- You need offline access
Use cloud/API if
- Your GPU has insufficient VRAM for the models you need
- You want access to frontier model quality
- You need maximum coding/reasoning performance
- You don't want to manage local model downloads and updates
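If cost is the deciding factor, a rough break-even calculation helps. The sketch below uses placeholder numbers throughout; the token volume, API price, hardware cost, and power cost are illustrative assumptions, not quoted figures, so substitute your own:

```python
# Hypothetical local-vs-cloud break-even sketch.
# Every figure below is a placeholder assumption, not a real price.

monthly_tokens = 20_000_000       # assumed monthly usage
api_price_per_mtok = 2.00         # assumed $/million tokens (varies widely)
gpu_cost = 600.0                  # assumed one-time GPU purchase ($)
power_cost_per_month = 10.0       # assumed electricity for local inference ($)

api_monthly = monthly_tokens / 1_000_000 * api_price_per_mtok
local_monthly = power_cost_per_month

if api_monthly > local_monthly:
    months = gpu_cost / (api_monthly - local_monthly)
    print(f"Local hardware pays for itself in ~{months:.0f} months")
else:
    print("At this volume, the API is cheaper than running locally")
```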
Setup overview
Setting up LM Studio is straightforward: download the installer, open the app, pick a model in the built-in browser, and download it with one click. It runs on macOS, Windows, and Linux. Full documentation is available at https://lmstudio.ai/docs.
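Once a model is downloaded and loaded, LM Studio can serve it through its local OpenAI-compatible endpoint. A minimal chat request in Python, assuming the default port 1234 and the official openai package; the model name below is a placeholder, so use one you actually have loaded:

```python
# Minimal chat request against LM Studio's local server.
# Requires: pip install openai
from openai import OpenAI

# LM Studio ignores the API key, but the client requires a non-empty value.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # placeholder: match a loaded model's ID
    messages=[{"role": "user", "content": "Say hello in five words."}],
    temperature=0.7,
)
print(response.choices[0].message.content)
```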
Limitations
- No OpenRouter or cloud API access (local inference only)
- No multi-user deployment support
- No quantization formats beyond GGUF (AWQ, GPTQ, and exl2 are unsupported)
- No built-in RAG or document Q&A
Related
- Recommended GPUs
- Compatible models
- Related apps
Frequently asked questions
- What is LM Studio?
- LM Studio is a desktop application (macOS, Windows, Linux beta) for downloading, running, and serving GGUF-quantized LLMs locally, with an in-app model browser, a visual GPU fit indicator, and one-click GGUF downloads from Hugging Face.
- Does LM Studio need a GPU?
- LM Studio itself does not require a GPU, but inference is far faster with one. Plan on 4 GB+ VRAM minimum, with 8 GB recommended for usable speeds with 7B models. Apple Silicon Macs with 16 GB+ unified memory run very well via Metal acceleration; CPU-only works but is 5-10x slower for 7B+ models.
- Can I run LM Studio on CPU only?
- Yes — LM Studio supports CPU-only operation, but performance will be significantly slower (5-10x) compared to GPU inference. CPU-only works best for models under 7B parameters with at least 16 GB of system RAM.
- What models work best with LM Studio?
- Models that work well with LM Studio include: Qwen3 32B, Qwen3 14B, Qwen 2.5 7B Instruct, Llama 3.1 8B Instruct, Gemma 3 12B Instruct, Mistral Nemo 12B Instruct. The best model depends on your GPU's VRAM and your use case.
- Is LM Studio free?
- LM Studio is free for personal use but is not open source.