LM Studio
Desktop app for running local LLMs with zero setup. In-app model browser, visual GPU fit indicator, and one-click GGUF downloads from Hugging Face.
Category: Local LLM Tool, Chat Frontend
Best for: Easiest way to run LLMs locally without a CLI
Difficulty: Easy
Platforms: macOS, Windows, Linux
Pricing: Free (not open source)
LM Studio is a desktop application (macOS, Windows, Linux beta) for downloading, running, and serving GGUF-quantized LLMs locally, with zero command-line setup. It includes an in-app model browser, a visual GPU fit indicator, and one-click GGUF downloads from Hugging Face.
LM Studio runs entirely on your local hardware and is free for personal use, though it is not open source. Hardware requirements depend on the models you run, as detailed below.
Can it run on my hardware?
Minimum
4 GB+ VRAM minimum. 8 GB VRAM recommended for usable speeds with 7B models. Apple Silicon Macs with 16 GB+ unified memory run very well via Metal acceleration. CPU-only works but is 5-10x slower for 7B+ models.
Recommended
- 8 GB VRAM: Llama 3.1 8B, Mistral 7B, Gemma 3 12B (Q4)
- 12 GB VRAM: Mistral Nemo 12B, Qwen 2.5 14B
- 16 GB VRAM: Qwen 3 32B (Q4, with partial CPU offload), Mistral Small 22B
- 24 GB+: Llama 3.1 70B (Q4, with partial CPU offload), Qwen 3 235B-A22B MoE

LM Studio's in-app GPU fit indicator shows what fits before you download.
Approximate VRAM needed for recommended local models at Q4 with 8K context:
| Model | Params | Q4 VRAM | Min GPU |
|---|---|---|---|
| Qwen3 32B | 32.8B | ~22.2 GB | 24 GB |
| Qwen3 14B | 14.8B | ~10.8 GB | 12 GB |
| Qwen 2.5 7B Instruct | 7.6B | ~5.3 GB | 8 GB |
| Llama 3.1 8B Instruct | 8B | ~6.3 GB | 8 GB |
| Gemma 3 12B Instruct | 12.2B | ~8.9 GB | 12 GB |
| Mistral Nemo 12B Instruct | 12.2B | ~9.2 GB | 12 GB |
| Llama 3.1 70B Instruct | 70B | ~47.1 GB | 48 GB+ |
| Qwen3 235B-A22B (MoE) | 235B | ~149.9 GB | 48 GB+ |
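These figures track a simple back-of-envelope rule: Q4 quantization stores roughly 4.5-5 bits per weight, plus a gigabyte or two of KV cache and runtime overhead at 8K context. The sketch below reproduces the table's numbers to within a few gigabytes; the constants are rough assumptions fitted to the table above, not LM Studio's actual fit calculation.

```python
# Back-of-envelope Q4 VRAM estimate. Illustrative only: the
# bits-per-weight and overhead constants are assumptions, not
# LM Studio's real GPU fit logic.

def estimate_q4_vram_gb(params_b: float,
                        bits_per_weight: float = 4.85,  # ~Q4_K_M average
                        overhead_gb: float = 1.5) -> float:
    """Rough estimate: quantized weights + flat overhead for KV cache
    and runtime buffers at ~8K context."""
    weights_gb = params_b * bits_per_weight / 8  # GB per billion params
    return round(weights_gb + overhead_gb, 1)

for name, params in [("Qwen3 32B", 32.8),
                     ("Llama 3.1 8B", 8.0),
                     ("Llama 3.1 70B", 70.0)]:
    print(f"{name}: ~{estimate_q4_vram_gb(params)} GB")
```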
App compatibility
| Feature | Supported |
|---|---|
| Local models | Yes |
| OpenRouter | No |
| OpenAI-compatible API | Yes |
| Ollama | No |
| LM Studio | Yes |
| Anthropic API | No |
| Google API | No |
| Mistral API | No |
| Docker | No |
| Works offline | Yes |
| Needs GPU | No |
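Because LM Studio exposes an OpenAI-compatible server (default port 1234), any app in the table above that speaks the OpenAI API can point at it. A minimal connectivity check, assuming the server is running with at least one model loaded:

```python
# Quick connectivity check against LM Studio's OpenAI-compatible server.
# Assumes the default port 1234; adjust BASE_URL if you changed it.
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"

with urllib.request.urlopen(f"{BASE_URL}/models") as resp:
    models = json.load(resp)

for m in models.get("data", []):
    # These IDs are what you pass as "model" in chat requests.
    print(m["id"])
```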
Recommended models
Best local models
- Qwen3 32B: 32.8B params · ~22.2 GB at Q4 · Dense
- Qwen3 14B: 14.8B params · ~10.8 GB at Q4 · Dense
- Qwen 2.5 7B Instruct: 7.6B params · ~5.3 GB at Q4 · Dense
- Llama 3.1 8B Instruct: 8B params · ~6.3 GB at Q4 · Dense
- Gemma 3 12B Instruct: 12.2B params · ~8.9 GB at Q4 · Dense
- Mistral Nemo 12B Instruct: 12.2B params · ~9.2 GB at Q4 · Dense
- Llama 3.1 70B Instruct: 70B params · ~47.1 GB at Q4 · Dense
- Qwen3 235B-A22B (MoE): 235B params · ~149.9 GB at Q4 · MoE
Local vs cloud: which should you use?
Use local models if
- You want privacy — data never leaves your machine
- You already have a GPU with sufficient VRAM
- You want zero per-token API costs
- You need offline access
Use cloud/API if
- Your GPU has insufficient VRAM for the models you need
- You want access to frontier model quality
- You need maximum coding/reasoning performance
- You don't want to manage local model downloads and updates
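If cost is the deciding factor, a rough break-even calculation helps. The sketch below uses placeholder numbers throughout; the token volume, API price, hardware cost, and power cost are illustrative assumptions, not quoted figures, so substitute your own:

```python
# Hypothetical local-vs-cloud break-even sketch.
# Every figure below is a placeholder assumption, not a real price.

monthly_tokens = 20_000_000       # assumed monthly usage
api_price_per_mtok = 2.00         # assumed $/million tokens (varies widely)
gpu_cost = 600.0                  # assumed one-time GPU purchase ($)
power_cost_per_month = 10.0       # assumed electricity for local inference ($)

api_monthly = monthly_tokens / 1_000_000 * api_price_per_mtok
local_monthly = power_cost_per_month

if api_monthly > local_monthly:
    months = gpu_cost / (api_monthly - local_monthly)
    print(f"Local hardware pays for itself in ~{months:.0f} months")
else:
    print("At this volume, the API is cheaper than running locally")
```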
Setup overview
Setting up LM Studio is straightforward: download the installer, open the app, pick a model in the built-in browser, and download it with one click. It runs on macOS, Windows, and Linux. Full documentation is available at https://lmstudio.ai/docs.
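Once a model is downloaded and loaded, LM Studio can serve it through its local OpenAI-compatible endpoint. A minimal chat request in Python, assuming the default port 1234 and the official openai package; the model name below is a placeholder, so use one you actually have loaded:

```python
# Minimal chat request against LM Studio's local server.
# Requires: pip install openai
from openai import OpenAI

# LM Studio ignores the API key, but the client requires a non-empty value.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # placeholder: match a loaded model's ID
    messages=[{"role": "user", "content": "Say hello in five words."}],
    temperature=0.7,
)
print(response.choices[0].message.content)
```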
Limitations
- No OpenRouter or cloud API access (local inference only)
- No multi-user deployment support
- No quantization formats beyond GGUF (AWQ, GPTQ, and exl2 are unsupported)
- No built-in RAG or document Q&A
Related
- Recommended GPUs
- Compatible models
- Related apps
Frequently asked questions
- What is LM Studio?
- LM Studio is a desktop application (macOS, Windows, Linux beta) for downloading, running, and serving GGUF-quantized LLMs locally, with an in-app model browser, a visual GPU fit indicator, and one-click GGUF downloads from Hugging Face.
- Does LM Studio need a GPU?
- LM Studio itself does not require a GPU, but inference is far faster with one. Plan on 4 GB+ VRAM minimum, with 8 GB recommended for usable speeds with 7B models. Apple Silicon Macs with 16 GB+ unified memory run very well via Metal acceleration; CPU-only works but is 5-10x slower for 7B+ models.
- Can I run LM Studio on CPU only?
- Yes — LM Studio supports CPU-only operation, but performance will be significantly slower (5-10x) compared to GPU inference. CPU-only works best for models under 7B parameters with at least 16 GB of system RAM.
- What models work best with LM Studio?
- Models that work well with LM Studio include: Qwen3 32B, Qwen3 14B, Qwen 2.5 7B Instruct, Llama 3.1 8B Instruct, Gemma 3 12B Instruct, Mistral Nemo 12B Instruct. The best model depends on your GPU's VRAM and your use case.
- Is LM Studio free?
- LM Studio is free for personal use but is not open source.