
Ollama

The industry standard for running LLMs locally. Simple CLI, massive model library (100K+), OpenAI-compatible API on port 11434. Powers Open WebUI, Continue, and more.

App type

Local LLM Tool, Developer Tool

Local models

Yes

OpenRouter

No

Ollama

Yes

GPU required

Only for local models

Best for

Running LLMs locally as a backend for other apps

Setup difficulty

Easy

Platforms

macOS, Linux, Windows, CLI, Docker

Pricing

Open source — free

Ollama is the industry standard for running LLMs locally: a simple CLI, a massive model library (100K+ models), and an OpenAI-compatible API on port 11434. It powers Open WebUI, Continue, and more, and is the most popular local LLM runtime, with 120K+ GitHub stars.
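
Once the server is running, any OpenAI-compatible client can talk to it on port 11434. A minimal sketch with curl, assuming you have already pulled `llama3.1:8b`:

```sh
# Chat completion via Ollama's OpenAI-compatible endpoint.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```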

Ollama runs entirely on your local hardware and is open source (https://github.com/ollama/ollama), so you can inspect the code and self-host. No GPU is required: small models (3B-8B) run on CPU with sufficient system RAM, though for 7B models 8 GB of VRAM is recommended for usable speeds. Note that the default context window is only 2K tokens; increase it for coding agents.
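
One way to raise the context window is a Modelfile that overrides `num_ctx` and builds a local variant; the `qwen2.5-8k` tag below is just an illustrative name:

```sh
# Build a qwen2.5:7b variant with an 8K context window.
cat > Modelfile <<'EOF'
FROM qwen2.5:7b
PARAMETER num_ctx 8192
EOF
ollama create qwen2.5-8k -f Modelfile
ollama run qwen2.5-8k
```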

Can it run on my hardware?

Minimum

No GPU required — runs on CPU for small models (3B-8B) with sufficient system RAM. For 7B models, 8 GB VRAM recommended for usable speeds. Default context window is only 2K — increase it for coding agents.

Recommended

  • 8 GB VRAM: 7B models at Q8
  • 12 GB VRAM: 13-14B at Q4
  • 16 GB VRAM: 14B at Q8 or MoE models
  • 24 GB VRAM (RTX 3090/4090): 27-32B at Q4 or 70B at Q2
  • 48 GB+ (dual GPU): 70B at Q4 or 235B MoE at IQ4

Approximate VRAM needed for recommended local models at Q4 with 8K context:

| Model | Params | Q4 VRAM | Min GPU |
| --- | --- | --- | --- |
| Qwen3 32B | 32.8B | ~22.2 GB | 24 GB |
| Qwen3 14B | 14.8B | ~10.8 GB | 12 GB |
| Qwen 2.5 7B Instruct | 7.6B | ~5.3 GB | 8 GB |
| Llama 3.1 8B Instruct | 8B | ~6.3 GB | 8 GB |
| Gemma 3 12B Instruct | 12.2B | ~8.9 GB | 12 GB |
| Mistral Nemo 12B Instruct | 12.2B | ~9.2 GB | 12 GB |
| DeepSeek R1 Distill Qwen 32B | 32.5B | ~22.9 GB | 24 GB |
| Llama 3.1 70B Instruct | 70B | ~47.1 GB | 48 GB+ |

Check your GPU against these models in the calculator →
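
To sanity-check these numbers on your own machine, load a model and inspect its actual memory footprint (tags assume the current Ollama library):

```sh
ollama pull qwen3:14b       # download the model
ollama run qwen3:14b "hi"   # load it with a one-off prompt
ollama ps                   # shows loaded models, size, and CPU/GPU split
```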

App compatibility

| Feature | Supported |
| --- | --- |
| Local models | Yes |
| OpenRouter | No |
| OpenAI-compatible API | Yes |
| Ollama | Yes |
| LM Studio | No |
| Anthropic API | No |
| Google API | No |
| Mistral API | No |
| Docker | Yes |
| Works offline | Yes |
| Needs GPU | No |

Recommended models

Best local models

Local vs cloud: which should you use?

Use local models if

  • You want privacy — data never leaves your machine
  • You already have a GPU with sufficient VRAM
  • You want zero per-token API costs
  • You need offline access

Use cloud/API if

  • Your GPU has insufficient VRAM for the models you need
  • You want access to frontier model quality
  • You need maximum coding/reasoning performance
  • You don't want to manage local model downloads and updates
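
If you go local, many OpenAI-compatible clients can be redirected to Ollama by overriding the base URL. Variable names differ by client, so treat this as a sketch; the official OpenAI SDKs honor these two:

```sh
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"   # Ollama ignores the key, but clients often require a non-empty value
```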

Setup overview

Setting up Ollama is straightforward. It runs on macOS, Linux, and Windows, with a CLI and official Docker images. Full documentation is available at https://github.com/ollama/ollama/tree/main/docs.
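
A typical install path (the script, formula, and Docker flags below mirror the official docs, but check them for current options):

```sh
# Linux: official install script
curl -fsSL https://ollama.com/install.sh | sh

# macOS: Homebrew, or download the app from ollama.com
brew install ollama

# Docker: persist models in a named volume and expose the API port
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Then pull and chat with a model
ollama run llama3.1:8b
```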

Limitations

  • No built-in GUI (it's a CLI tool; pair with Open WebUI for a GUI, see the example after this list)
  • Not maximum performance (raw llama.cpp is 10-20% faster)
  • No cloud/API model access (local inference only)
  • No built-in RAG or document Q&A
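
For the missing GUI, the usual pairing is Open WebUI. A sketch of its documented Docker quickstart, assuming Ollama is already running on the host:

```sh
# Open WebUI on http://localhost:3000, talking to host-local Ollama.
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main
```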

Related

Recommended GPUs

Compatible models

Related apps

Frequently asked questions

What is Ollama?
Ollama is the industry standard for running LLMs locally: a simple CLI, a massive model library (100K+ models), and an OpenAI-compatible API on port 11434. It powers Open WebUI, Continue, and more, and is the most popular local LLM runtime, with 120K+ GitHub stars.
Does Ollama need a GPU?
Ollama itself does not require a GPU. Small models (3B-8B) run on CPU with sufficient system RAM; for 7B models, 8 GB of VRAM is recommended for usable speeds.
Can I run Ollama on CPU only?
Yes — Ollama supports CPU-only operation, but performance will be significantly slower (5-10x) compared to GPU inference. CPU-only works best for models under 7B parameters with at least 16 GB of system RAM.
How do I run local models with Ollama?
Install Ollama, pull a model (e.g., `ollama pull qwen2.5:7b`), then chat with `ollama run qwen2.5:7b`. GPU requirements depend on the model you choose, not on Ollama itself.
What models work best with Ollama?
Models that work well with Ollama include: Qwen3 32B, Qwen3 14B, Qwen 2.5 7B Instruct, Llama 3.1 8B Instruct, Gemma 3 12B Instruct, Mistral Nemo 12B Instruct. The best model depends on your GPU's VRAM and your use case.
Is Ollama free and open source?
Yes. Ollama is open source and completely free. You can find the source code on GitHub at https://github.com/ollama/ollama.