
Mixture of Experts LLMs

28 models · local AI VRAM requirements & GPU compatibility

Mixture of Experts (MoE) models have a large total parameter count but activate only a fraction of those parameters per token. You still need enough VRAM to hold all the weights, yet inference speed is governed by the active parameter count, so an MoE model is often dramatically faster than a dense model of comparable quality. The trade-off: because every expert must stay resident in memory, MoE models can be hard to fit on a single GPU.
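As a rough illustration of that asymmetry, here is a minimal back-of-envelope sketch: VRAM scales with the total parameter count, while memory-bound decode speed scales with the active parameter count. The helper names, the example model sizes (Mixtral-8x7B-like: ~47B total, ~13B active), and the ~1000 GB/s bandwidth figure are illustrative assumptions, and the formulas ignore KV cache, activations, and runtime overhead.

```python
# Back-of-envelope MoE sizing. All numbers below are illustrative
# assumptions, not measurements from any specific model or GPU.

def weights_vram_gb(total_params_b: float, bits_per_weight: float) -> float:
    """VRAM needed just to hold the weights: all experts must be resident."""
    return total_params_b * 1e9 * bits_per_weight / 8 / 1e9

def decode_tok_per_s(active_params_b: float, bits_per_weight: float,
                     mem_bandwidth_gb_s: float) -> float:
    """Memory-bound decode estimate: each token reads only the active weights."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return mem_bandwidth_gb_s * 1e9 / bytes_per_token

# Hypothetical MoE: 47B total, 13B active, 4-bit quantization,
# running on a GPU with roughly 1000 GB/s of memory bandwidth.
total_b, active_b, bits, bw = 47.0, 13.0, 4.0, 1000.0

print(f"weights alone: ~{weights_vram_gb(total_b, bits):.1f} GB VRAM")
print(f"decode speed:  ~{decode_tok_per_s(active_b, bits, bw):.0f} tok/s (upper bound)")
```

Under these assumptions, the model needs the VRAM of a ~47B dense model (~23.5 GB at 4-bit) but decodes roughly as fast as a ~13B dense one, which is the whole appeal of the MoE design.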

Want to check your specific GPU? Use the homepage calculator to see which of these models fit on your hardware, along with estimated tokens per second.