Mixture of Experts LLMs
28 models · local AI VRAM requirements & GPU compatibility
Mixture of Experts (MoE) models have a large total parameter count but activate only a fraction of those parameters per token. You still need enough VRAM to load all the weights, but inference speed is governed mainly by the active parameter count, so an MoE model is often much faster than a dense model of equivalent quality. The trade-off: because every expert must stay in memory, MoE models can be hard to fit on a single GPU.
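As a rough illustration of the two quantities described above, the sketch below estimates the size of the quantized weights from the total parameter count and computes the share of parameters active per token. The function names and the 4.85 bits-per-weight average for Q4_K_M are assumptions made for illustration, not this site's formula. The sizes in the list come out somewhat higher than this weights-only estimate, presumably because they include overhead such as context memory.

```python
def weight_memory_gb(total_params_b: float, bits_per_weight: float = 4.85) -> float:
    """Approximate quantized weight size in GB (1 GB = 1e9 bytes).

    total_params_b is the total parameter count in billions. The default
    bits_per_weight is an assumed average for Q4_K_M, not an exact value.
    """
    # (params_b * 1e9 params) * (bits / 8 bytes) / 1e9 bytes per GB
    return total_params_b * bits_per_weight / 8


def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters used per token: a rough proxy for per-token compute."""
    return active_params_b / total_params_b


# Mixtral 8x7B figures from the list: 46.7B total, 12.9B active.
mixtral_weights_gb = weight_memory_gb(46.7)
mixtral_active = active_fraction(46.7, 12.9)
print(f"weights: ~{mixtral_weights_gb:.1f} GB, active share: {mixtral_active:.0%}")
```

All of the weights count toward memory, while only the active share counts toward per-token compute, which is why the two numbers are worth tracking separately.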
- DeepSeek V4 Pro 1.6T · DeepSeek · 1600B params (49B active) · 1010.0 GB · Q4_K_M
- Kimi K2.6 · Moonshot AI · 1000B params (32B active) · 633.6 GB · Q4_K_M
- GLM-5.1 754B · Z.ai · 754B params (44B active) · 487.2 GB · Q4_K_M
- GLM-5 744B · Z.ai · 744B params (40B active) · 480.9 GB · Q4_K_M
- DeepSeek V3 671B · DeepSeek · 671B params (37B active) · 423.7 GB · Q4_K_M
- DeepSeek R1 671B · DeepSeek · 671B params (37B active) · 423.7 GB · Q4_K_M
- MiniMax M1 456B · MiniMax · 456B params (46B active) · 290.5 GB · Q4_K_M
- Llama 4 Maverick 400B · Meta · 400B params (17B active) · 256.7 GB · Q4_K_M
- GLM-4.7 358B · Z.ai · 358B params (32B active) · 229.2 GB · Q4_K_M
- GLM-4.5 355B · Z.ai · 355B params (32B active) · 227.3 GB · Q4_K_M
- GLM-4.6 355B · Z.ai · 355B params (32B active) · 227.3 GB · Q4_K_M
- DeepSeek V4 Flash 284B · DeepSeek · 284B params (13B active) · 179.9 GB · Q4_K_M
- Qwen3 235B-A22B (MoE) · Alibaba · 235B params (22B active) · 149.9 GB · Q4_K_M
- MiniMax M2.5 229B · MiniMax · 229B params (10B active) · 146.7 GB · Q4_K_M
- MiniMax M2.7 229B · MiniMax · 229B params (10B active) · 146.7 GB · Q4_K_M
- Mixtral 8x22B Instruct v0.1 · Mistral AI · 141B params (39B active) · 91.0 GB · Q4_K_M
- Qwen 3.5 122B-A10B (MoE) · Alibaba · 122B params (10B active) · 79.3 GB · Q4_K_M · fits 80 GB
- Nemotron 3 Super 120B · NVIDIA · 120B params (12B active) · 76.5 GB · Q4_K_M · fits 80 GB
- GPT-OSS 120B · OpenAI · 117B params (5B active) · 74.5 GB · Q4_K_M · fits 80 GB
- Llama 4 Scout 109B · Meta · 109B params (17B active) · 71.7 GB · Q4_K_M · fits 80 GB
- GLM-4.5 Air 106B · Z.ai · 106B params (12B active) · 68.6 GB · Q4_K_M · fits 80 GB
- GLM-4.6V 106B · Z.ai · 106B params (12B active) · 68.6 GB · Q4_K_M · fits 80 GB
- Mixtral 8x7B Instruct v0.1 · Mistral AI · 46.7B params (12.9B active) · 30.6 GB · Q4_K_M · fits 48 GB
- Qwen 3.5 35B-A3B (MoE) · Alibaba · 35B params (3B active) · 23.0 GB · Q4_K_M · fits 24 GB
- Nemotron 3 Nano 30B · NVIDIA · 32B params (3B active) · 20.7 GB · Q4_K_M · fits 24 GB
- Qwen3 30B-A3B (MoE) · Alibaba · 30B params (3B active) · 19.8 GB · Q4_K_M · fits 24 GB
- Gemma 4 26B (MoE) · Google · 26B params (3.8B active) · 18.0 GB · Q4_K_M · fits 24 GB
- GPT-OSS 20B · OpenAI · 21B params (4B active) · 13.7 GB · Q4_K_M · fits 16 GB
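The "fits N GB" badges in the list can be approximated with a simple lookup, sketched below. The tier values (16, 24, 48, 80 GB) are inferred only from the badges that appear in this list, and `smallest_fitting_tier` is a hypothetical helper. The site's own calculator may use other tiers or reserve extra headroom.

```python
from typing import Optional, Sequence

# Single-GPU VRAM tiers inferred from the badges in the list (an assumption).
VRAM_TIERS_GB = (16, 24, 48, 80)


def smallest_fitting_tier(
    required_gb: float, tiers: Sequence[int] = VRAM_TIERS_GB
) -> Optional[int]:
    """Return the smallest tier that holds required_gb, or None if none does."""
    for tier in sorted(tiers):
        if required_gb <= tier:
            return tier
    return None


# Sizes taken from the list entries of the same name.
examples = [
    ("GPT-OSS 20B", 13.7),
    ("Mixtral 8x7B Instruct v0.1", 30.6),
    ("Mixtral 8x22B Instruct v0.1", 91.0),
]
for name, required_gb in examples:
    tier = smallest_fitting_tier(required_gb)
    label = f"fits {tier} GB" if tier is not None else "exceeds the largest tier"
    print(f"{name}: {required_gb} GB, {label}")
```

Entries above 80 GB carry no badge in the list, which matches the `None` case here.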
Want to check your specific GPU? Use the homepage calculator to see which of these models fit your hardware with estimated tokens per second.