Mixture of Experts LLMs
28 models · local AI VRAM requirements & GPU compatibility
Mixture of Experts (MoE) models have a large total parameter count but activate only a fraction of it per token. You need enough VRAM to load all the weights, yet inference speed is governed by the active parameter count, so an MoE model often runs dramatically faster than a dense model of equivalent quality. The trade-off: MoE models can be hard to fit on a single GPU.
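To make the sizing arithmetic concrete, here is a minimal sketch. The ~4.5 bits-per-weight figure for Q4_K_M is an assumption (real GGUF files vary by architecture, since embeddings and some tensors stay at higher precision), and `estimate_vram_gb` is a hypothetical helper, not the calculator's actual formula:

```python
def estimate_vram_gb(total_params_b: float, bits_per_weight: float = 4.5) -> float:
    """Weights-only VRAM estimate in GB for a model with total_params_b
    billion parameters; KV cache and runtime overhead come on top.
    bits_per_weight ~4.5 approximates Q4_K_M (assumption; actual
    files vary by architecture)."""
    return total_params_b * 1e9 * bits_per_weight / 8 / 1e9

# DeepSeek V3's 671B total parameters -> ~377 GB, in line with the
# 376.3 GB Q4_K_M figure in the table below.
print(f"{estimate_vram_gb(671):.1f} GB")
```

Note that the total parameter count sets the VRAM bill even though most of those weights sit idle on any given token.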
| Model | Maker | Total params | Active params | Q4_K_M size | Fits |
|---|---|---|---|---|---|
| DeepSeek V4 Pro 1.6T | DeepSeek | 1600B | 49B | 897.1 GB | |
| Kimi K2.6 | Moonshot AI | 1000B | 32B | 563.0 GB | |
| GLM-5.1 754B | Z.ai | 754B | 44B | 434.0 GB | |
| GLM-5 744B | Z.ai | 744B | 40B | 428.4 GB | |
| DeepSeek V3 671B | DeepSeek | 671B | 37B | 376.3 GB | |
| DeepSeek R1 671B | DeepSeek | 671B | 37B | 376.3 GB | |
| MiniMax M1 456B | MiniMax | 456B | 46B | 258.4 GB | |
| Llama 4 Maverick 400B | Meta | 400B | 17B | 228.5 GB | |
| GLM-4.7 358B | Z.ai | 358B | 32B | 203.9 GB | |
| GLM-4.5 355B | Z.ai | 355B | 32B | 202.3 GB | |
| GLM-4.6 355B | Z.ai | 355B | 32B | 202.3 GB | |
| DeepSeek V4 Flash 284B | DeepSeek | 284B | 13B | 159.8 GB | |
| Qwen3 235B-A22B (MoE) | Alibaba | 235B | 22B | 133.4 GB | |
| MiniMax M2.5 229B | MiniMax | 229B | 10B | 130.6 GB | |
| MiniMax M2.7 229B | MiniMax | 229B | 10B | 130.6 GB | |
| Mixtral 8x22B Instruct v0.1 | Mistral AI | 141B | 39B | 81.1 GB | |
| Qwen 3.5 122B-A10B (MoE) | Alibaba | 122B | 10B | 70.7 GB | 80 GB |
| Nemotron 3 Super 120B | NVIDIA | 120B | 12B | 68.0 GB | 80 GB |
| GPT-OSS 120B | OpenAI | 117B | 5B | 66.2 GB | 80 GB |
| Llama 4 Scout 109B | Meta | 109B | 17B | 64.0 GB | 80 GB |
| GLM-4.5 Air 106B | Z.ai | 106B | 12B | 61.1 GB | 80 GB |
| GLM-4.6V 106B | Z.ai | 106B | 12B | 61.1 GB | 80 GB |
| Mixtral 8x7B Instruct v0.1 | Mistral AI | 46.7B | 12.9B | 27.4 GB | 48 GB |
| Qwen 3.5 35B-A3B (MoE) | Alibaba | 35B | 3B | 20.5 GB | 24 GB |
| Nemotron 3 Nano 30B | NVIDIA | 32B | 3B | 18.4 GB | 24 GB |
| Qwen3 30B-A3B (MoE) | Alibaba | 30B | 3B | 17.7 GB | 24 GB |
| Gemma 4 26B (MoE) | Google | 26B | 3.8B | 16.1 GB | 24 GB |
| GPT-OSS 20B | OpenAI | 21B | 4B | 12.2 GB | 16 GB |
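The "Fits" column above appears to compare the Q4_K_M weight size against common single-GPU VRAM tiers. A hedged reconstruction of that check (the tier list and the no-headroom simplification are assumptions, and `fits_tier` is a hypothetical helper):

```python
TIERS_GB = (16, 24, 48, 80)  # common single-GPU VRAM sizes (assumption)

def fits_tier(model_gb: float) -> int | None:
    """Smallest standard VRAM tier the quantized weights fit into;
    None means multi-GPU territory. A real fit check also needs
    headroom for KV cache and runtime buffers, ignored here."""
    return next((t for t in TIERS_GB if model_gb <= t), None)

print(fits_tier(20.5))  # 24   (Qwen 3.5 35B-A3B)
print(fits_tier(81.1))  # None (Mixtral 8x22B spills past 80 GB)
```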
Want to check your specific GPU? Use the homepage calculator to see which of these models fit your hardware, along with estimated tokens per second.
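On the throughput side, the usual back-of-the-envelope estimate is memory-bound decoding: each generated token streams roughly the active weights from VRAM once, so tokens/s ≈ memory bandwidth ÷ active-weight bytes. A sketch under those assumptions (the 3350 GB/s default is roughly an H100's HBM bandwidth, batch size 1, compute and KV-cache reads ignored; not necessarily the calculator's method):

```python
def estimate_decode_tps(active_params_b: float,
                        mem_bandwidth_gbs: float = 3350.0,
                        bits_per_weight: float = 4.5) -> float:
    """Memory-bound, batch-1 decode ceiling: every token streams the
    active expert weights from VRAM once. Ignores compute, KV-cache
    reads, and multi-GPU interconnect, so treat it as an upper bound."""
    active_bytes = active_params_b * 1e9 * bits_per_weight / 8
    return mem_bandwidth_gbs * 1e9 / active_bytes

# Qwen3 30B-A3B activates only 3B parameters per token:
print(f"{estimate_decode_tps(3):.0f} tok/s")   # ~1985 ceiling
# A dense 30B at the same quantization would be roughly 10x slower:
print(f"{estimate_decode_tps(30):.0f} tok/s")  # ~199
```

That ratio is the point of MoE for local inference: decode cost scales with active parameters, while the VRAM bill scales with total parameters.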