NVIDIA DGX Spark (128GB)
The NVIDIA DGX Spark (128GB) has 128 GB of unified memory (shared between the Grace CPU and the Blackwell GPU) and 273 GB/s of memory bandwidth. It can run 49 of our 54 tracked models natively in VRAM at 8k context.
| Spec | Value |
|---|---|
| Vendor | NVIDIA |
| Architecture | Grace Blackwell |
| VRAM | 128 GB (unified) |
| Memory type | LPDDR5X |
| Memory bandwidth | 273 GB/s |
| Compute backend | CUDA |
| Tier | Workstation |
| Released | 2025 |
| Models (native) | 49 / 54 |
| Models (offload) | 0 / 54 |
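The native/offload split above comes down to a capacity check: the quantized weights plus the KV cache at 8k context must fit in the 128 GB pool. A minimal sketch of that check, assuming FP16 KV cache and illustrative bytes-per-weight figures (the overhead allowance is a hypothetical constant, not a vendor number):

```python
# Rough "does it fit?" check for a model on a 128 GB unified-memory machine.
# Byte-per-weight values and the overhead allowance are assumptions for illustration.

BYTES_PER_WEIGHT = {"FP16": 2.0, "Q8": 1.0, "Q6_K": 0.82, "Q3_K_M": 0.49}

def fits_natively(params_b: float, quant: str, n_layers: int, n_kv_heads: int,
                  head_dim: int, ctx: int = 8192, mem_gb: float = 128.0) -> bool:
    """True if quantized weights + FP16 KV cache fit in unified memory."""
    weights_gb = params_b * BYTES_PER_WEIGHT[quant]  # params_b in billions -> GB
    # KV cache: 2 (K and V) * layers * KV heads * head dim * 2 bytes (FP16) * ctx
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * 2 * ctx / 1e9
    overhead_gb = 4.0  # hypothetical allowance for activations / runtime buffers
    return weights_gb + kv_gb + overhead_gb <= mem_gb

# Llama 3.3 70B at Q8 (80 layers, 8 KV heads via GQA, head dim 128):
print(fits_natively(70, "Q8", n_layers=80, n_kv_heads=8, head_dim=128))  # True: ~77 GB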
Software: the ARM-based Grace CPU and Blackwell GPU share unified memory. The llama.cpp CUDA backend works; NVIDIA NIM microservices are also supported.
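Because the llama.cpp CUDA backend works here, the standard llama-cpp-python bindings apply unchanged. A minimal sketch (the GGUF path is a placeholder; `n_gpu_layers=-1` offloads every layer, though weights live in the same unified pool either way):

```python
# Minimal llama-cpp-python sketch; the model path is a hypothetical local file.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3.3-70b-instruct-q8_0.gguf",  # placeholder GGUF
    n_gpu_layers=-1,  # offload all layers to the Blackwell GPU
    n_ctx=8192,       # the 8k context used for the figures on this page
)

out = llm("Explain unified memory in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```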
Models this GPU runs natively in VRAM (49)
| Model | Params | MMLU-Pro | Quant | Speed |
|---|---|---|---|---|
| Qwen3 235B-A22B (MoE) | 235B | — | Q3_K_M | ~34.1 t/s |
| Mixtral 8x22B Instruct v0.1 | 141B | 40.0 | Q6_K | ~10.3 t/s |
| Qwen 3.5 122B-A10B (MoE) | 122B | — | Q6_K | ~40 t/s |
| Llama 4 Scout 109B | 109B | 70.0 | Q6_K | ~23.6 t/s |
| Qwen 2.5 72B Instruct | 72B | 58.1 | Q8 | ~3.8 t/s |
| Llama 3.3 70B Instruct | 70B | 68.9 | Q8 | ~3.9 t/s |
| DeepSeek R1 Distill Llama 70B | 70B | 70.0 | Q8 | ~3.9 t/s |
| Llama 3.1 70B Instruct | 70B | 66.4 | Q8 | ~3.9 t/s |
| Mixtral 8x7B Instruct v0.1 | 46.7B | 29.7 | FP16 | ~11.6 t/s |
| Command-R 35B | 35B | 33.0 | FP16 | ~3.9 t/s |
| Qwen 3.5 35B-A3B (MoE) | 35B | — | FP16 | ~50.1 t/s |
| Qwen 3.6 35B | 35B | — | FP16 | ~3.9 t/s |
| Yi 1.5 34B Chat | 34.4B | 37.0 | FP16 | ~4 t/s |
| Qwen3 32B | 32.8B | — | FP16 | ~4.2 t/s |
| Qwen 2.5 32B Instruct | 32.5B | 55.1 | FP16 | ~4.2 t/s |
| Qwen 2.5 Coder 32B Instruct | 32.5B | 50.4 | FP16 | ~4.2 t/s |
| DeepSeek R1 Distill Qwen 32B | 32.5B | 65.0 | FP16 | ~4.2 t/s |
| Gemma 4 31B | 31B | — | FP16 | ~4.4 t/s |
| Qwen3 30B-A3B (MoE) | 30B | — | FP16 | ~50.1 t/s |
| Gemma 2 27B Instruct | 27.2B | 38.0 | FP16 | ~5 t/s |
| Gemma 3 27B Instruct | 27B | — | FP16 | ~5.1 t/s |
| Qwen 3.6 27B | 27B | — | FP16 | ~5.1 t/s |
| Gemma 4 26B (MoE) | 26B | — | FP16 | ~39.5 t/s |
| Mistral Small 3.1 24B Instruct | 24B | — | FP16 | ~5.7 t/s |
| Mistral Small 22B | 22.2B | 49.2 | FP16 | ~6.1 t/s |
| Qwen3 14B | 14.8B | — | FP16 | ~9.2 t/s |
| Qwen 2.5 14B Instruct | 14.7B | 51.2 | FP16 | ~9.3 t/s |
| Phi-4 14B Instruct | 14B | 56.1 | FP16 | ~9.8 t/s |
| Mistral Nemo 12B Instruct | 12.2B | 35.6 | FP16 | ~11.2 t/s |
| Gemma 3 12B Instruct | 12.2B | — | FP16 | ~11.2 t/s |
| Gemma 2 9B Instruct | 9.2B | 32.0 | FP16 | ~14.8 t/s |
| Llama 3.1 8B Instruct | 8B | 37.5 | FP16 | ~17.1 t/s |
| DeepSeek R1 Distill Llama 8B | 8B | 41.0 | FP16 | ~17.1 t/s |
| Qwen3 8B | 8B | — | FP16 | ~17.1 t/s |
| Qwen 2.5 7B Instruct | 7.6B | 36.5 | FP16 | ~18 t/s |
| Mistral 7B Instruct v0.3 | 7.25B | 30.0 | FP16 | ~18.8 t/s |
| Gemma 3 4B Instruct | 4B | — | FP16 | ~34.1 t/s |
| Gemma 4 E4B | 4B | — | FP16 | ~34.1 t/s |
| Phi-3.5 Mini Instruct | 3.8B | 35.6 | FP16 | ~35.9 t/s |
| Llama 3.2 3B Instruct | 3.2B | 24.0 | FP16 | ~42.7 t/s |
| Qwen 2.5 3B Instruct | 3.1B | 32.4 | FP16 | ~44 t/s |
| Gemma 2 2B Instruct | 2.6B | 17.8 | FP16 | ~52.5 t/s |
| Gemma 4 E2B | 2B | — | FP16 | ~68.3 t/s |
| SmolLM2 1.7B Instruct | 1.7B | 19.0 | FP16 | ~80.3 t/s |
| Qwen 2.5 1.5B Instruct | 1.5B | 16.8 | FP16 | ~91 t/s |
| Llama 3.2 1B Instruct | 1.24B | 12.5 | FP16 | ~110.1 t/s |
| Gemma 3 1B Instruct | 1B | — | FP16 | ~136.5 t/s |
| Qwen 2.5 0.5B Instruct | 0.5B | 10.0 | FP16 | ~273 t/s |
| SmolLM2 360M Instruct | 0.36B | 8.0 | FP16 | ~379.2 t/s |
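The speed column tracks a simple memory-bandwidth bound: each generated token streams every active weight once, so decode speed tops out near bandwidth divided by model size in bytes. A rough estimator under that assumption (the bytes-per-weight values are our approximations; for MoE models, count only the active parameters per token):

```python
# Back-of-envelope decode-speed ceiling for a bandwidth-bound machine.
# Assumes each generated token reads all active weights exactly once.
BANDWIDTH_GBS = 273.0  # DGX Spark memory bandwidth from the spec table
BYTES_PER_WEIGHT = {"FP16": 2.0, "Q8": 1.0, "Q6_K": 0.82, "Q3_K_M": 0.49}

def est_tps(active_params_b: float, quant: str) -> float:
    """Tokens/s ceiling: bandwidth / bytes streamed per token."""
    return BANDWIDTH_GBS / (active_params_b * BYTES_PER_WEIGHT[quant])

print(round(est_tps(7.25, "FP16"), 1))  # ~18.8, matching Mistral 7B above
print(round(est_tps(70, "Q8"), 1))      # ~3.9, matching the 70B Q8 entries
print(round(est_tps(0.5, "FP16"), 0))   # ~273.0, matching Qwen 2.5 0.5B
```

This is why the MoE entries (e.g. Qwen3 30B-A3B at ~50 t/s) run far faster than dense models of the same total size: only a few billion parameters are active per token.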