CanItRun

NVIDIA DGX Spark (128GB)

The NVIDIA DGX Spark (128GB) has 128 GB VRAM and 273 GB/s memory bandwidth. It can run 58 of our 70 tracked models natively in VRAM at 8k context.

With 128 GB LPDDR5X, the NVIDIA DGX Spark (128GB) is a workstation-tier GPU that can run 58 models natively. It handles 70B-class models at Q4 quantization.

The NVIDIA DGX Spark is a compact desktop AI supercomputer released in 2025 (previously announced as Project DIGITS). It is built on the GB10 Grace Blackwell Superchip, which pairs an ARM-based Grace CPU with a Blackwell GPU in a single package sharing 128 GB of LPDDR5X unified memory. At 273 GB/s, bandwidth is lower than on discrete HBM-based GPUs, but the 128 GB capacity means you can run 70B models at Q4 natively and 405B-class models at Q2-Q3 with CPU offload, all on a device the size of a Mac mini, at a starting price under $4,000.

NVIDIA DGX Spark (128GB): 2025 desktop AI supercomputer with ARM Grace CPU + Blackwell GPU sharing 128GB LPDDR5X at 273 GB/s.

70B at Q4 native. 405B at Q2-Q3 with CPU offload. Compact form factor at ~$4,000 starting price.
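
As a rough check on these fit claims, the sketch below estimates the size of the weights alone from a parameter count and a nominal bits-per-weight figure. The function names, the 8 GB reserve for the OS, KV cache and runtime, and the nominal bit widths are assumptions for illustration (real Q4 formats use slightly more than 4 bits per weight); this is not CanItRun's actual method.

```python
# Hypothetical helpers: a back-of-the-envelope fit check, not CanItRun's method.

def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8.0


def fits_natively(
    params_billion: float,
    bits_per_weight: float,
    memory_gb: float = 128.0,   # DGX Spark unified memory, from the spec list
    reserve_gb: float = 8.0,    # assumed headroom for OS, KV cache, runtime
) -> bool:
    """True if the weights plus the assumed reserve fit in memory."""
    return weight_footprint_gb(params_billion, bits_per_weight) + reserve_gb <= memory_gb


if __name__ == "__main__":
    for name, params, bits in [
        ("70B at 4 bits", 70, 4),
        ("405B at 4 bits", 405, 4),
        ("405B at 3 bits", 405, 3),
        ("405B at 2 bits", 405, 2),
    ]:
        size = weight_footprint_gb(params, bits)
        print(f"{name}: {size:.1f} GB weights, fits natively: {fits_natively(params, bits)}")
```

Under these assumptions a 70B model at 4 bits needs about 35 GB and fits comfortably, while a 405B model needs about 152 GB at 3 bits and about 101 GB at 2 bits, which is why 405B-class models sit at the edge of what 128 GB can hold.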

llama.cpp CUDA backend works. NVIDIA NIM microservices supported. Because the CPU is ARM-based, some tools may need to be compiled from source where prebuilt x86-64 binaries are not usable.

Vendor: NVIDIA
Architecture: Grace Blackwell
VRAM: 128 GB (unified)
Memory type: LPDDR5X
Memory bandwidth: 273 GB/s
Compute backend: CUDA
Tier: Workstation
Released: 2025
Models (native): 58 / 70
Models (offload): 0 / 70
Software: ARM-based Grace CPU + Blackwell GPU share unified memory. llama.cpp CUDA backend works; NVIDIA NIM microservices also supported.

Popular models for this GPU

Models this GPU runs natively in VRAM (58)

Too large for this GPU (12)

Compare NVIDIA DGX Spark (128GB) with other GPUs

Frequently asked questions

How much VRAM does the NVIDIA DGX Spark (128GB) have?
The NVIDIA DGX Spark (128GB) has 128 GB of LPDDR5X with 273 GB/s memory bandwidth (unified system memory, shared between CPU and GPU).
What is the NVIDIA DGX Spark (128GB) best for?
With 128 GB of unified memory, the NVIDIA DGX Spark (128GB) is a workstation-tier system suited to running large open-weight models locally: 70B-class models at Q4 natively with ample context, and 405B-class models only at Q2-Q3 with CPU offload.
What LLMs can the NVIDIA DGX Spark (128GB) run locally?
The NVIDIA DGX Spark (128GB) can run 58 of the 70 open-weight models tracked by CanItRun natively in VRAM at 8k context. Top options include: Llama 3.3 70B Instruct at NVFP4, Llama 3.1 8B Instruct at FP32, Llama 3.2 3B Instruct at FP32.
Can the NVIDIA DGX Spark (128GB) run Llama 3.3 70B Instruct?
Yes. The NVIDIA DGX Spark (128GB) runs Llama 3.3 70B Instruct natively in VRAM at NVFP4 quantization, achieving approximately 7.8 tokens per second.
Can the NVIDIA DGX Spark (128GB) run Qwen 3.6 27B?
Yes. The NVIDIA DGX Spark (128GB) runs Qwen 3.6 27B natively in VRAM at FP32 precision (unquantized), achieving approximately 2.5 tokens per second.
Can the NVIDIA DGX Spark (128GB) run Llama 3.1 8B Instruct?
Yes. The NVIDIA DGX Spark (128GB) runs Llama 3.1 8B Instruct natively in VRAM at FP32 precision (unquantized), achieving approximately 8.5 tokens per second.
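
The tokens-per-second figures in these answers are consistent with a simple bandwidth-bound estimate: each generated token reads all the weights once, so throughput is roughly memory bandwidth divided by weight size. The sketch below is an assumption about how such numbers can be derived, not a statement of CanItRun's method; the function name, the nominal parameter counts, and the bytes-per-weight values (0.5 for NVFP4, 4 for FP32) are illustrative.

```python
# Hypothetical estimator: assumes decoding is purely memory-bandwidth bound
# and ignores KV cache reads, compute limits, and runtime overhead.

def estimate_tokens_per_second(
    params_billion: float,
    bytes_per_weight: float,
    bandwidth_gb_s: float = 273.0,  # DGX Spark memory bandwidth, from the spec list
) -> float:
    """Upper-bound decode speed: bandwidth divided by the size of the weights."""
    weights_gb = params_billion * bytes_per_weight
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    cases = [
        ("Llama 3.3 70B Instruct at NVFP4", 70, 0.5),
        ("Qwen 3.6 27B at FP32", 27, 4.0),
        ("Llama 3.1 8B Instruct at FP32", 8, 4.0),
    ]
    for name, params, bytes_per_weight in cases:
        tps = estimate_tokens_per_second(params, bytes_per_weight)
        print(f"{name}: ~{tps:.1f} tokens/s")
```

With these inputs the estimate gives about 7.8, 2.5 and 8.5 tokens per second, matching the figures quoted above. Measured throughput on real hardware will generally be lower than this ceiling.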