How much VRAM does the CPU only (system RAM) have?

The CPU only (system RAM) configuration has 0 GB of VRAM. It relies on DDR4 / DDR5 system RAM with about 80 GB/s of memory bandwidth.

What is the CPU only (system RAM) best for?

With 0 GB of VRAM, the CPU only (system RAM) configuration is best suited to compact models (1B–8B) at low-bit quantization, for edge inference, prototyping, and lightweight tasks.

What LLMs can the CPU only (system RAM) run locally?

The CPU only (system RAM) configuration has no VRAM, so it cannot run any of the 80 tracked models natively in VRAM at 8k context. 45 of the 80 fit when run from system RAM (CPU offload), at much lower speed.

Can the CPU only (system RAM) run Llama 3.3 70B Instruct?

No. Llama 3.3 70B Instruct is too large for the CPU only (system RAM) configuration. You would need a GPU with enough VRAM or a more aggressive (lower-bit) quantization.

Can the CPU only (system RAM) run Qwen 3.6 27B?

The CPU only (system RAM) can run Qwen 3.6 27B with CPU offload at Q6_K quantization, but inference will be slower than native VRAM execution.

Can the CPU only (system RAM) run Llama 3.1 8B Instruct?

The CPU only (system RAM) can run Llama 3.1 8B Instruct with CPU offload at BF16 precision (unquantized 16-bit weights), but inference will be slower than native VRAM execution.
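
To see why these formats fit in system RAM, the size of the weights can be roughly estimated as parameter count times bits per weight. The sketch below is an illustration under stated assumptions, not the method this page uses: the function name `weight_footprint_gb` is hypothetical, the parameter counts are the nominal ones from the model names, and the bits-per-weight values for quantized formats are approximate. The KV cache for an 8k context needs additional memory on top of the weights.

```python
# Approximate bits per weight for a few formats.
# Assumption: quantized values are typical block-format averages, and real
# files differ slightly because some tensors are stored at other precisions.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
}


def weight_footprint_gb(params_billions: float, fmt: str) -> float:
    """Estimate the size of the weights alone, in GB (10^9 bytes)."""
    bits = BITS_PER_WEIGHT[fmt]
    # (params_billions * 1e9 weights) * bits / 8 bytes, divided by 1e9 bytes per GB
    return params_billions * bits / 8


# Nominal parameter counts taken from the model names (assumption).
for name, params, fmt in [
    ("Llama 3.1 8B Instruct", 8.0, "BF16"),
    ("Qwen 3.6 27B", 27.0, "Q6_K"),
]:
    size = weight_footprint_gb(params, fmt)
    print(f"{name} at {fmt}: about {size:.1f} GB of weights")
```

Under these assumptions both models need far more memory than a typical laptop's spare RAM, which is why the amount of installed system RAM, not VRAM, decides what fits on this configuration.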

CPU only (system RAM)

The CPU only (system RAM) has 0 GB VRAM and 80 GB/s memory bandwidth. It can run 0 of our 80 tracked models natively in VRAM at 8k context.

With 0 GB of VRAM and DDR4 / DDR5 system RAM, the CPU only (system RAM) entry sits in the integrated tier and runs 0 models natively in VRAM. It's best for smaller models of up to about 8B parameters.

CPU only (system RAM): x86-64/ARM with DDR4/DDR5 at ~80 GB/s — CPU inference fallback.

Expected speed: a 7B model at Q4 runs at roughly 1–5 tokens per second (t/s), depending on AVX2/AVX-512 support. A 14B model runs at roughly 0.5–2 t/s.
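
A common rule of thumb, which this page does not state, is that token generation is limited by memory bandwidth, because producing each token reads the full set of weights once. Dividing bandwidth by weight size therefore gives a ceiling on tokens per second. The sketch below applies that rule to the ~80 GB/s figure; the function name `decode_ceiling_tps` and the 4.5 bits per weight used for Q4 are assumptions made for illustration.

```python
def decode_ceiling_tps(
    bandwidth_gb_s: float, params_billions: float, bits_per_weight: float
) -> float:
    """Upper bound on tokens/s if every token streams all weights from RAM once."""
    weights_gb = params_billions * bits_per_weight / 8
    return bandwidth_gb_s / weights_gb


# Assumptions: 80 GB/s of usable bandwidth and ~4.5 bits per weight for Q4.
for params in (7.0, 14.0):
    ceiling = decode_ceiling_tps(80.0, params, 4.5)
    print(f"{params:.0f}B at ~4.5 bits/weight: at most {ceiling:.1f} t/s")
```

The ranges quoted above sit well below this ceiling. That is expected: the bound ignores compute limits, the KV cache, and the gap between peak and sustained bandwidth, so it should be read as a limit rather than a prediction.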

llama.cpp CPU backend. AVX2 or AVX-512 recommended. Expect 1-5 t/s for 7B on modern desktop CPU.
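
Before relying on the speed estimates, it can help to confirm which of the recommended instruction sets the CPU reports. The sketch below is one possible check, assuming a Linux x86-64 system where `/proc/cpuinfo` lists a `flags` line; the helper name `simd_flags` is hypothetical. On ARM systems AVX does not exist, and the file uses a different field, so both values come back false there.

```python
from pathlib import Path


def simd_flags(cpuinfo_text: str) -> dict:
    """Report whether AVX2 and AVX-512 Foundation appear in /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match only the exact "flags" field, not e.g. "vmx flags".
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return {"avx2": "avx2" in flags, "avx512f": "avx512f" in flags}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux only (assumption)
    if cpuinfo.exists():
        print(simd_flags(cpuinfo.read_text()))
    else:
        print("No /proc/cpuinfo found; check CPU features another way.")
```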

Vendor	Generic
Architecture	x86-64 / ARM
VRAM	0 GB
Memory type	DDR4 / DDR5
Memory bandwidth	80 GB/s
Compute backend	CPU
Tier	Integrated
Released	2024
Models (native)	0 / 80
Models (offload)	45 / 80

Models this GPU runs natively in VRAM (0)

None.

Models that fit with CPU offload (45)

These use system RAM for layers that don't fit in VRAM — expect much slower inference.

Too large for this GPU (35)
