NVIDIA RTX A6000

The NVIDIA RTX A6000 has 48 GB VRAM and 768 GB/s memory bandwidth. It can run 52 of our 70 tracked models natively in VRAM at 8k context.

With 48 GB GDDR6, the NVIDIA RTX A6000 is a workstation-tier GPU that can run 52 models natively. It handles 70B-class models at Q4 quantization.

The NVIDIA RTX A6000 is an Ampere-generation workstation GPU with 48 GB of ECC GDDR6 VRAM. Released in 2020, it bridges consumer RTX cards and data center A100s: it runs full CUDA drivers, supports NVLink for multi-GPU setups, and fits inside standard workstation towers. Its 48 GB capacity lets it run 34B models at Q4 with room to spare and, per the figures on this page, 70B-class models at Q4 natively in VRAM, making it a long-running workhorse for on-prem AI labs.
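The capacity claim can be sanity-checked with rough arithmetic. The sketch below is an estimate only, not CanItRun's method: it assumes about 4.5 bits per weight for a Q4 format, a Llama-3-70B-style shape (80 layers, 8 KV heads, head dimension 128), an FP16 KV cache, and a flat 1.5 GB runtime overhead. All function names are hypothetical.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV cache size in GB; the leading 2 covers keys and values."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 1e9


def fits_in_vram(vram_gb, params_billion, bits_per_weight, kv_gb, overhead_gb=1.5):
    """True if weights + KV cache + an assumed fixed overhead fit in VRAM."""
    return weight_gb(params_billion, bits_per_weight) + kv_gb + overhead_gb <= vram_gb


# Assumed 70B-class shape at 8k context (illustrative inputs, not measured values).
kv = kv_cache_gb(n_layers=80, n_kv_heads=8, head_dim=128, context_tokens=8192)
print(f"weights ~{weight_gb(70, 4.5):.1f} GB, KV cache ~{kv:.1f} GB")
print("70B at ~4.5 bits fits in 48 GB:", fits_in_vram(48, 70, 4.5, kv))
print("70B at 16 bits fits in 48 GB:", fits_in_vram(48, 70, 16, kv))
```

Under these assumptions the Q4 weights come to roughly 39 GB plus under 3 GB of KV cache, which leaves a few gigabytes of headroom on a 48 GB card. Real footprints vary with the quantization format and runtime.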

NVIDIA RTX A6000: 2020 Ampere workstation with 48GB GDDR6 at 768 GB/s. Bridges consumer and datacenter.

70B at Q4 native. ~25-35 t/s for 7B, ~10-15 t/s for 70B.

Full CUDA with ECC memory. NVLink support for multi-GPU. Long-running workhorse for on-prem AI labs.

Vendor: NVIDIA
Architecture: Ampere
VRAM: 48 GB
Memory type: GDDR6
Memory bandwidth: 768 GB/s
Compute backend: CUDA
Tier: Workstation
Released: 2020
Models (native): 52 / 70
Models (offload): 2 / 70
Software: Full llama.cpp and Ollama support out of the box. CUDA 12.x recommended; driver ≥ 525 required.
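To check an installed driver against the 525 minimum stated above, one option is to read the version from `nvidia-smi`. This is a minimal sketch: the helper names are hypothetical, and it assumes `nvidia-smi` is on the PATH and reports versions in the usual `major.minor[.patch]` form.

```python
import subprocess

MIN_DRIVER_MAJOR = 525  # minimum driver version stated in the software note


def driver_major(version):
    """Extract the major version number from a string such as '535.104.05'."""
    return int(version.strip().split(".")[0])


def driver_ok(version, minimum=MIN_DRIVER_MAJOR):
    """True if the driver's major version meets the minimum."""
    return driver_major(version) >= minimum


def query_driver_versions():
    """Return one driver version string per GPU, as reported by nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

On a machine with the NVIDIA driver installed, `all(driver_ok(v) for v in query_driver_versions())` reports whether every GPU meets the requirement.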

Models this GPU runs natively in VRAM (52)

Models that fit with CPU offload (2)

These use system RAM for layers that don't fit in VRAM — expect much slower inference.

Too large for this GPU (16)

Frequently asked questions

How much VRAM does the NVIDIA RTX A6000 have?
The NVIDIA RTX A6000 has 48 GB of GDDR6 with 768 GB/s memory bandwidth.
What is the NVIDIA RTX A6000 best for?
With 48 GB of VRAM, the NVIDIA RTX A6000 is ideal for running 70B-class models at Q4 quantization and large MoE models — a workstation sweet spot for local inference.
What LLMs can the NVIDIA RTX A6000 run locally?
The NVIDIA RTX A6000 can run 52 of the 70 open-weight models tracked by CanItRun natively in VRAM at 8k context. Top options include: Llama 3.3 70B Instruct at NVFP4, Llama 3.1 8B Instruct at FP32, Llama 3.2 3B Instruct at FP32.
Can the NVIDIA RTX A6000 run Llama 3.3 70B Instruct?
Yes. The NVIDIA RTX A6000 runs Llama 3.3 70B Instruct natively in VRAM at NVFP4 quantization, achieving approximately 21.9 tokens per second.
Can the NVIDIA RTX A6000 run Qwen 3.6 27B?
Yes. The NVIDIA RTX A6000 runs Qwen 3.6 27B natively in VRAM at NVFP4 quantization, achieving approximately 56.9 tokens per second.
Can the NVIDIA RTX A6000 run Llama 3.1 8B Instruct?
Yes. The NVIDIA RTX A6000 runs Llama 3.1 8B Instruct natively in VRAM at FP32 precision (unquantized), achieving approximately 24 tokens per second.
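
The throughput figures quoted in these answers are consistent with a simple bandwidth-bound estimate, in which every generated token requires reading all weights from VRAM once. This is an inference from the numbers, not a documented CanItRun formula. The sketch assumes nominal parameter counts (70B, 27B, 8B), 0.5 bytes per weight for a 4-bit format with scale overhead ignored, and 4 bytes per weight for FP32. The function name is hypothetical.

```python
def decode_tokens_per_second(bandwidth_gb_s, params_billion, bytes_per_weight):
    """Upper-bound decode rate: memory bandwidth divided by weight bytes read per token."""
    model_gb = params_billion * bytes_per_weight
    return bandwidth_gb_s / model_gb


A6000_BANDWIDTH_GB_S = 768  # from the spec list on this page

# (label, nominal parameters in billions, assumed bytes per weight)
cases = [
    ("Llama 3.3 70B Instruct, 4-bit", 70, 0.5),
    ("Qwen 3.6 27B, 4-bit", 27, 0.5),
    ("Llama 3.1 8B Instruct, FP32", 8, 4.0),
]

for label, params_b, bytes_per_weight in cases:
    rate = decode_tokens_per_second(A6000_BANDWIDTH_GB_S, params_b, bytes_per_weight)
    print(f"{label}: ~{rate:.1f} tokens/s")
```

The estimate ignores KV cache reads, compute limits, and kernel efficiency, so measured throughput is normally lower than this ceiling.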