NVIDIA A100 80GB

The NVIDIA A100 80GB has 80 GB VRAM and 2039 GB/s memory bandwidth. It can run 54 of our 70 tracked models natively in VRAM at 8k context.

With 80 GB of HBM2e, the NVIDIA A100 80GB is a datacenter-tier GPU that handles 70B-class models at Q4 quantization.

Released in 2020 on the Ampere architecture, the NVIDIA A100 80GB is a datacenter GPU with 80 GB of HBM2e at 2039 GB/s, and the predecessor to the H100.

70B-class models run natively at Q4; 405B-class models run at Q2. Expect roughly 40-60 tokens per second (t/s) for 7B models and 15-25 t/s for 70B models.
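Whether a model fits natively can be approximated from its parameter count and bits per weight. The sketch below is a back-of-the-envelope check, not CanItRun's method: the function names and the fixed `overhead_gb` allowance (standing in for KV cache and runtime buffers) are assumptions made for illustration.

```python
A100_VRAM_GB = 80.0  # from the spec table on this page


def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_natively(
    params_billion: float,
    bits_per_weight: float,
    vram_gb: float = A100_VRAM_GB,
    overhead_gb: float = 4.0,  # assumed allowance for KV cache and buffers
) -> bool:
    """True if the weights plus the assumed overhead fit in VRAM."""
    return weight_footprint_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


for label, params, bits in [
    ("70B at 4-bit", 70, 4),
    ("70B at 16-bit", 70, 16),
    ("8B at 32-bit", 8, 32),
]:
    size = weight_footprint_gb(params, bits)
    print(f"{label}: {size:.0f} GB of weights, fits natively: {fits_natively(params, bits)}")
```

Under these assumptions a 70B model at 4 bits per weight needs about 35 GB and fits comfortably, while the same model at 16 bits needs about 140 GB and does not. Real quantization formats carry some extra bits per weight for scales, so actual file sizes are somewhat larger.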

Full CUDA support. Widely deployed in the cloud, for example on AWS p4de and Azure NDm A100 v4 instances (the 80 GB variants of p4d and NDv4). Still excellent for inference.

Vendor: NVIDIA
Architecture: Ampere
VRAM: 80 GB
Memory type: HBM2e
Memory bandwidth: 2039 GB/s
Compute backend: CUDA
Tier: Datacenter
Released: 2020
Models (native): 54 / 70
Models (offload): 3 / 70
Software: Typically accessed through cloud providers. vLLM and TensorRT-LLM generally give the best batched-inference performance.

Models this GPU runs natively in VRAM (54)

Models that fit with CPU offload (3)

These models keep the layers that don't fit in VRAM in system RAM, so expect much slower inference.
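To see why offloading is so much slower, consider a rough bandwidth-bound estimate in which every weight is read once per generated token. This is a simplified sketch: the function name is hypothetical, `ram_bw_gbs` is an assumed effective system-RAM bandwidth rather than a measured value, and KV cache, activations, and PCIe transfer costs are ignored.

```python
def offload_tokens_per_second(
    weights_gb: float,
    vram_gb: float = 80.0,       # A100 80GB capacity
    gpu_bw_gbs: float = 2039.0,  # A100 80GB memory bandwidth
    ram_bw_gbs: float = 100.0,   # assumed system-RAM bandwidth (hypothetical)
) -> float:
    """Estimate decode speed when weights are split between VRAM and system RAM."""
    in_vram_gb = min(weights_gb, vram_gb)
    in_ram_gb = weights_gb - in_vram_gb
    # Time per token is the time to stream each portion from its own memory.
    seconds_per_token = in_vram_gb / gpu_bw_gbs + in_ram_gb / ram_bw_gbs
    return 1.0 / seconds_per_token


fully_resident = offload_tokens_per_second(70.0)    # everything fits in VRAM
partly_offloaded = offload_tokens_per_second(100.0)  # 20 GB spills to system RAM
print(f"70 GB of weights:  {fully_resident:.1f} t/s")
print(f"100 GB of weights: {partly_offloaded:.1f} t/s")
```

Under these assumptions, the 20 GB that spill into system RAM dominate the time per token, even though they are only a fifth of the weights.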

Too large for this GPU (13)

Frequently asked questions

How much VRAM does the NVIDIA A100 80GB have?
The NVIDIA A100 80GB has 80 GB of HBM2e with 2039 GB/s memory bandwidth.
What is the NVIDIA A100 80GB best for?
With 80 GB of VRAM, the NVIDIA A100 80GB is a server-class GPU designed for running the largest open-weight models (70B–405B) at high quantization with ample context.
What LLMs can the NVIDIA A100 80GB run locally?
The NVIDIA A100 80GB can run 54 of the 70 open-weight models tracked by CanItRun natively in VRAM at 8k context. Top options include: Llama 3.3 70B Instruct at NVFP4, Llama 3.1 8B Instruct at FP32, Llama 3.2 3B Instruct at FP32.
Can the NVIDIA A100 80GB run Llama 3.3 70B Instruct?
Yes. The NVIDIA A100 80GB runs Llama 3.3 70B Instruct natively in VRAM at NVFP4 quantization, achieving approximately 58.3 tokens per second.
Can the NVIDIA A100 80GB run Qwen 3.6 27B?
Yes. The NVIDIA A100 80GB runs Qwen 3.6 27B natively in VRAM at BF16 precision, achieving approximately 37.8 tokens per second.
Can the NVIDIA A100 80GB run Llama 3.1 8B Instruct?
Yes. The NVIDIA A100 80GB runs Llama 3.1 8B Instruct natively in VRAM at FP32 precision, achieving approximately 63.7 tokens per second.
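
The tokens-per-second figures in the answers above are consistent with a simple bandwidth-bound estimate: memory bandwidth divided by the size of the weights. The sketch below reproduces them under that assumption. It is an inference from the numbers, not CanItRun's documented method, and it assumes nominal parameter counts (70B, 27B, 8B) and exactly 4, 16, and 32 bits per weight for NVFP4, BF16, and FP32.

```python
BANDWIDTH_GBS = 2039.0  # A100 80GB memory bandwidth, from this page


def estimate_tokens_per_second(
    params_billion: float,
    bits_per_weight: float,
    bandwidth_gbs: float = BANDWIDTH_GBS,
) -> float:
    """Bandwidth-bound estimate: each token reads all weights once."""
    weights_gb = params_billion * bits_per_weight / 8
    return bandwidth_gbs / weights_gb


examples = [
    ("Llama 3.3 70B Instruct at NVFP4", 70, 4),
    ("Qwen 3.6 27B at BF16", 27, 16),
    ("Llama 3.1 8B Instruct at FP32", 8, 32),
]
for label, params, bits in examples:
    print(f"{label}: {estimate_tokens_per_second(params, bits):.1f} t/s")
# Llama 3.3 70B Instruct at NVFP4: 58.3 t/s
# Qwen 3.6 27B at BF16: 37.8 t/s
# Llama 3.1 8B Instruct at FP32: 63.7 t/s
```

Treat these as upper-bound estimates for single-stream decoding; measured throughput depends on the inference engine, batch size, and context length.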