How much VRAM does the NVIDIA A100 80GB have?

The NVIDIA A100 80GB has 80 GB of HBM2e with 2039 GB/s memory bandwidth.

What is the NVIDIA A100 80GB best for?

With 80 GB of VRAM, the NVIDIA A100 80GB is a server-class GPU designed for running the largest open-weight models (70B–405B) at high quantization with ample context.

What LLMs can the NVIDIA A100 80GB run locally?

The NVIDIA A100 80GB can run 56 of the 80 open-weight models tracked by CanItRun natively in VRAM at 8k context. Top options include: Llama 3.3 70B Instruct at NVFP4, Llama 3.1 8B Instruct at FP32, Llama 3.2 3B Instruct at FP32.

Can the NVIDIA A100 80GB run Llama 3.3 70B Instruct?

Yes. The NVIDIA A100 80GB runs Llama 3.3 70B Instruct natively in VRAM at NVFP4 quantization, achieving approximately 35.2 tokens per second.

Can the NVIDIA A100 80GB run Qwen 3.6 27B?

Yes. The NVIDIA A100 80GB runs Qwen 3.6 27B natively in VRAM at BF16 quantization, achieving approximately 23.8 tokens per second.

Can the NVIDIA A100 80GB run Llama 3.1 8B Instruct?

Yes. The NVIDIA A100 80GB runs Llama 3.1 8B Instruct natively in VRAM at FP32 quantization, achieving approximately 40.1 tokens per second.

Can I rent the NVIDIA A100 80GB instead of buying it?

Yes — RunPod and similar cloud GPU providers let you rent NVIDIA A100 80GB instances by the hour, with no long-term contract. This is often cheaper than buying if you only need it occasionally, and lets you try the GPU before committing to a purchase.

NVIDIA A100 80GB

The NVIDIA A100 80GB has 80 GB VRAM and 2039 GB/s memory bandwidth. It can run 56 of our 80 tracked models natively in VRAM at 8k context.

With 80 GB HBM2e, the NVIDIA A100 80GB is a datacenter-tier GPU that can run 56 models natively. It handles 70B-class models at Q4 quantization.

The NVIDIA A100 80GB is the high-memory SXM variant of NVIDIA's Ampere datacenter GPU, featuring 80GB HBM2e at 2,039 GB/s. It runs 70B models at Q4–Q5 in a single GPU, and remains widely available in cloud instances (AWS p4de, GCP a2-ultragpu). While superseded by the H100 for new deployments, A100s are heavily discounted on the secondhand market, making them popular for cost-conscious inference setups.

NVIDIA A100 80GB: 2020 Ampere datacenter with 80GB HBM2e at 2039 GB/s — predecessor to H100.

70B at Q4 native, 405B at Q2. ~40-60 t/s for 7B, ~15-25 t/s for 70B.

Full CUDA support. Widely deployed on AWS p4d and Azure NDv4. Still excellent for inference.

Vendor	NVIDIA
Architecture	Ampere
VRAM	80 GB
Memory type	HBM2e
Memory bandwidth	2039 GB/s
Compute backend	CUDA
Tier	Datacenter
Released	2020
Models (native)	56 / 80
Models (offload)	4 / 80

Software: Typically cloud-accessed. vLLM and TensorRT-LLM give best batched-inference performance.

Cloud GPU Rental

Don't want to buy a NVIDIA A100 80GB? RunPod is a cloud GPU rental service — rent one by the hour instead, no contract, no upfront hardware cost.

Pay by the hour · no contract · pods start in about a minute.

Rent a NVIDIA A100 80GB on RunPod ↗ (+$5 signup credit)

Affiliate link — CanItRun may earn a commission. Doesn't affect the fit calculation above.

Popular models for this GPU

Mixtral 8x22B Instruct v0.1 Qwen 3.5 122B-A10B (MoE)Nemotron 3 Super 120B GPT-OSS 120B Llama 4 Scout 109B

Models this GPU runs natively in VRAM (56)

Models that fit with CPU offload (4)

These use system RAM for layers that don't fit in VRAM — expect much slower inference.

Too large for this GPU (20)

Compare NVIDIA A100 80GB with other GPUs

Continue reading

vram-guides10 min

DeepSeek Family Guide: R1, V3 & Distilled Models

hardware10 min

Used Enterprise GPUs for LLMs: A6000, A100, and Beyond

Frequently asked questions

How much VRAM does the NVIDIA A100 80GB have?: The NVIDIA A100 80GB has 80 GB of HBM2e with 2039 GB/s memory bandwidth.
What is the NVIDIA A100 80GB best for?: With 80 GB of VRAM, the NVIDIA A100 80GB is a server-class GPU designed for running the largest open-weight models (70B–405B) at high quantization with ample context.
What LLMs can the NVIDIA A100 80GB run locally?: The NVIDIA A100 80GB can run 56 of the 80 open-weight models tracked by CanItRun natively in VRAM at 8k context. Top options include: Llama 3.3 70B Instruct at NVFP4, Llama 3.1 8B Instruct at FP32, Llama 3.2 3B Instruct at FP32.
Can the NVIDIA A100 80GB run Llama 3.3 70B Instruct?: Yes. The NVIDIA A100 80GB runs Llama 3.3 70B Instruct natively in VRAM at NVFP4 quantization, achieving approximately 35.2 tokens per second.
Can the NVIDIA A100 80GB run Qwen 3.6 27B?: Yes. The NVIDIA A100 80GB runs Qwen 3.6 27B natively in VRAM at BF16 quantization, achieving approximately 23.8 tokens per second.
Can the NVIDIA A100 80GB run Llama 3.1 8B Instruct?: Yes. The NVIDIA A100 80GB runs Llama 3.1 8B Instruct natively in VRAM at FP32 quantization, achieving approximately 40.1 tokens per second.
Can I rent the NVIDIA A100 80GB instead of buying it?: Yes — RunPod and similar cloud GPU providers let you rent NVIDIA A100 80GB instances by the hour, with no long-term contract. This is often cheaper than buying if you only need it occasionally, and lets you try the GPU before committing to a purchase.