NVIDIA H100 80GB

The NVIDIA H100 80GB has 80 GB VRAM and 3350 GB/s memory bandwidth. It can run 54 of our 70 tracked models natively in VRAM at 8k context.

The H100 80GB is a datacenter-tier GPU on NVIDIA's Hopper architecture, introduced in March 2022 as the datacenter flagship. Its 80 GB of HBM3 at 3350 GB/s handles 70B-class models at Q4 quantization.

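Runs 70B at Q4 natively. 405B requires Q2-Q3 quantization, and even then the weights alone come to roughly 100 GB or more, so it does not fit in 80 GB without CPU offload or additional GPUs. Expect roughly 50-80 tokens/s for 7B models and 20-30 tokens/s for 70B at Q4.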
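
As a rough check on fit claims like these, weight memory can be approximated as parameter count times bits per weight. The sketch below is an illustration under that assumption, not CanItRun's actual method; the function names and the `overhead_gb` allowance (for KV cache and activations) are hypothetical, and real quantization formats carry extra bits for scales that this ignores.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(
    params_billion: float,
    bits_per_weight: float,
    vram_gb: float = 80.0,      # H100 80GB
    overhead_gb: float = 0.0,   # hypothetical allowance for KV cache etc.
) -> bool:
    """True if the estimated weights plus overhead fit in VRAM."""
    return weight_footprint_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for label, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
        size = weight_footprint_gb(70, bits)
        print(f"70B at {label}: {size:.1f} GB, fits in 80 GB: {fits_in_vram(70, bits)}")
```

On this estimate a 70B model needs about 35 GB at 4 bits and 140 GB at 16 bits, which is why Q4 is the natural target for 70B on a single 80 GB card.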

Throughput is best-in-class with vLLM and TensorRT-LLM, and multi-GPU setups scale well over NVLink. For most users the card is available only through cloud providers.

Vendor: NVIDIA
Architecture: Hopper
VRAM: 80 GB
Memory type: HBM3
Memory bandwidth: 3350 GB/s
Compute backend: CUDA
Tier: Datacenter
Released: 2022
Models (native): 54 / 70
Models (offload): 3 / 70
Software: Best-in-class inference throughput. vLLM and TensorRT-LLM are recommended; excellent multi-GPU NVLink scaling.

Models this GPU runs natively in VRAM (54)

Models that fit with CPU offload (3)

These models keep the layers that don't fit in VRAM in system RAM, so expect much slower inference.
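
To make the offload trade-off concrete, here is a minimal sketch of how a layer split might be estimated. It assumes every layer is the same size and reserves a fixed slice of VRAM for KV cache and runtime buffers; the function name, the `reserved_gb` default, and the 100 GB / 80-layer example model are all hypothetical, not values used by CanItRun.

```python
import math


def split_layers(
    weights_gb: float,
    n_layers: int,
    vram_gb: float = 80.0,     # H100 80GB
    reserved_gb: float = 6.0,  # hypothetical reserve for KV cache and buffers
) -> tuple[int, int]:
    """Return (gpu_layers, cpu_layers), assuming uniformly sized layers."""
    per_layer_gb = weights_gb / n_layers
    usable_gb = max(vram_gb - reserved_gb, 0.0)
    gpu_layers = min(n_layers, math.floor(usable_gb / per_layer_gb))
    return gpu_layers, n_layers - gpu_layers


if __name__ == "__main__":
    gpu, cpu = split_layers(weights_gb=100.0, n_layers=80)
    print(f"{gpu} layers on GPU, {cpu} layers in system RAM")
```

Runtimes that support partial offload typically expose a similar knob (llama.cpp's `--n-gpu-layers`, for example). Every layer left in system RAM is read over a much slower memory path on each token, which is where the slowdown comes from.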

Too large for this GPU (13)

Frequently asked questions

How much VRAM does the NVIDIA H100 80GB have?
The NVIDIA H100 80GB has 80 GB of HBM3 with 3350 GB/s memory bandwidth.
What is the NVIDIA H100 80GB best for?
With 80 GB of VRAM, the NVIDIA H100 80GB is a server-class GPU designed for running large open-weight models. 70B-class models fit at Q4 with room left for context, while models up to 405B require aggressive Q2-Q3 quantization.
What LLMs can the NVIDIA H100 80GB run locally?
The NVIDIA H100 80GB can run 54 of the 70 open-weight models tracked by CanItRun natively in VRAM at 8k context. Top options include: Llama 3.3 70B Instruct at NVFP4, Llama 3.1 8B Instruct at FP32, Llama 3.2 3B Instruct at FP32.
Can the NVIDIA H100 80GB run Llama 3.3 70B Instruct?
Yes. The NVIDIA H100 80GB runs Llama 3.3 70B Instruct natively in VRAM at NVFP4 quantization, achieving approximately 95.7 tokens per second.
Can the NVIDIA H100 80GB run Qwen 3.6 27B?
Yes. The NVIDIA H100 80GB runs Qwen 3.6 27B natively in VRAM at BF16 precision, achieving approximately 62 tokens per second.
Can the NVIDIA H100 80GB run Llama 3.1 8B Instruct?
Yes. The NVIDIA H100 80GB runs Llama 3.1 8B Instruct natively in VRAM at FP32 precision (unquantized), achieving approximately 104.7 tokens per second.
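
The tokens-per-second figures in these answers are consistent with a simple bandwidth-bound estimate: memory bandwidth divided by the size of the weights read per generated token. That is an inference from the numbers, not a documented CanItRun formula. The sketch below assumes nominal parameter counts (70B, 27B, 8B) and 0.5, 2, and 4 bytes per weight for NVFP4, BF16, and FP32; the function name is hypothetical.

```python
H100_BANDWIDTH_GB_S = 3350.0  # memory bandwidth from the spec list


def decode_tokens_per_second(
    params_billion: float,
    bytes_per_weight: float,
    bandwidth_gb_s: float = H100_BANDWIDTH_GB_S,
) -> float:
    """Upper-bound decode speed if every token reads all weights once."""
    weights_gb = params_billion * bytes_per_weight
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    cases = [
        ("Llama 3.3 70B Instruct, NVFP4", 70, 0.5),
        ("Qwen 3.6 27B, BF16", 27, 2.0),
        ("Llama 3.1 8B Instruct, FP32", 8, 4.0),
    ]
    for name, params, bytes_per_weight in cases:
        tps = decode_tokens_per_second(params, bytes_per_weight)
        print(f"{name}: {tps:.1f} tokens/s")
```

Treat the result as a ceiling for single-stream decoding. It ignores KV-cache reads, kernel efficiency, and batching, so measured throughput can land well below it.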