How much VRAM does the NVIDIA H100 80GB have?

The NVIDIA H100 80GB has 80 GB of HBM3 with 3350 GB/s memory bandwidth.

What is the NVIDIA H100 80GB best for?

With 80 GB of VRAM, the NVIDIA H100 80GB is a server-class GPU suited to large open-weight models. 70B-class models fit on a single card at 4-bit to 8-bit quantization with room for context, while 405B-class models need several H100s linked over NVLink.

What LLMs can the NVIDIA H100 80GB run locally?

The NVIDIA H100 80GB can run 56 of the 80 open-weight models tracked by CanItRun natively in VRAM at 8k context. Top options include: Llama 3.3 70B Instruct at NVFP4, Llama 3.1 8B Instruct at FP32, Llama 3.2 3B Instruct at FP32.
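A rough way to sanity-check whether a model fits natively is to add the weight footprint to the KV cache at the target context length. The sketch below is a back-of-the-envelope estimate, not CanItRun's actual fit calculation. The function names are hypothetical, the layer and head counts follow the published Llama 3 70B configuration, and the 16-bit KV cache is an assumption.

```python
def weights_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache size in GB: one key and one value vector per layer per token."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token_bytes * context_tokens / 1e9


def fits_in_vram(total_gb: float, vram_gb: float = 80.0) -> bool:
    """True if the estimate fits; ignores activations and runtime overhead."""
    return total_gb <= vram_gb


# Llama 3 70B-style shape: 80 layers, 8 KV heads (GQA), head dimension 128.
kv_8k = kv_cache_gb(layers=80, kv_heads=8, head_dim=128, context_tokens=8192)
four_bit_total = weights_gb(70, 4) + kv_8k
sixteen_bit_total = weights_gb(70, 16) + kv_8k

print(f"KV cache at 8k context: {kv_8k:.2f} GB")
print(f"70B at 4-bit:  {four_bit_total:.1f} GB, fits: {fits_in_vram(four_bit_total)}")
print(f"70B at 16-bit: {sixteen_bit_total:.1f} GB, fits: {fits_in_vram(sixteen_bit_total)}")
```

Real runtimes also reserve memory for activations, CUDA context, and allocator fragmentation, so treat a result close to 80 GB as a likely miss rather than a fit.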

Can the NVIDIA H100 80GB run Llama 3.3 70B Instruct?

Yes. The NVIDIA H100 80GB runs Llama 3.3 70B Instruct natively in VRAM at NVFP4 quantization, achieving approximately 57.8 tokens per second.

Can the NVIDIA H100 80GB run Qwen 3.6 27B?

Yes. The NVIDIA H100 80GB runs Qwen 3.6 27B natively in VRAM at BF16 precision (unquantized 16-bit weights), achieving approximately 39.2 tokens per second.

Can the NVIDIA H100 80GB run Llama 3.1 8B Instruct?

Yes. The NVIDIA H100 80GB runs Llama 3.1 8B Instruct natively in VRAM at FP32 precision (unquantized 32-bit weights), achieving approximately 65.8 tokens per second.
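Throughput figures like these are in line with single-stream decoding being limited by memory bandwidth: each generated token requires reading roughly all the weights once, so bandwidth divided by weight size gives a ceiling. The sketch below computes that ceiling. It is an illustrative approximation, not CanItRun's published method, and the function name and nominal parameter counts are assumptions.

```python
def decode_ceiling_tps(params_billion: float, bits_per_param: float,
                       bandwidth_gb_s: float = 3350.0) -> float:
    """Upper bound on single-stream tokens/s if every token reads all weights once."""
    weight_gb = params_billion * bits_per_param / 8
    return bandwidth_gb_s / weight_gb


# (model, nominal billions of parameters, bits per parameter, quoted tokens/s)
quoted = [
    ("Llama 3.1 8B Instruct at FP32", 8, 32, 65.8),
    ("Qwen 3.6 27B at BF16", 27, 16, 39.2),
]

for name, params_b, bits, tps in quoted:
    ceiling = decode_ceiling_tps(params_b, bits)
    print(f"{name}: ceiling {ceiling:.1f} t/s, quoted figure is {tps / ceiling:.0%} of it")
```

Under these assumptions both quoted figures land at roughly 60-65% of the theoretical ceiling, which is a plausible efficiency for real inference engines. Batched serving can exceed the single-stream number because one pass over the weights serves many requests.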

Can I rent the NVIDIA H100 80GB instead of buying it?

Yes — RunPod and similar cloud GPU providers let you rent NVIDIA H100 80GB instances by the hour, with no long-term contract. This is often cheaper than buying if you only need it occasionally, and lets you try the GPU before committing to a purchase.
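To judge whether renting is cheaper for a given workload, it can help to compute the number of usage hours at which rental spend catches up with the purchase price. The sketch below is a simple break-even calculation. The function name is hypothetical and every price in the example is a placeholder, not a quote from RunPod or any vendor.

```python
def break_even_hours(purchase_price: float, rental_per_hour: float,
                     owner_cost_per_hour: float = 0.0) -> float:
    """Hours of use at which cumulative rental cost equals the cost of owning.

    owner_cost_per_hour covers power, cooling, and hosting when you own the card.
    """
    saving_per_hour = rental_per_hour - owner_cost_per_hour
    if saving_per_hour <= 0:
        raise ValueError("renting never costs more per hour than owning, "
                         "so there is no break-even point")
    return purchase_price / saving_per_hour


# Placeholder prices for illustration only; substitute real quotes.
hours = break_even_hours(purchase_price=30_000, rental_per_hour=3.00,
                         owner_cost_per_hour=0.50)
print(f"Break-even after {hours:,.0f} hours ({hours / 24:.0f} days of continuous use)")
```

If expected usage is well below the break-even figure, renting is the cheaper option. The calculation ignores resale value and the cost of the surrounding server.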

NVIDIA H100 80GB

The NVIDIA H100 80GB is a datacenter-tier GPU with 80 GB of HBM3 VRAM and 3350 GB/s of memory bandwidth. It can run 56 of our 80 tracked models natively in VRAM at 8k context, including 70B-class models at Q4 quantization.

The NVIDIA H100 80GB is the Hopper-generation datacenter GPU with 80 GB of HBM3 at 3350 GB/s of bandwidth. With the FP8 Transformer Engine and 4th-gen NVLink, it delivers some of the fastest single-GPU LLM inference available, running 70B models at Q8 on one card and 405B models across multi-GPU NVLink clusters. It is the de facto standard for production LLM serving.
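The split between single-card and multi-GPU models follows from weight size alone. As a minimal sketch, the function below estimates the smallest number of 80 GB cards whose combined memory holds a model's weights. The function name and the 90% usable-memory fraction are assumptions, and KV cache and activations are ignored.

```python
import math


def min_gpus_for_weights(params_billion: float, bits_per_param: float,
                         vram_gb: float = 80.0, usable_fraction: float = 0.9) -> int:
    """Smallest GPU count whose combined usable VRAM holds the weights alone."""
    weight_gb = params_billion * bits_per_param / 8
    return math.ceil(weight_gb / (vram_gb * usable_fraction))


for bits in (16, 8, 4):
    print(f"405B at {bits}-bit: at least {min_gpus_for_weights(405, bits)} x H100 80GB")

print(f"70B at 8-bit: at least {min_gpus_for_weights(70, 8)} x H100 80GB")
```

This is a lower bound. Tensor-parallel serving stacks often prefer GPU counts that divide the model's attention heads evenly, so deployments tend to round up to 4 or 8 cards.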

NVIDIA H100 80GB: Hopper architecture (March 2022) with 80 GB of HBM3 at 3350 GB/s; a datacenter flagship.

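Runs 70B at Q4 natively in VRAM. A 405B model needs roughly 100 GB for weights even at 2 bits per parameter, so Q2-Q3 builds of 405B require CPU offload or more than one GPU. Expect roughly 50-80 t/s for 7B models and 20-30 t/s for 70B at Q4.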

Best-in-class throughput with vLLM and TensorRT-LLM. Excellent multi-GPU NVLink scaling. Cloud-only for most users.

Vendor	NVIDIA
Architecture	Hopper
VRAM	80 GB
Memory type	HBM3
Memory bandwidth	3350 GB/s
Compute backend	CUDA
Tier	Datacenter
Released	2022
Models (native)	56 / 80
Models (offload)	4 / 80

Software: Best-in-class inference throughput. vLLM and TensorRT-LLM are recommended; excellent multi-GPU NVLink scaling.

Cloud GPU Rental

Don't want to buy an NVIDIA H100 80GB? RunPod is a cloud GPU rental service: rent one by the hour instead, with no contract and no upfront hardware cost.

Pay by the hour · no contract · pods start in about a minute.

Rent an NVIDIA H100 80GB on RunPod ↗ (+$5 signup credit)

Affiliate link — CanItRun may earn a commission. Doesn't affect the fit calculation above.

Popular models for this GPU

Mixtral 8x22B Instruct v0.1
Qwen 3.5 122B-A10B (MoE)
Nemotron 3 Super 120B
GPT-OSS 120B
Llama 4 Scout 109B

Models this GPU runs natively in VRAM (56)

Models that fit with CPU offload (4)

These use system RAM for layers that don't fit in VRAM — expect much slower inference.

Too large for this GPU (20)
