How much VRAM does the NVIDIA L40S have?

The NVIDIA L40S has 48 GB of GDDR6 with 864 GB/s memory bandwidth.

What is the NVIDIA L40S best for?

With 48 GB of VRAM, the NVIDIA L40S is well suited to running 70B-class models at Q4 quantization and large MoE models, which makes it a datacenter-tier sweet spot for single-GPU inference.
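As a rough check on the 70B-at-Q4 claim, the sketch below estimates weight size as parameters times bits per weight. The function names and the 4 GB overhead allowance are assumptions made for illustration, not CanItRun's actual fit calculation. Real 4-bit formats also store scales and so use somewhat more than 4 bits per weight.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float = 48.0, overhead_gb: float = 4.0) -> bool:
    """True if the weights plus an assumed fixed overhead fit in VRAM.

    overhead_gb is a hypothetical allowance for KV cache and runtime buffers.
    """
    return weight_footprint_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


for bits in (16, 8, 4):
    size = weight_footprint_gb(70, bits)
    print(f"70B at {bits}-bit: {size:.0f} GB, fits in 48 GB: {fits_in_vram(70, bits)}")
```

Under these assumptions a 70B model needs about 35 GB at 4 bits, which leaves headroom on a 48 GB card, while the 8-bit and 16-bit versions do not fit.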

What LLMs can the NVIDIA L40S run locally?

The NVIDIA L40S can run 49 of the 80 open-weight models tracked by CanItRun natively in VRAM at 8k context. Top options include: Llama 3.3 70B Instruct at NVFP4, Llama 3.1 8B Instruct at FP32, Llama 3.2 3B Instruct at FP32.
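The "at 8k context" qualifier matters because the KV cache takes VRAM on top of the weights. A minimal sketch of the usual estimate follows. The architecture values (80 layers, 8 KV heads, head dimension 128) are assumed from the published Llama 3 70B configuration, and a 16-bit cache is assumed; none of this is taken from CanItRun.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence.

    The leading factor of 2 covers one key tensor and one value tensor per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


# Assumed Llama 3 70B-style shape with a 16-bit cache at 8k (8192-token) context.
llama_70b_kv = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, context_len=8192)
print(f"KV cache at 8k context: {llama_70b_kv / 2**30:.2f} GiB")
```

With these assumed values the cache comes to 2.5 GiB per sequence, small next to the weights but large enough to matter when a model barely fits.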

Can the NVIDIA L40S run Llama 3.3 70B Instruct?

Yes. The NVIDIA L40S runs Llama 3.3 70B Instruct natively in VRAM at NVFP4 quantization, achieving approximately 14.9 tokens per second.
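Single-stream decoding is usually limited by memory bandwidth, because each generated token requires reading roughly all of the weights once. The sketch below turns that into a throughput ceiling. The 35 GB weight size (70B at 4 bits) and the 0.6 efficiency factor are assumptions for illustration, not values published by CanItRun.

```python
def decode_ceiling_tps(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on tokens/s if every token reads all weights once."""
    return bandwidth_gb_s / weights_gb


def estimated_tps(bandwidth_gb_s: float, weights_gb: float,
                  efficiency: float = 0.6) -> float:
    """Ceiling scaled by an assumed fraction of peak bandwidth actually achieved."""
    return efficiency * decode_ceiling_tps(bandwidth_gb_s, weights_gb)


L40S_BANDWIDTH_GB_S = 864.0   # from the spec table
WEIGHTS_70B_4BIT_GB = 35.0    # assumed: 70e9 parameters * 4 bits / 8

print(f"ceiling:  {decode_ceiling_tps(L40S_BANDWIDTH_GB_S, WEIGHTS_70B_4BIT_GB):.1f} t/s")
print(f"estimate: {estimated_tps(L40S_BANDWIDTH_GB_S, WEIGHTS_70B_4BIT_GB):.1f} t/s")
```

The ceiling is about 24.7 tokens per second, so a figure in the mid-teens is plausible for this card once real-world inefficiency is accounted for.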

Can the NVIDIA L40S run Qwen 3.6 27B?

Yes. The NVIDIA L40S runs Qwen 3.6 27B natively in VRAM at NVFP4 quantization, achieving approximately 37.2 tokens per second.

Can the NVIDIA L40S run Llama 3.1 8B Instruct?

Yes. The NVIDIA L40S runs Llama 3.1 8B Instruct natively in VRAM at FP32 precision (unquantized), achieving approximately 17 tokens per second.

Can I rent the NVIDIA L40S instead of buying it?

Yes — RunPod and similar cloud GPU providers let you rent NVIDIA L40S instances by the hour, with no long-term contract. This is often cheaper than buying if you only need it occasionally, and lets you try the GPU before committing to a purchase.
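Whether renting is cheaper depends on how many hours you expect to use the GPU. The sketch below computes a simple break-even point. The purchase price and hourly rate are hypothetical placeholders, not quoted prices, and the calculation ignores power, hosting, and resale value.

```python
def break_even_hours(purchase_price_usd: float, hourly_rate_usd: float) -> float:
    """Rental hours at which cumulative rent equals the purchase price."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly_rate_usd must be positive")
    return purchase_price_usd / hourly_rate_usd


# Hypothetical placeholder figures; substitute current prices before relying on this.
PURCHASE_PRICE_USD = 8000.0
HOURLY_RATE_USD = 1.00
HOURS_PER_WEEK = 10.0

hours = break_even_hours(PURCHASE_PRICE_USD, HOURLY_RATE_USD)
print(f"break-even after {hours:.0f} rental hours "
      f"({hours / HOURS_PER_WEEK:.0f} weeks at {HOURS_PER_WEEK:.0f} h/week)")
```

At light usage the break-even point can be years away, which is the case where hourly rental tends to win.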

NVIDIA L40S

The NVIDIA L40S is a datacenter-tier GPU with 48 GB of GDDR6 VRAM and 864 GB/s of memory bandwidth. It can run 49 of our 80 tracked models natively in VRAM at 8k context, including 70B-class models at Q4 quantization.

The NVIDIA L40S is a 2023 data center GPU built on the Ada Lovelace architecture with 48 GB of GDDR6 VRAM. Its 864 GB/s bandwidth is lower than that of HBM-based chips such as the A100, but its larger capacity compared with the A100 40GB lets it hold larger models entirely in GPU memory. The L40S also includes hardware-accelerated video encode and decode, making it a popular choice for mixed multimedia and LLM inference workloads in cloud deployments.

NVIDIA L40S: 2023 Ada Lovelace datacenter GPU with 48 GB GDDR6 at 864 GB/s.

Runs 70B models at Q4 natively and 405B models at Q2 with CPU offload. Expect roughly 25-40 t/s for 7B models and 10-15 t/s for 70B models.

Full CUDA support. Popular for mixed multimedia + LLM workloads. 48GB is the sweet spot capacity.

Vendor	NVIDIA
Architecture	Ada Lovelace
VRAM	48 GB
Memory type	GDDR6
Memory bandwidth	864 GB/s
Compute backend	CUDA
Tier	Datacenter
Released	2023
Models (native)	49 / 80
Models (offload)	7 / 80

Software: Typically accessed through cloud providers. vLLM and TensorRT-LLM give the best batched-inference performance.

Cloud GPU Rental

Don't want to buy an NVIDIA L40S? RunPod is a cloud GPU rental service that lets you rent one by the hour instead, with no contract and no upfront hardware cost.


Popular models for this GPU

Qwen 2.5 72B Instruct, Llama 3.3 70B Instruct, DeepSeek R1 Distill Llama 70B, Llama 3.1 70B Instruct, Mixtral 8x7B Instruct v0.1

Models this GPU runs natively in VRAM (49)

Models that fit with CPU offload (7)

These use system RAM for layers that don't fit in VRAM — expect much slower inference.
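To make the offload case concrete, the sketch below splits a model's layers between GPU and system RAM. It assumes all layers are the same size and reserves a hypothetical 4 GB of VRAM for the KV cache and buffers; real runtimes account for embeddings and uneven layer sizes, so treat the result as a first guess.

```python
def gpu_layer_split(model_gb: float, n_layers: int,
                    vram_gb: float = 48.0, reserve_gb: float = 4.0) -> tuple[int, int]:
    """Return (layers_on_gpu, layers_on_cpu) under an equal-layer-size assumption."""
    per_layer_gb = model_gb / n_layers
    budget_gb = max(vram_gb - reserve_gb, 0.0)
    on_gpu = min(n_layers, int(budget_gb // per_layer_gb))
    return on_gpu, n_layers - on_gpu


# Hypothetical 100 GB model with 100 equal layers: most layers spill to system RAM.
print(gpu_layer_split(model_gb=100.0, n_layers=100))

# A 35 GB, 80-layer model fits entirely, so nothing is offloaded.
print(gpu_layer_split(model_gb=35.0, n_layers=80))
```

Every layer left on the CPU side is served from system RAM at much lower bandwidth, which is why offloaded models run far slower than native ones.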

Too large for this GPU (24)

