CanItRun

Command-R 35B

Command-R 35B needs roughly 34.1 GB VRAM at Q4_K_M quantization (90.4 GB at FP16). 54 GPUs we track can run it fully in VRAM at 8k context.

54 GPUs run this natively · 36 with CPU offload

Cohere · 35B params · 125k context · CC-BY-NC 4.0 · Non-commercial only

Command-R 35B is a 35B-parameter dense model developed by Cohere, released in August 2024 as a RAG and tool-use specialist. It supports a 128K context and uses full attention (no GQA).

To run Command-R 35B locally: the Q4_K_M weights alone take roughly 20-22 GB, which fits on a 24 GB GPU, but full attention means a heavy KV cache at long context, so the total at 8k context is about 34.1 GB. 32 GB+ is recommended for RAG workloads.

Industry-leading retrieval-augmented generation. Strong multilingual support across 10 languages.

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

Quant                   Weights     KV cache    Total
FP32                    140.0 GB    10.74 GB    168.8 GB
BF16                    70.0 GB     10.74 GB    90.4 GB
FP16                    70.0 GB     10.74 GB    90.4 GB
Q8_0                    35.0 GB     10.74 GB    51.2 GB
Q6_K                    28.7 GB     10.74 GB    44.2 GB
Q5_K_M                  22.5 GB     10.74 GB    37.3 GB
Q4_K_M (recommended)    19.7 GB     10.74 GB    34.1 GB
Q3_K_M                  15.1 GB     10.74 GB    28.9 GB
Q2_K                    11.5 GB     10.74 GB    24.9 GB
NVFP4 (CUDA only)       17.5 GB     10.74 GB    31.6 GB

KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
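
The totals in the table are larger than weights plus KV cache. A minimal sketch of one way to reproduce them, and to extrapolate to other context lengths, follows. The 12% overhead factor is an assumption inferred from the table's own numbers, not a formula the site documents, and the function name and the reading of "8k" as 8192 tokens are likewise illustrative.

```python
# Weights per quantization in GB, copied from the table above.
WEIGHTS_GB = {
    "FP32": 140.0, "BF16": 70.0, "FP16": 70.0, "Q8_0": 35.0,
    "Q6_K": 28.7, "Q5_K_M": 22.5, "Q4_K_M": 19.7, "Q3_K_M": 15.1,
    "Q2_K": 11.5, "NVFP4": 17.5,
}

KV_CACHE_GB_AT_8K = 10.74  # FP16 KV cache at 8k context, from the table
BASE_CONTEXT = 8192        # assumption: "8k" means 8192 tokens

# Assumption: a ~12% runtime overhead, inferred because
# (weights + KV cache) * 1.12 lands within 0.1 GB of every listed total.
OVERHEAD = 1.12


def estimate_total_gb(quant: str, context_tokens: int = BASE_CONTEXT) -> float:
    """Estimate total VRAM in GB for a quantization and context length."""
    # The KV cache grows linearly with context length.
    kv_gb = KV_CACHE_GB_AT_8K * context_tokens / BASE_CONTEXT
    return (WEIGHTS_GB[quant] + kv_gb) * OVERHEAD
```

Under these assumptions, estimate_total_gb("Q4_K_M") gives about 34.1 GB at 8k, and estimate_total_gb("Q4_K_M", 32768) gives about 70.2 GB, which shows how quickly the KV cache dominates at longer contexts.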

Benchmarks

GPUs that run Command-R 35B natively (54)

Plus 36 GPUs that run it with CPU offload (slower)

Notes

Full attention (no GQA) — heavy KV cache at long context.
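
To illustrate why full attention makes the KV cache heavy, here is a rough sketch of the standard per-token KV cache calculation. The architecture values (40 layers, 64 KV heads, head dimension 128) are assumptions that are not stated on this page; they are used because they reproduce the 10.74 GB figure at 8k context. The 8-KV-head GQA variant is purely hypothetical, for comparison.

```python
def kv_cache_gb(context_tokens: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_value: int = 2) -> float:
    """KV cache size in GB (decimal) for a given context length."""
    # Factor of 2: one key vector and one value vector per head per layer.
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return bytes_per_token * context_tokens / 1e9


# Assumed architecture (not from this page): 40 layers, head_dim 128.
# Full attention: every one of the 64 heads keeps its own K and V.
full_attention = kv_cache_gb(8192, n_layers=40, n_kv_heads=64, head_dim=128)

# Hypothetical GQA variant sharing K/V across groups, leaving 8 KV heads.
gqa_variant = kv_cache_gb(8192, n_layers=40, n_kv_heads=8, head_dim=128)
```

With these assumed values the full-attention cache is about 10.74 GB at 8k context in FP16, while the hypothetical 8-KV-head variant would need one eighth of that.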

Hugging Face ↗ · Ollama ↗ · Released 2024-08-30

Frequently asked questions

What are the VRAM requirements for Command-R 35B?
Command-R 35B requires approximately 34.1 GB of VRAM at Q4_K_M quantization, 51.2 GB at Q8_0, and 90.4 GB at FP16. These numbers assume an 8k context window; the KV cache portion grows linearly with context length, so total VRAM rises with longer contexts.
How many parameters does Command-R 35B have?
Command-R 35B has 35 billion parameters.
How capable is Command-R 35B?
Command-R 35B has an MMLU-Pro score of 33, making it well-suited for lightweight tasks, prototyping, and resource-constrained environments.
Can Command-R 35B run on a 16 GB GPU?
No. At Q4_K_M, Command-R 35B needs 34.1 GB of VRAM — more than 16 GB. You will need a 48 GB GPU like the RTX 6000 Ada or a dual-GPU setup.
Can Command-R 35B run on a 24 GB GPU?
No. Even at Q4_K_M, Command-R 35B needs 34.1 GB. Consider a 48 GB card like the RTX 6000 Ada or a dual RTX 4090 setup.
What is the smallest quantization for Command-R 35B that fits in 24 GB of VRAM?
At 8k context, Command-R 35B cannot fit in 24 GB of VRAM at any standard quantization level. The minimum needed is 24.9 GB at Q2_K.
What GPU do I need to run Command-R 35B locally?
You need a 48 GB GPU or a dual-GPU setup. At Q4_K_M, Command-R 35B needs 34.1 GB VRAM. Options: RTX 6000 Ada (48 GB), A6000 (48 GB), or 2× RTX 4090.
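
The GPU-size answers above all come down to comparing a card's VRAM against the 8k-context totals. A small sketch of that check, using the totals from the VRAM table; the function name is illustrative, and the check ignores any VRAM already used by the desktop or other processes.

```python
# Total VRAM per quantization in GB at 8k context, from the VRAM table.
TOTAL_GB_AT_8K = {
    "FP32": 168.8, "BF16": 90.4, "FP16": 90.4, "Q8_0": 51.2,
    "Q6_K": 44.2, "Q5_K_M": 37.3, "Q4_K_M": 34.1, "Q3_K_M": 28.9,
    "Q2_K": 24.9, "NVFP4": 31.6,
}


def quants_that_fit(vram_gb: float) -> list[str]:
    """Return the quantizations whose 8k-context total fits in vram_gb."""
    return [q for q, total in TOTAL_GB_AT_8K.items() if total <= vram_gb]


fits_16 = quants_that_fit(16)  # nothing fits
fits_24 = quants_that_fit(24)  # nothing fits; the minimum is 24.9 GB at Q2_K
fits_48 = quants_that_fit(48)  # everything from Q6_K down, plus NVFP4
```

A combined 48 GB from two RTX 4090s passes the same check on paper, though this sketch does not model how a runtime splits layers across cards.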