Command-R 35B
Command-R 35B needs roughly 34.1 GB VRAM at Q4_K_M quantization (90.4 GB at FP16). 54 GPUs we track can run it fully in VRAM at 8k context.
54 GPUs run this natively · 36 with CPU offload
Command-R 35B is a 35B-parameter dense model developed by Cohere, released in August 2024 as a RAG and tool-use specialist. It supports a 128K context window and uses full attention (no GQA).
To run Command-R 35B locally: Q4_K_M weights take roughly 20-22 GB, so the weights alone fit on a 24 GB GPU, but full attention means a heavy KV cache at long context. 32 GB or more is recommended for RAG workloads.
Industry-leading retrieval-augmented generation. Strong multilingual support across 10 languages.
VRAM at each quantization
Assumes 8k context. KV cache grows linearly with context length.
| Quant | Weights | KV cache | Total |
|---|---|---|---|
| FP32 | 140.0 GB | 10.74 GB | 168.8 GB |
| BF16 | 70.0 GB | 10.74 GB | 90.4 GB |
| FP16 | 70.0 GB | 10.74 GB | 90.4 GB |
| Q8_0 | 35.0 GB | 10.74 GB | 51.2 GB |
| Q6_K | 28.7 GB | 10.74 GB | 44.2 GB |
| Q5_K_M | 22.5 GB | 10.74 GB | 37.3 GB |
| Q4_K_M (recommended) | 19.7 GB | 10.74 GB | 34.1 GB |
| Q3_K_M | 15.1 GB | 10.74 GB | 28.9 GB |
| Q2_K | 11.5 GB | 10.74 GB | 24.9 GB |
| NVFP4 (CUDA only) | 17.5 GB | 10.74 GB | 31.6 GB |
KV cache shown at 8k context (FP16). NVFP4 requires a CUDA GPU. Enable TurboQuant in the calculator to see reduced KV cache estimates.
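The figures in the table can be approximately reproduced with a short calculation. The sketch below is an illustration, not the calculator's actual method. It assumes an architecture of 40 layers, 64 KV heads, and a head dimension of 128 (values not stated on this page), takes "8k" to mean 8192 tokens, and uses a 12% overhead factor that is inferred from the table rows rather than documented anywhere.

```python
def kv_cache_gb(context_tokens, layers=40, kv_heads=64, head_dim=128, bytes_per_value=2):
    """Estimate the KV cache size in decimal GB for a single sequence.

    The layer, head, and head_dim defaults are assumptions about the model
    architecture; bytes_per_value=2 corresponds to an FP16 cache.
    """
    # Factor of 2: each token stores one key and one value vector
    # per layer and per KV head.
    total_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value * context_tokens
    return total_bytes / 1e9


# Assumption: inferred by comparing the Total column with Weights + KV cache.
OVERHEAD = 1.12


def total_vram_gb(weights_gb, context_tokens=8192):
    """Estimate total VRAM as (weights + KV cache) scaled by the overhead factor."""
    return OVERHEAD * (weights_gb + kv_cache_gb(context_tokens))


if __name__ == "__main__":
    print("KV cache at 8k:", round(kv_cache_gb(8192), 2), "GB")
    for quant, weights in [("Q4_K_M", 19.7), ("Q8_0", 35.0), ("FP16", 70.0)]:
        print(quant, round(total_vram_gb(weights), 1), "GB")
```

Under these assumptions the sketch gives 10.74 GB of KV cache at 8k and totals of 34.1, 51.2, and 90.4 GB for Q4_K_M, Q8_0, and FP16, matching the table.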
Benchmarks
GPUs that run Command-R 35B natively (54)
- NVIDIA RTX 5090 · Q3_K_M · 119.1 t/s
- NVIDIA H100 80GB · NVFP4 · 191.4 t/s
- NVIDIA A100 80GB · NVFP4 · 116.5 t/s
- NVIDIA A100 40GB · NVFP4 · 88.9 t/s
- NVIDIA L40S · NVFP4 · 49.4 t/s
- NVIDIA RTX A6000 · NVFP4 · 43.9 t/s
- NVIDIA RTX 5000 Ada · Q3_K_M · 38.3 t/s
- NVIDIA RTX 6000 Ada · NVFP4 · 54.9 t/s
- NVIDIA RTX Pro 6000 · BF16 · 19.2 t/s
- NVIDIA DGX Spark (128GB) · BF16 · 3.9 t/s
- AMD Radeon PRO W7800 · Q3_K_M · 38.3 t/s
- AMD Radeon PRO W7900 · Q6_K · 30.1 t/s
- AMD Instinct MI300X · FP32 · 37.9 t/s
- AMD Radeon AI Pro 9700 32GB · Q3_K_M · 42.5 t/s
- AMD Strix Halo (128GB) · BF16 · 3.7 t/s
- AMD Strix Halo (96GB) · BF16 · 3.7 t/s
- AMD Strix Halo (64GB) · Q8_0 · 7.3 t/s
- Apple M5 Max (128GB) · BF16 · 8.8 t/s
- Apple M5 Max (64GB) · Q8_0 · 17.5 t/s
- Apple M5 Max (48GB) · Q5_K_M · 27.2 t/s
- Apple M5 Pro (48GB) · Q5_K_M · 13.6 t/s
- Apple M5 Pro (36GB) · Q3_K_M · 20.4 t/s
- Apple M5 (32GB) · Q2_K · 13.3 t/s
- Apple M4 Ultra (384GB) · FP32 · 7.8 t/s
- Apple M4 Ultra (192GB) · FP32 · 7.8 t/s
- Apple M4 Max (128GB) · BF16 · 7.8 t/s
- Apple M4 Max (96GB) · BF16 · 7.8 t/s
- Apple M4 Max (64GB) · Q8_0 · 15.6 t/s
- Apple M4 Max (48GB) · Q5_K_M · 24.2 t/s
- Apple M4 Pro (48GB) · Q5_K_M · 12.1 t/s
- Apple M4 (32GB) · Q2_K · 10.4 t/s
- Apple M3 Ultra (512GB) · FP32 · 5.9 t/s
- Apple M3 Ultra (256GB) · FP32 · 5.9 t/s
- Apple M3 Ultra (96GB) · BF16 · 11.7 t/s
- Apple M3 Max (128GB) · BF16 · 5.7 t/s
- Apple M3 Max (96GB) · BF16 · 5.7 t/s
- Apple M3 Max (64GB) · Q8_0 · 11.4 t/s
- Apple M3 Max (48GB) · Q5_K_M · 17.7 t/s
- Apple M3 Max (36GB) · Q3_K_M · 26.6 t/s
- Apple M3 Pro (36GB) · Q3_K_M · 10 t/s
- Apple M2 Ultra (384GB) · FP32 · 5.7 t/s
- Apple M2 Ultra (192GB) · FP32 · 5.7 t/s
- Apple M2 Max (96GB) · BF16 · 5.7 t/s
- Apple M2 Max (64GB) · Q8_0 · 11.4 t/s
- Apple M2 Max (32GB) · Q2_K · 34.7 t/s
- Apple M2 Pro (32GB) · Q2_K · 17.4 t/s
- Apple M1 Ultra (128GB) · BF16 · 11.4 t/s
- Apple M1 Ultra (64GB) · Q8_0 · 22.9 t/s
- Apple M1 Max (64GB) · Q8_0 · 11.4 t/s
- Apple M1 Max (32GB) · Q2_K · 34.7 t/s
- Apple M1 Pro (32GB) · Q2_K · 17.4 t/s
- Intel Data Center GPU Max 1550 · BF16 · 46.8 t/s
- Intel Data Center GPU Max 1100 · Q6_K · 42.8 t/s
- Intel Arc 140V (32GB) · Q2_K · 11.9 t/s
Plus 36 GPUs that run it with CPU offload (slower)
- NVIDIA RTX 5080 · NVFP4 · 13.7 t/s
- NVIDIA RTX 5070 Ti · NVFP4 · 12.8 t/s
- NVIDIA RTX 5070 · NVFP4 · 9.6 t/s
- NVIDIA RTX 5060 Ti 16GB · NVFP4 · 6.4 t/s
- NVIDIA RTX 5060 · NVFP4 · 6.4 t/s
- NVIDIA RTX 5050 · NVFP4 · 4.6 t/s
- NVIDIA RTX 4090 · NVFP4 · 14.4 t/s
- NVIDIA RTX 4080 · NVFP4 · 10.2 t/s
- NVIDIA RTX 4070 Ti · NVFP4 · 7.2 t/s
- NVIDIA RTX 4070 · NVFP4 · 7.2 t/s
- NVIDIA RTX 4060 Ti 16GB · NVFP4 · 4.1 t/s
- NVIDIA RTX 4060 · NVFP4 · 3.9 t/s
- NVIDIA RTX 3090 · NVFP4 · 13.4 t/s
- NVIDIA RTX 3090 Ti · NVFP4 · 14.4 t/s
- NVIDIA RTX 3080 10GB · NVFP4 · 10.9 t/s
- NVIDIA RTX 3060 12GB · NVFP4 · 5.1 t/s
- NVIDIA RTX 4000 Ada · NVFP4 · 4.6 t/s
- NVIDIA RTX 4500 Ada · NVFP4 · 6.2 t/s
- AMD Radeon RX 7900 XTX · Q6_K · 8.4 t/s
- AMD Radeon RX 7900 XT · Q6_K · 7 t/s
- AMD Radeon RX 7900 GRE · Q5_K_M · 6.4 t/s
- AMD Radeon RX 6800 XT · Q5_K_M · 5.7 t/s
- Intel Arc B580 12GB · Q5_K_M · 5.1 t/s
- Intel Arc B570 10GB · Q4_K_M · 4.8 t/s
- Intel Arc Pro B70 24GB · Q6_K · 4 t/s
- Intel Arc Pro B60 24GB · Q6_K · 3.3 t/s
- Intel Arc A770 16GB · Q5_K_M · 6.2 t/s
- Intel Arc A770 8GB · Q3_K_M · 8.5 t/s
- Intel Arc A750 8GB · Q3_K_M · 8.5 t/s
- Intel Arc A580 8GB · Q3_K_M · 8.5 t/s
- Intel Arc A380 6GB · Q3_K_M · 3.1 t/s
- Intel Arc A310 4GB · Q3_K_M · 2.1 t/s
- Intel Arc Pro A60 12GB · Q5_K_M · 4.3 t/s
- Intel Arc Pro A50 6GB · Q3_K_M · 3.2 t/s
- Intel Arc Pro A40 6GB · Q3_K_M · 3.2 t/s
- CPU only (system RAM) · Q2_K · 0.9 t/s
Notes
Full attention (no GQA) means every attention head stores its own keys and values, so the KV cache becomes heavy at long context.
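To make the long-context cost concrete, the following sketch scales the 10.74 GB KV cache figure from the table linearly with context length. It assumes that "8k" means 8192 tokens and "128K" means 131072 tokens, and that the cache stays in FP16. The function name is illustrative only.

```python
KV_GB_AT_8K = 10.74   # FP16 KV cache at 8k context, taken from the table
BASE_CONTEXT = 8192   # assumption: "8k" means 8192 tokens


def kv_gb_at(context_tokens):
    """Scale the 8k KV cache figure linearly to another context length."""
    return KV_GB_AT_8K * context_tokens / BASE_CONTEXT


if __name__ == "__main__":
    for ctx in (8192, 32768, 131072):
        print(f"{ctx:>7} tokens: {kv_gb_at(ctx):6.1f} GB")
```

Under these assumptions the KV cache alone reaches roughly 43 GB at 32k tokens and roughly 172 GB at the full 128K window, which is more than the Q4_K_M weights at either length.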
Frequently asked questions
- What are the VRAM requirements for Command-R 35B?
- Command-R 35B requires approximately 34.1 GB of VRAM at Q4_K_M quantization, 51.2 GB at Q8_0, and 90.4 GB at FP16. These numbers assume an 8k context window; the KV cache portion grows linearly with context length, so total VRAM rises with longer contexts.
- How many parameters does Command-R 35B have?
- Command-R 35B has 35 billion parameters.
- How capable is Command-R 35B?
- Command-R 35B has an MMLU-Pro score of 33, making it well-suited for lightweight tasks, prototyping, and resource-constrained environments.
- Can Command-R 35B run on a 16 GB GPU?
- No. At Q4_K_M, Command-R 35B needs 34.1 GB of VRAM — more than 16 GB. You will need a 48 GB GPU like the RTX 6000 Ada or a dual-GPU setup.
- Can Command-R 35B run on a 24 GB GPU?
- No. Even at Q4_K_M, Command-R 35B needs 34.1 GB. Consider a 48 GB card like the RTX 6000 Ada or a dual RTX 4090 setup.
- What is the smallest quantization for Command-R 35B that fits in 24 GB of VRAM?
- None. At 8k context, Command-R 35B does not fit in 24 GB of VRAM at any standard quantization level. The smallest listed option, Q2_K, needs 24.9 GB.
- What GPU do I need to run Command-R 35B locally?
- You need a 48 GB GPU or a dual-GPU setup. At Q4_K_M, Command-R 35B needs 34.1 GB VRAM. Options: RTX 6000 Ada (48 GB), A6000 (48 GB), or 2× RTX 4090.
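The answers above all follow from comparing a VRAM budget with the Total column of the quantization table. A minimal sketch of that lookup is shown below. It uses the 8k-context totals from the table, omits FP16 (identical to BF16) and NVFP4 (CUDA only), and ignores any memory reserved by the operating system or display. The function name `best_quant` is hypothetical.

```python
# Total VRAM at 8k context, ordered from highest to lowest precision
# (values copied from the quantization table).
TOTAL_VRAM_GB = [
    ("FP32", 168.8),
    ("BF16", 90.4),
    ("Q8_0", 51.2),
    ("Q6_K", 44.2),
    ("Q5_K_M", 37.3),
    ("Q4_K_M", 34.1),
    ("Q3_K_M", 28.9),
    ("Q2_K", 24.9),
]


def best_quant(vram_gb):
    """Return the highest-precision quantization that fits, or None if none do."""
    for name, required_gb in TOTAL_VRAM_GB:
        if required_gb <= vram_gb:
            return name
    return None


if __name__ == "__main__":
    for budget in (16, 24, 32, 48, 80):
        print(f"{budget} GB ->", best_quant(budget))
```

With these totals, 16 GB and 24 GB budgets return `None`, 32 GB returns Q3_K_M, 48 GB returns Q6_K, and 80 GB returns Q8_0.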