CanItRun Logocanitrun.

Command-R 35B

Command-R 35B needs roughly 17.5GB VRAM at Q4 quantization (70.0GB at FP16). 33 GPUs we track can run it fully in VRAM at 8k context.

Cohere35B params125k contextCC-BY-NC 4.0Non-commercial only

VRAM at each quantization

Assumes 8k context. KV cache grows linearly with context length.

QuantWeightsKV cacheTotal
FP1670.0 GB10.74 GB90.4 GB
Q835.0 GB10.74 GB51.2 GB
Q6_K26.3 GB10.74 GB41.4 GB
Q5_K_M21.9 GB10.74 GB36.5 GB
Q4_K_M17.5 GB10.74 GB31.6 GB
Q3_K_M14.0 GB10.74 GB27.7 GB
Q2_K10.5 GB10.74 GB23.8 GB

Benchmarks

GPUs that run Command-R 35B natively (33)

Plus 15 GPUs that run it with CPU offload (slower)

Notes

Full attention (no GQA) — heavy KV cache at long context.

Hugging Face ↗Ollama ↗Released 2024-08-30