Company:
Date Published:
Author: Philip Kiely
Word count: 1636
Language: English
Hacker News points: None

Summary

The NVIDIA A10 and A100 GPUs are two popular choices for model inference tasks, including large language models like Llama 2 and image generation models like Stable Diffusion. The A10 is a cost-effective option capable of running many recent models, while the A100 is an inference powerhouse for large models, with substantially higher FP16 Tensor Core performance. That performance comes at a price: the A100 costs $0.10240 per minute versus $0.02012 per minute for the A10. To balance latency and cost, users can provision multiple GPUs in a single instance, such as 2-8 A10s or 1-8 A100s, which also makes it possible to run larger models like Llama 2-chat 13B. Ultimately, the choice between the A10 and A100 depends on the user's needs and budget, with the A10 offering a cost-effective alternative for many workloads.
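The cost trade-off above can be sketched with a few lines of arithmetic. A minimal sketch follows; the per-minute prices are the ones quoted in the summary, but the runtimes in the example are hypothetical placeholders, not benchmark results:

```python
# Comparing inference cost on A10 vs. A100 instances.
# Per-minute prices are from the article; the job runtimes below
# are HYPOTHETICAL assumptions for illustration only.

PRICE_PER_MINUTE = {
    "A10": 0.02012,
    "A100": 0.10240,
}

def job_cost(gpu: str, minutes: float, num_gpus: int = 1) -> float:
    """Total cost of running a job for `minutes` on `num_gpus` GPUs of one type."""
    return PRICE_PER_MINUTE[gpu] * minutes * num_gpus

# Hypothetical example: suppose the A100 finishes a batch in 10 minutes
# while a single A10 needs 40 minutes for the same batch.
a100_cost = job_cost("A100", minutes=10)
a10_cost = job_cost("A10", minutes=40)

print(f"A100: ${a100_cost:.4f}  A10: ${a10_cost:.4f}")
# Even at 4x the runtime, the A10 comes out cheaper for this job,
# illustrating why the A10 can be the cost-effective choice when
# latency requirements allow it.
```

The same helper can model multi-GPU instances (e.g. `job_cost("A10", minutes=20, num_gpus=2)`) to explore the latency/cost balance the summary describes.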