Company:
Date Published:
Author: Philip Kiely
Word count: 1056
Language: English
Hacker News points: None

Summary

The NVIDIA A10 and A10G are interchangeable for most model inference tasks because they share the same GPU memory (24 GB) and memory bandwidth (600 GB/s), despite differing in other specs, most notably tensor core compute. The A10 offers substantially more tensor core compute, while the A10G has slightly higher CUDA core performance. For most model inference workloads, including seven-billion-parameter LLMs as well as models like Whisper and Stable Diffusion XL, the two GPUs perform similarly because inference is memory bound rather than compute bound. This is confirmed by comparing each GPU's ops:byte ratio against the arithmetic intensity of popular models. The key factor in choosing a GPU for model inference is having enough VRAM to run the model, since memory bandwidth has a far greater impact on inference speed than tensor core compute.
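As a rough illustration of that ops:byte comparison, here is a minimal sketch in Python. The spec values (FP16 tensor core throughput with sparsity, memory bandwidth) come from NVIDIA's published datasheets; the ~62 ops/byte arithmetic intensity for a seven-billion-parameter LLM is an assumed example figure for the comparison, not a measured benchmark.

```python
# Sketch: compare each GPU's ops:byte ratio to a model's arithmetic
# intensity to decide whether inference is memory bound or compute bound.
# If the model's arithmetic intensity is below the GPU's ops:byte ratio,
# the GPU is starved for data, and memory bandwidth sets the speed limit.

GPUS = {
    # name: (FP16 tensor core TFLOPS with sparsity, memory bandwidth in GB/s)
    "A10": (125.0, 600.0),
    "A10G": (70.0, 600.0),
}

# Assumed example value for a 7B-parameter LLM (ops per byte moved).
MODEL_ARITHMETIC_INTENSITY = 62.0

for name, (tflops, bandwidth_gbs) in GPUS.items():
    # ops:byte = peak compute (ops/s) / memory bandwidth (bytes/s)
    ops_per_byte = (tflops * 1e12) / (bandwidth_gbs * 1e9)
    bound = (
        "memory bound"
        if MODEL_ARITHMETIC_INTENSITY < ops_per_byte
        else "compute bound"
    )
    print(f"{name}: {ops_per_byte:.1f} ops:byte -> {bound}")
```

Running this gives roughly 208 ops:byte for the A10 and 117 for the A10G. Since both ratios exceed the model's arithmetic intensity, inference is memory bound on both GPUs, and with identical 600 GB/s bandwidth they deliver similar inference speeds despite the gap in tensor core compute.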