Supercharging NVIDIA H200 and H100 GPU Cluster Performance With Together Kernel Collection

Post Details

Company

Together AI

Date Published

Sept. 5, 2024

Author

Together AI

Word Count

1,781

Language

English

Hacker News Points

-

Source URL

www.together.ai/blog/nvidia-h200-and-h100-gpu-cluster-performance-together-kernel-collection

Summary

The NVIDIA H200 Tensor Core GPU is built on the Hopper architecture and targets both artificial intelligence (AI) and high-performance computing (HPC) workloads. Compared with the H100, the H200 delivers 40% faster inference on Llama 2 13B and 90% faster inference on Llama 2 70B, a significant improvement for large language models. Its substantial memory capacity and bandwidth reduce bottlenecks in data-intensive applications and support real-time processing of large datasets. Together AI's custom-built Together Kernel Collection (TKC) offers up to 24% speedup for operators used frequently in training and up to 75% speedup for fundamental operations used in FP8 inference.