Supercharging NVIDIA H200 and H100 GPU Cluster Performance With Together Kernel Collection

Blog post from Together AI

Post Details
Company: Together AI
Date Published:
Author: Together AI
Word Count: 1,781
Language: English
Hacker News Points: -
Summary

The NVIDIA H200 Tensor Core GPU is built on the Hopper architecture and targets both AI and high-performance computing (HPC) workloads. Compared with the H100, it delivers 40% faster inference on Llama 2 13B and 90% faster inference on Llama 2 70B, a substantial gain for large language models. Its 141 GB of HBM3e memory and 4.8 TB/s of memory bandwidth reduce bottlenecks in data-intensive applications and support real-time processing of large datasets. Together AI's custom-built Together Kernel Collection (TKC) provides up to 24% speedup for operators used frequently in training and up to 75% speedup for fundamental operations used in FP8 inference.
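
To put the percentages above in terms of runtime, here is a minimal sketch. It assumes that "N% faster" or "N% speedup" means throughput is multiplied by (1 + N/100); the summary does not state how the figures were measured, so this reading is an assumption, and the helper name `time_fraction` is hypothetical.

```python
def time_fraction(speedup_pct: float) -> float:
    """Fraction of baseline runtime that remains after a speedup.

    Assumption: an 'N% speedup' multiplies throughput by (1 + N/100),
    so the same work takes 1 / (1 + N/100) of the original time.
    """
    return 1.0 / (1.0 + speedup_pct / 100.0)


# Figures quoted in the summary; the labels are shorthand, not official names.
claims = {
    "H200 inference, Llama 2 13B": 40,
    "H200 inference, Llama 2 70B": 90,
    "TKC training operators (up to)": 24,
    "TKC FP8 inference operations (up to)": 75,
}

for label, pct in claims.items():
    frac = time_fraction(pct)
    saved = (1.0 - frac) * 100.0
    print(f"{label}: {pct}% faster -> {frac:.3f}x baseline time ({saved:.1f}% less time)")
```

Under this reading, a 75% speedup does not cut runtime by 75%: the same work takes about 57% of the original time. The "up to" figures are best cases for individual operators, so end-to-end gains may be smaller.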