
Inside the Together AI kernels team

Blog post from Together AI

Post Details

Company: Together AI
Date Published:
Author:
Word Count: 3,484
Language: English
Hacker News Points: -
Summary

On Memorial Day 2022, Dan Fu, Tri Dao, and their colleagues challenged the AI establishment by publishing FlashAttention, which achieved large GPU speedups by restructuring attention around memory movement and compute patterns rather than raw FLOPs. The breakthrough drew attention from AI leaders such as Andrej Karpathy and highlighted how much performance was being left on the table in GPU optimization. Their work emphasized the crucial role of efficient software, or kernels, in bridging the gap between AI models and hardware capabilities, a concept further advanced by their ThunderKittens library, which dramatically simplifies kernel code for new GPU generations. Together AI's academic-industry partnership model has translated this research into substantial performance gains in production, as seen in their Megakernel project, which significantly reduced latency for a real-time voice agent company. This approach underscores the necessity of optimized AI infrastructure for the AI Native Cloud, where custom solutions tailored to specific workloads can make a decisive impact on performance and scalability.
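The memory-movement idea behind FlashAttention can be sketched in plain NumPy: instead of materializing the full N x N attention matrix, the keys and values are processed in blocks while running softmax statistics are carried along (the "online softmax" trick). This is an illustrative sketch only, not the actual fused CUDA kernel; the function names and block size are assumptions for the example.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full N x N score matrix in memory.
    S = (Q @ K.T) / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # Processes K/V one block at a time, carrying a running row-max (m)
    # and running normalizer (l), so the N x N matrix never exists at once.
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    m = np.full(N, -np.inf)   # running row maximum
    l = np.zeros(N)           # running softmax denominator
    for j in range(0, N, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = (Q @ Kj.T) * scale
        m_new = np.maximum(m, S.max(axis=-1))
        P = np.exp(S - m_new[:, None])
        alpha = np.exp(m - m_new)          # rescale old statistics
        l = alpha * l + P.sum(axis=-1)
        out = alpha[:, None] * out + P @ Vj
        m = m_new
    return out / l[:, None]
```

On a real GPU the payoff comes from the tiles living in fast on-chip SRAM instead of slow HBM; here the blocking only demonstrates that the tiled result matches the naive one exactly.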