
Inside the Together AI kernels team

Blog post from Together AI

Post Details

Company: Together AI
Date Published:
Author:
Word Count: 3,484
Language: English
Hacker News Points: -
Summary

On Memorial Day 2022, Dan Fu, Tri Dao, and their colleagues challenged the AI establishment by publishing FlashAttention, which achieved large GPU speedups by restructuring attention around memory movement and compute patterns rather than raw FLOPs. The breakthrough drew attention from AI leaders such as Andrej Karpathy and highlighted how much performance was being left on the table in GPU optimization. Their work emphasized the crucial role of efficient software, or kernels, in bridging the gap between AI models and hardware capabilities, a concept further advanced by their ThunderKittens library, which dramatically simplifies kernel code for new GPU generations. Together AI's academic-industry partnership model has translated this research into substantial performance gains in production, as seen in their Megakernel project, which significantly reduced latency for a real-time voice agent company. This approach underscores the necessity of optimized AI infrastructure for the AI Native Cloud, where custom solutions tailored to specific workloads can make a decisive impact on performance and scalability.
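The memory-movement idea behind FlashAttention can be sketched in plain NumPy: instead of materializing the full N x N attention matrix, the keys and values are processed in blocks while running softmax statistics are carried along (the "online softmax" trick). This is an illustrative sketch only, not the actual fused CUDA kernel; the function names and block size are assumptions for the example.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full N x N score matrix in memory.
    S = (Q @ K.T) / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # Processes K/V one block at a time, carrying a running row-max (m)
    # and running normalizer (l), so the N x N matrix never exists at once.
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    m = np.full(N, -np.inf)   # running row maximum
    l = np.zeros(N)           # running softmax denominator
    for j in range(0, N, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = (Q @ Kj.T) * scale
        m_new = np.maximum(m, S.max(axis=-1))
        P = np.exp(S - m_new[:, None])
        alpha = np.exp(m - m_new)          # rescale old statistics
        l = alpha * l + P.sum(axis=-1)
        out = alpha[:, None] * out + P @ Vj
        m = m_new
    return out / l[:, None]
```

On a real GPU the payoff comes from the tiles living in fast on-chip SRAM instead of slow HBM; here the blocking only demonstrates that the tiled result matches the naive one exactly.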