
Key research and product announcements at the AI Native Conf

Blog post from Together AI

Post Details

Company: Together AI
Word Count: 2,407
Language: English
Summary

Together Research announced several advancements at the AI Native Conf:

- FlashAttention-4: a kernel co-design that significantly speeds up large-scale language models on NVIDIA GPUs, achieving faster processing at lower cost.
- Megakernel: an implementation tailored for real-time voice agents that optimizes the entire model in one kernel, dramatically improving performance metrics.
- together.compile: automates kernel optimization, boosting production efficiency for video models.
- Reinforcement Learning API: gives teams control over RL training configurations and enhances rollout efficiency.
- ThunderAgent: overcomes challenges in agentic workflows by treating them as cohesive scheduling units, resulting in substantial throughput improvements.
- ATLAS-2: a speculative decoding method that continuously updates speculator models in real time, maintaining performance as traffic patterns shift.
- Cache-aware prefill-decode disaggregation (CPD): optimizes long-context inference, achieving higher throughput by managing cache usage effectively.

Together's approach emphasizes the synergistic relationship between research and production, aiming to expand AI infrastructure capabilities for demanding applications.
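The speculative decoding idea behind ATLAS-2 can be illustrated with a minimal draft-and-verify loop. This is a toy sketch, not Together's implementation: `target_next` and `draft_next` are made-up deterministic stand-ins for a large target model and a small speculator, and real systems verify drafts in one batched forward pass rather than token by token.

```python
def target_next(seq):
    # Toy stand-in for the large target model: deterministic next token.
    return (sum(seq) * 31 + 7) % 50

def draft_next(seq):
    # Toy stand-in for the small speculator: agrees with the target
    # most of the time, occasionally diverges.
    return target_next(seq) if sum(seq) % 3 else (sum(seq) + 1) % 50

def greedy_decode(prompt, n_tokens):
    # Baseline: one target-model call per generated token.
    seq = list(prompt)
    for _ in range(n_tokens):
        seq.append(target_next(seq))
    return seq

def speculative_decode(prompt, n_tokens, k=4):
    seq = list(prompt)
    while len(seq) < len(prompt) + n_tokens:
        # 1. Speculator cheaply drafts k tokens.
        draft, ctx = [], list(seq)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Target verifies the drafts (batched in a real system).
        accepted, ctx = [], list(seq)
        for t in draft:
            expect = target_next(ctx)
            if t == expect:
                accepted.append(t)
                ctx.append(t)
            else:
                # First mismatch: take the target's token and stop.
                accepted.append(expect)
                break
        else:
            # All drafts accepted: target supplies one bonus token.
            accepted.append(target_next(ctx))
        seq.extend(accepted)
    return seq[:len(prompt) + n_tokens]
```

Because every accepted token matches what the target model would have produced greedily, the output is identical to plain greedy decoding; the win is that each target pass can confirm several draft tokens at once. Keeping the speculator aligned with live traffic, as ATLAS-2's continuous updates aim to do, raises the acceptance rate and thus the speedup.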