Key research and product announcements at the AI Native Conf
Blog post from Together AI
Together Research announced several advancements at the AI Native Conf:

- FlashAttention-4: a kernel co-design that significantly speeds up large language models on NVIDIA GPUs, delivering faster processing at lower cost.
- Megakernel: an implementation tailored for real-time voice agents that optimizes the entire model in a single kernel, dramatically improving performance.
- together.compile: automates kernel optimization, boosting production efficiency for video models.
- Reinforcement Learning API: gives teams control over RL training configurations while improving rollout efficiency.
- ThunderAgent: treats agentic workflows as cohesive scheduling units, yielding substantial throughput improvements.
- ATLAS-2: a speculative decoding method that continuously updates speculator models in real time, maintaining performance as traffic patterns shift.
- Cache-aware prefill-decode disaggregation (CPD): optimizes long-context inference, achieving higher throughput through effective cache management.

Together's approach emphasizes the synergistic relationship between research and production, aiming to expand AI infrastructure capabilities for demanding applications.
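To make the speculative decoding idea behind ATLAS-2 concrete, here is a minimal toy sketch of the general draft-then-verify loop: a cheap speculator proposes a block of tokens and the target model verifies them, accepting the longest correct prefix. The models here are hypothetical stand-ins (simple arithmetic functions), not Together's actual ATLAS-2 implementation, and the sketch omits the real-time speculator updating that ATLAS-2 adds.

```python
# Toy sketch of speculative decoding. Both "models" are illustrative
# placeholders, not real language models.

def target_next_token(context):
    # Deterministic toy target model: next token is sum of context mod 7.
    return sum(context) % 7

def speculator_next_token(context):
    # Toy speculator that agrees with the target most of the time.
    return sum(context) % 7 if len(context) % 3 else (sum(context) + 1) % 7

def speculative_decode(context, num_tokens, draft_len=4):
    out = list(context)
    while len(out) - len(context) < num_tokens:
        # 1. The speculator cheaply drafts a block of tokens.
        draft, ctx = [], list(out)
        for _ in range(draft_len):
            t = speculator_next_token(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. The target verifies the draft, accepting the longest
        #    matching prefix; on a mismatch it emits the corrected
        #    token, so each round always makes progress.
        ctx = list(out)
        for t in draft:
            expected = target_next_token(ctx)
            if t == expected:
                out.append(t)
                ctx.append(t)
            else:
                out.append(expected)
                break
    return out[len(context) : len(context) + num_tokens]

print(speculative_decode([1, 2, 3], 8))
```

Because every draft token is checked against the target model, the output is identical to decoding with the target alone; the speedup in a real system comes from verifying the whole draft block in one parallel forward pass.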
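One ingredient of cache-aware scheduling can be sketched in a few lines: route each incoming request to the worker whose KV cache already holds the longest matching token prefix, so less prefill work is redone. This is a hypothetical illustration of the general idea, not Together's CPD implementation; the worker caches and routing policy here are invented for the example.

```python
# Toy sketch of cache-aware request routing: pick the prefill worker
# whose cached token prefix overlaps the request the most.

def common_prefix_len(a, b):
    # Length of the shared leading run of tokens between two sequences.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(request_tokens, worker_caches):
    # Return (worker index, overlap length); ties go to the
    # lower-indexed worker.
    best, best_overlap = 0, -1
    for i, cache in enumerate(worker_caches):
        overlap = common_prefix_len(request_tokens, cache)
        if overlap > best_overlap:
            best, best_overlap = i, overlap
    return best, best_overlap

caches = [[1, 2, 3, 4], [1, 2, 9], []]
print(route([1, 2, 3, 7], caches))  # worker 0 shares the prefix [1, 2, 3]
```

A real disaggregated system layers much more on top (separate prefill and decode pools, cache transfer, eviction), but the routing decision above captures why cache awareness raises long-context throughput: tokens already cached never need to be prefetched or recomputed.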