Content Deep Dive

AdapTive-LeArning Speculator System (ATLAS): A New Paradigm in LLM Inference via Runtime-Learning Accelerators

Blog post from Together AI

Post Details
Company: Together AI
Date Published:
Author: Junxiong Wang, Shirley Wu, Zelei Shao, Vikranth Srivatsa, Jue Wang, Roy Yuan, Qingyang Wu, Alpay Ariyak, Rupert Wu, Wai Tong Chung, Chenfeng Xu, Yonatan Oren, Pragaash Ponnusamy, Yineng Zhang, Avner May, Leon Song, Tri Dao, Percy Liang, Ce Zhang, Ben Athi
Word Count: 2,048
Language: English
Hacker News Points: -
Summary

Together AI is enhancing the performance of large language models with its AdapTive-LeArning Speculator System (ATLAS), part of the Together Turbo inference suite. Unlike traditional static or custom-trained speculators, ATLAS improves performance automatically, without manual tuning, by adapting to real-time usage patterns. The system pairs two cooperating speculators: a static one trained on a broad corpus and a lightweight adaptive one that updates on live traffic. A confidence-aware controller chooses between them and tunes the speculation lookahead, balancing speed against acceptance accuracy.

ATLAS has demonstrated significant performance gains, reaching up to 500 tokens per second on DeepSeek-V3.1 and outperforming specialized hardware by staying aligned with evolving workloads. The result underscores Together AI's focus on scalable, efficient inference systems that are continuously optimized for speed and adaptability, reducing latency while preserving output quality in varied and rapidly changing environments.
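The blog post does not publish ATLAS's implementation, but the mechanism it describes can be illustrated with a toy sketch: a lightweight adaptive speculator that learns continuations from accepted traffic, alongside a confidence-aware controller that speculates deeper when the draft is confident. All names here (`AdaptiveSpeculator`, `choose_lookahead`, the bigram model) are hypothetical illustrations, not Together AI's actual design.

```python
# Hypothetical sketch of the ATLAS idea described above; the real system
# uses learned draft models, not this toy bigram counter.
from collections import defaultdict


class AdaptiveSpeculator:
    """Toy adaptive speculator: learns bigram continuations from accepted tokens."""

    def __init__(self):
        # counts[prev][next] = how often `next` followed `prev` in live traffic
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, tokens):
        """Fold a run of accepted tokens back into the speculator (runtime learning)."""
        for a, b in zip(tokens, tokens[1:]):
            self.counts[a][b] += 1

    def propose(self, prev):
        """Return (most likely next token, confidence), or (None, 0.0) if unseen."""
        nxt = self.counts.get(prev)
        if not nxt:
            return None, 0.0
        best = max(nxt, key=nxt.get)
        return best, nxt[best] / sum(nxt.values())


def choose_lookahead(confidence, min_k=1, max_k=8):
    """Confidence-aware controller: draft more tokens when confidence is high,
    fewer when it is low, so wasted verification work stays bounded."""
    return min_k + round(confidence * (max_k - min_k))


# Usage: feed accepted traffic in, then speculate on the next request.
spec = AdaptiveSpeculator()
spec.update(["def", "main", "(", ")", ":"])
token, conf = spec.propose("main")
depth = choose_lookahead(conf)
```

In a real speculative-decoding loop, the drafted tokens would then be verified in a single forward pass of the target model, with accepted prefixes fed back through `update` so the adaptive speculator tracks the current workload.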