AdapTive-LeArning Speculator System (ATLAS): A New Paradigm in LLM Inference via Runtime-Learning Accelerators
Blog post from Together AI
Together AI is enhancing the inference performance of large language models with its AdapTive-LeArning Speculator System (ATLAS), part of the Together Turbo inference suite. Unlike traditional static or custom-trained speculators, ATLAS is designed to improve automatically, without manual tuning, by adapting to real-time usage patterns.

The system employs two cooperating speculators: a static speculator trained on a broad corpus, and a lightweight adaptive speculator that updates continuously from live traffic. A confidence-aware controller coordinates the two, choosing how far ahead to speculate: when confidence is high, a longer lookahead amortizes more of the target model's cost per verification step; when confidence is low, a shorter lookahead avoids wasting drafts that would be rejected.

ATLAS has demonstrated significant performance gains, reaching up to 500 tokens per second on DeepSeek-V3.1 and, per Together AI, outperforming specialized hardware by dynamically aligning with evolving workloads. This advancement underscores Together AI's commitment to delivering scalable, efficient AI systems that are continuously optimized for speed and adaptability, reducing latency while maintaining output quality in varied and rapidly changing environments.
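The post does not publish ATLAS's internals, but the described mechanism can be sketched in miniature. The code below is a toy illustration, not Together AI's implementation: `StaticSpeculator`, `AdaptiveSpeculator`, and `choose_lookahead` are hypothetical names, real speculators are small neural models rather than bigram tables, and the confidence-to-lookahead mapping here is an assumed linear rule.

```python
class StaticSpeculator:
    """Stand-in for a speculator trained offline on a broad corpus.

    Toy behavior: always drafts by repeating the last token, with a
    fixed, middling confidence.
    """

    def draft(self, prefix, k):
        return [prefix[-1]] * k, 0.5  # (draft tokens, confidence)


class AdaptiveSpeculator:
    """Stand-in for a lightweight speculator updated from live traffic.

    Toy behavior: memorizes bigrams from accepted output and reports
    confidence as the fraction of draft steps backed by a known bigram.
    """

    def __init__(self):
        self.bigrams = {}

    def update(self, accepted_tokens):
        # "Runtime learning": fold freshly accepted tokens back in.
        for a, b in zip(accepted_tokens, accepted_tokens[1:]):
            self.bigrams[a] = b

    def draft(self, prefix, k):
        out, cur, hits = [], prefix[-1], 0
        for _ in range(k):
            hits += cur in self.bigrams
            cur = self.bigrams.get(cur, cur)
            out.append(cur)
        return out, (hits / k if k else 0.0)


def choose_lookahead(confidence, lo=2, hi=8):
    """Confidence-aware controller (assumed linear rule):
    speculate further when the speculator is more confident."""
    return lo + int(confidence * (hi - lo))


def speculate_step(prefix, static_spec, adaptive_spec):
    """One decoding step: draft with whichever speculator is more
    confident, sized by the controller's chosen lookahead."""
    s_draft, s_conf = static_spec.draft(prefix, 4)  # probe drafts
    a_draft, a_conf = adaptive_spec.draft(prefix, 4)
    spec, conf = (
        (adaptive_spec, a_conf) if a_conf >= s_conf else (static_spec, s_conf)
    )
    k = choose_lookahead(conf)
    draft, _ = spec.draft(prefix, k)
    return draft  # would next be verified in parallel by the target model
```

In a real system, the returned draft is verified in a single batched forward pass of the target model; accepted tokens extend the context and feed `update`, so the adaptive speculator tracks the live workload that the static one cannot.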