Author
Sarah Welsh
Word count
968
Language
English

Summary

This paper introduces Elastic Reasoning, a framework for making large reasoning models more efficient and scalable by explicitly separating the reasoning process into two distinct phases, thinking and solution, each with its own token budget. The separation addresses a core problem in real-world deployments with strict resource constraints: uncontrolled output lengths.

Large language models have become remarkably capable, especially when using Chain-of-Thought prompting to break complex problems down step by step, and it is this structured reasoning that enables state-of-the-art results on tasks like math and programming. However, the resulting outputs can be excessively long and unpredictable, driving up test-time compute costs. By budgeting the thinking and solution phases separately, Elastic Reasoning allows compute to be allocated independently to each phase, improves performance under constrained budgets, and ensures that outputs are always complete.

Elastic Reasoning implements this split through two key mechanisms: separate budgeting for inference and budget-constrained rollout (sketches of both follow below). At inference time, the model generates its reasoning inside a thinking block and transitions smoothly to the solution phase once it reaches the thinking-token limit, guaranteeing that every response contains both reasoning and a final answer. A reinforcement-learning fine-tuning strategy based on the GRPO algorithm then trains the model to produce useful answers even when its reasoning has been truncated.

Elastic Reasoning achieves impressive results on benchmark tasks: strong accuracy under tight budgets, significant cost savings, more concise reasoning, generalization to token budgets unseen during training, and solid performance on code tasks. The paper also offers deeper insights into real-world applications, including handling hallucinations, evaluation considerations, extending the approach to multi-tool agents, best-fit use cases, where the method falls short, and the path toward lightweight LLMs.
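To make the separate-budgeting mechanism concrete, here is a minimal Python sketch of the two-phase decoding described above. The `generate` helper, the budget parameters, and the `</think>` delimiter handling are illustrative assumptions standing in for a real model API, not the paper's actual implementation.

```python
# Minimal sketch of Elastic Reasoning's separate budgeting at inference
# time. `generate` is a hypothetical stand-in for any LLM decoding call;
# only the two-phase budget split reflects the paper's mechanism.

THINK_END = "</think>"


def generate(prompt: str, max_new_tokens: int, stop: str | None = None) -> str:
    """Hypothetical decoding helper; swap in a real model call."""
    raise NotImplementedError


def elastic_generate(prompt: str, think_budget: int, solution_budget: int) -> str:
    # Phase 1: thinking. Decode at most `think_budget` tokens, stopping
    # early if the model closes its reasoning on its own.
    thinking = generate(prompt, max_new_tokens=think_budget, stop=THINK_END)

    # If the budget ran out mid-thought, force the transition marker so
    # the solution phase always begins, truncated reasoning or not.
    if not thinking.endswith(THINK_END):
        thinking += THINK_END

    # Phase 2: solution. Decode the final answer under its own budget,
    # independent of how much of the thinking budget was consumed.
    solution = generate(prompt + thinking, max_new_tokens=solution_budget)
    return thinking + solution
```

Because the two budgets are independent, total cost is capped at `think_budget + solution_budget` tokens, and the final answer can never be crowded out by overlong reasoning.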
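Similarly, here is a rough sketch of budget-constrained rollout for GRPO-style fine-tuning. The `reward` and `grpo_update` functions are hypothetical placeholders; what comes from the paper is the idea of sampling every training rollout under the same fixed budgets, combined with GRPO's group-relative advantage estimation.

```python
# Rough sketch of budget-constrained rollout for GRPO-style fine-tuning.
# `elastic_generate` is the two-phase decoder sketched above; `reward`
# and `grpo_update` are hypothetical placeholders for an answer-scoring
# function and a policy-gradient update step.

def reward(prompt: str, rollout: str) -> float:
    """Hypothetical: score the final answer (e.g., 1.0 if correct)."""
    raise NotImplementedError


def grpo_update(prompt: str, rollouts: list[str], advantages: list[float]) -> None:
    """Hypothetical: apply a GRPO policy-gradient update."""
    raise NotImplementedError


def grpo_step(prompts, group_size=8, think_budget=1024, solution_budget=1024):
    for prompt in prompts:
        # Every rollout is sampled under the same fixed thinking/solution
        # budgets, so truncated reasoning shows up during training exactly
        # as it will at deployment.
        rollouts = [
            elastic_generate(prompt, think_budget, solution_budget)
            for _ in range(group_size)
        ]
        rewards = [reward(prompt, r) for r in rollouts]

        # GRPO normalizes each reward against its group's mean and std,
        # yielding relative advantages without a learned value function.
        mean_r = sum(rewards) / len(rewards)
        std_r = (sum((r - mean_r) ** 2 for r in rewards) / len(rewards)) ** 0.5
        advantages = [(r - mean_r) / (std_r + 1e-6) for r in rewards]

        grpo_update(prompt, rollouts, advantages)
```

Training under the same truncation the model will face at inference is what teaches it to front-load useful reasoning and still produce a valid answer when its thinking is cut short.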