RL Environments and the Hierarchy of Agentic Capabilities

Post Details

Company

Surge AI

Date Published

Nov. 3, 2025

Author

Surge AI Research Team

Word Count

4,073

Language

English

Hacker News Points

-

Source URL

surgehq.ai/blog/rl-envs-real-world

Summary

In 2025, the focus of artificial intelligence development shifted from simple chat interfaces towards agents capable of performing economically valuable tasks in realistic environments. Despite advancements, models like GPT-5 and Claude Sonnet 4.5 still struggled with over 40% of tasks in reinforcement learning (RL) environments, highlighting how hard it remains to build generally intelligent agents. These environments, exemplified by Corecraft, Inc., are designed to mimic real-world work such as customer support, and they require models to exercise capabilities like tool use, goal formation, and adaptability. A hierarchy of agentic capabilities was identified, ranging from basic tool use to common-sense reasoning, with current models showing varied levels of proficiency across it. Newer models like GPT-5.2 and Claude Opus 4.5 showed modest improvements, but they still fell well short of human-level common-sense reasoning, which remains a critical barrier to real-world applicability. The year marked significant progress in the reliability and coherence of AI agents and set the stage for further exploration of whether they can match human intelligence, although the timeline for closing this gap remains uncertain.