
RL Environments and the Hierarchy of Agentic Capabilities

Blog post from Surge AI

Post Details
Company
Date Published
Author: Surge AI Research Team
Word Count: 4,073
Language: English
Hacker News Points: -
Summary

In 2025, the focus of artificial intelligence development shifted toward agents capable of performing economically valuable tasks in realistic environments, moving beyond simple chat interfaces. Despite these advances, models like GPT-5 and Claude Sonnet 4.5 still struggled with over 40% of tasks in reinforcement learning (RL) environments, highlighting how hard it is to build generally intelligent agents. These environments, exemplified by Corecraft, Inc., are designed to mimic real-world work such as customer support, and they require models to combine capabilities like tool use, goal formation, and adaptability. The post identifies a hierarchy of agentic capabilities, ranging from basic tool use to common-sense reasoning, with current models showing varied levels of proficiency. Newer models like GPT-5.2 and Claude Opus 4.5 showed modest improvements but still face significant obstacles in achieving human-level common-sense reasoning, which remains a critical barrier to real-world applicability. The year marked significant progress in the reliability and coherence of AI agents, setting the stage for further exploration of whether they can match human intelligence, although the timeline for closing this gap remains uncertain.