Company:
Date Published:
Author: Ornella Altunyan
Word count: 747
Language: English
Hacker News points: None

Summary

Anthropic's recent release of Claude Sonnet 4.5 sets a new bar for AI performance in coding and reasoning, scoring 77.2% on SWE-bench Verified and sustaining autonomous operation for more than 30 hours. The central idea is "aspirational evals": tests written for capabilities that don't yet exist, used to identify new applications beyond what standard benchmarks measure. These evals define product features that are currently blocked by model limitations, and with each model release, Anthropic re-runs them to see which of those features have become buildable. The jump from Claude Sonnet 4 to 4.5 exemplifies the "capability cliff," where a model improves so sharply that it enables entirely new applications rather than incremental gains. Through Loop, Anthropic tests unsupervised prompt optimization, and Claude Sonnet 4.5 demonstrates significant performance improvements and faster inference times. This strategy of rapid evaluation and feature deployment lets Anthropic capitalize quickly on new model capabilities, an edge over competitors who follow traditional model assessment cycles.
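The "aspirational eval" workflow described above, maintaining a suite of tests for not-yet-possible capabilities and re-running it against each new model release, can be sketched as follows. This is a minimal illustration, not Anthropic's actual harness; the names (`AspirationalEval`, `run_evals`, `stub_model`) and the pass/fail criterion are all hypothetical.

```python
"""Sketch of an "aspirational eval" suite: capability tests written
before any model can pass them. All names here are hypothetical."""

from dataclasses import dataclass
from typing import Callable

# A "model" is abstracted as any prompt -> completion function,
# so the suite can wrap different API clients or model versions.
Model = Callable[[str], str]


@dataclass
class AspirationalEval:
    name: str
    # Returns True once a model release can actually perform the task.
    check: Callable[[Model], bool]


def run_evals(model: Model, suite: list[AspirationalEval]) -> dict[str, bool]:
    """Re-run the same suite against each new model release; an eval
    flipping from False to True signals a feature that just became
    buildable (the "capability cliff" moment)."""
    return {e.name: e.check(model) for e in suite}


# Stub standing in for a real model API call; it fails everything,
# which is the expected state for aspirational evals today.
def stub_model(prompt: str) -> str:
    return "unsupported"


suite = [
    AspirationalEval(
        name="long-horizon-autonomous-refactor",
        check=lambda m: m("refactor this repository end to end") != "unsupported",
    ),
    AspirationalEval(
        name="unsupervised-prompt-optimization",
        check=lambda m: m("improve this prompt without human review") != "unsupported",
    ),
]

results = run_evals(stub_model, suite)
print(results)
```

The point of the design is that the suite is cheap to re-run: when a new model ships, swapping in its client function immediately shows which previously blocked features are now worth building.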