
Models got an order of magnitude better at following instructions in one year

Blog post from Arize

Post Details
Company: Arize
Date Published:
Author: Laurie Voss
Word Count: 2,175
Company Posts That Month: 16
Language: English
Hacker News Points: -
Summary

Over the past year, AI models have become markedly better at following complex instructions, as shown by the updated IFScale benchmark. The benchmark, originally described by Jaroslawicz et al. (2025), measures how well a model adheres to many simultaneous constraints, such as including specific keywords in a business report. Older models struggled to maintain accuracy beyond 200-300 simultaneous instructions, while current frontier models such as GPT 5.5 and Gemini 3.1 Pro can handle up to 5,000 instructions with high accuracy. For AI engineering, this allows more detailed prompts and reduces the need for compressed skill files, although it introduces new considerations around cost and processing time. Models also fail in different ways: some politely refuse complex tasks, while others overthink or misinterpret constraints. Despite these challenges, the ability to manage extensive instructions opens new possibilities for building sophisticated AI applications.
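
To make the keyword-constraint idea concrete, here is a minimal sketch of how adherence to "include these keywords" instructions could be scored. It is not the IFScale implementation: the function name `keyword_adherence`, the whole-word matching rule, and the sample report are all assumptions made for illustration.

```python
import re


def keyword_adherence(report: str, keywords: list[str]) -> float:
    """Return the fraction of required keywords found as whole words in the report.

    Hypothetical scoring rule for illustration only; the real benchmark
    may match and weight constraints differently.
    """
    if not keywords:
        # No constraints means nothing could have been violated.
        return 1.0
    # Lowercase the text and split it into word tokens for case-insensitive matching.
    words = set(re.findall(r"[a-z0-9']+", report.lower()))
    hits = sum(1 for kw in keywords if kw.lower() in words)
    return hits / len(keywords)


if __name__ == "__main__":
    report = "Quarterly revenue grew while logistics costs fell."
    required = ["revenue", "logistics", "merger", "costs"]
    # Three of the four required keywords appear in the report.
    print(f"{keyword_adherence(report, required):.2f}")  # prints 0.75
```

Scaling the `required` list from a few hundred to several thousand entries is the kind of stress the summary describes, with the score tracking how many constraints the model's output still satisfies.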

Trends Found in this Post
Trend  Post Mentions  Total Month Mentions  Posts  Companies  MoM
LLM    1              9,074                 1,640  224        +53%