AI evals are becoming the new compute bottleneck
Blog post from Hugging Face
AI evaluation is becoming a significant computational bottleneck: costs are escalating and now often surpass those of model training. The shift is most visible in advanced benchmarks and scientific machine learning tasks, where evaluation expenses can exceed training costs by orders of magnitude. The Holistic Agent Leaderboard (HAL) illustrates the scale of the problem, reporting costs of up to $40,000 for a single benchmark run.

Compressing evaluations has proven effective for static benchmarks, but agent and training-in-the-loop benchmarks resist such reductions, so reliable assessments remain expensive. The lack of standardized documentation compounds the problem: without a shared record of prior results, the same evaluations are run repeatedly, further driving up costs.

As a result, the divide is growing between institutions that can afford these evaluations and those that cannot, undermining the ability to independently validate AI systems. Sharing documentation and pooling resources could lower these costs and ease the economic barrier that evaluations now pose.
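The cost arithmetic can be made concrete with a back-of-envelope sketch. The function below is hypothetical (its name and parameters are not from the post); only the $40,000 per-run figure comes from the HAL example above. It shows how per-run cost multiplies across models, benchmarks, and the redundant reruns that missing shared documentation causes.

```python
# Back-of-envelope estimate of total evaluation spend.
# All structure here is illustrative; only the $40,000 per-run
# figure is taken from the HAL example in the post.

def eval_cost(models: int, benchmarks: int, cost_per_run: int, reruns: int = 1) -> int:
    """Total cost of evaluating every model on every benchmark.

    `reruns` models the duplicated evaluations that occur when
    results are not documented and shared between institutions.
    """
    return models * benchmarks * cost_per_run * reruns

# Five models on one expensive agent benchmark at $40,000 per run:
baseline = eval_cost(models=5, benchmarks=1, cost_per_run=40_000)
print(baseline)  # 200000

# If poor documentation forces each evaluation to be run twice,
# the total spend doubles:
duplicated = eval_cost(models=5, benchmarks=1, cost_per_run=40_000, reruns=2)
print(duplicated)  # 400000
```

The multiplicative structure is the point: shared documentation attacks the `reruns` factor directly, which is why the post argues it could meaningfully reduce total spend even without making any single run cheaper.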