
AI evals are becoming the new compute bottleneck

Blog post from HuggingFace

Post Details
Company: HuggingFace
Author: Avijit Ghosh, Yifan Mai, Georgia Channing, and Leshem Choshen
Word Count: 3,881
Summary

AI evaluation is becoming a major computational bottleneck: evaluation costs now often surpass those of model training, sometimes by orders of magnitude on advanced benchmarks and scientific machine learning tasks. The Holistic Agent Leaderboard (HAL) illustrates the expense, with a single benchmark run costing up to $40,000. Compressing evaluations has proven effective for static benchmarks, but agent and training-in-the-loop benchmarks resist such reductions, keeping reliable assessment expensive. The lack of standardized documentation compounds the problem, as results that are not recorded and shared must be re-run, further driving up costs. As a result, the divide is growing between institutions that can afford these evaluations and those that cannot, undermining the ability to independently validate AI systems. Shared documentation and resource pooling could reduce these costs and lower the economic barrier that evaluations now pose.
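The compression point above can be illustrated with a minimal sketch: for a static benchmark, scoring a random subsample of items bounds the full-benchmark accuracy at a fraction of the cost. The benchmark size, accuracy, and sample size here are hypothetical, not figures from the post, and the normal-approximation interval is one simple choice among many subsampling schemes.

```python
import math
import random

def estimate_accuracy(item_scores, sample_size, seed=0):
    """Estimate benchmark accuracy from a random subsample.

    item_scores: list of 0/1 correctness outcomes, one per benchmark item.
    Returns (point estimate, 95% margin of error).
    """
    rng = random.Random(seed)
    sample = rng.sample(item_scores, sample_size)
    p = sum(sample) / sample_size
    # Normal-approximation 95% confidence interval for a proportion.
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return p, margin

# Hypothetical static benchmark: 10,000 items, true accuracy 70%.
full = [1] * 7000 + [0] * 3000
est, moe = estimate_accuracy(full, sample_size=500)
# Scoring 500 items instead of 10,000 cuts evaluation cost ~20x
# while still pinning accuracy to within a few percentage points.
```

This works because each static item is scored independently, so a sample is representative. Agent benchmarks resist the same trick: each "item" is a long, stateful, often stochastic rollout, so far more (and far costlier) runs are needed before the estimate stabilizes.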