
Safety Evals Should Project Test-Time Compute

Blog post from HuggingFace

Post Details

Company: HuggingFace
Date Published: -
Author: Tommaso Cerruti
Word Count: 2,521
Language: -
Hacker News Points: -
Summary

Safety evaluations of AI models should account for the impact of test-time compute: a model that seems safe under limited evaluation conditions may become unsafe once adversaries apply larger, adaptive, and economically rational amounts of compute. The conventional approach of assessing whether a model can perform dangerous actions under fixed test conditions is inadequate for modern AI systems, where adversaries can invest substantial inference-time effort, such as generating numerous prompt variants, using other models to refine attacks, or allocating compute adaptively. This shift calls for evaluations that cover the broader risk surface, including the model's behavior under varying budgets, attacker strategies, and deployment configurations. The economic rationale for adversaries further complicates the landscape, since a large potential payoff can justify high expenditure on attacks. Static safety checks remain useful but are insufficient for systems capable of longer reasoning, adaptive search, and tool use. Safety evaluations should therefore incorporate test-time compute into the threat model, report risk as a function of adversarial effort, and label safety claims with the conditions under which they hold.
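
To make the scaling argument concrete, here is a minimal sketch (an illustration, not code from the post): if a static eval measures a small per-attempt attack success rate, the chance that an adversary making many independent attempts succeeds at least once grows quickly with their inference-time budget. The per-attempt rate `p` and the budgets shown below are hypothetical.

```python
# Minimal sketch (hypothetical numbers, not from the post): how a small
# per-attempt jailbreak success rate compounds when an adversary can afford
# many independent inference-time attempts (e.g. best-of-n prompt variants).

def success_at_budget(p_single: float, n_attempts: int) -> float:
    """P(at least one of n independent attempts succeeds)."""
    return 1.0 - (1.0 - p_single) ** n_attempts

if __name__ == "__main__":
    p = 0.002  # hypothetical per-attempt success rate from a static eval
    for n in (1, 10, 100, 1_000, 10_000):
        print(f"budget={n:>6} attempts -> attack success ~ {success_at_budget(p, n):.3f}")
```

If anything, the independence assumption understates the effect the post describes, since adaptive search and tool use let attackers reuse information across attempts rather than sampling blindly.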