
Back to The Future: Evaluating AI Agents on Predicting Future Events

Blog post from Together AI

Post Details
Company: Together AI
Date Published: -
Author: Federico Bianchi, Junlin Wang, Zain Hasan, Shang Zhu, Roy Yuan, Clémentine Fourrier, James Zou
Word Count: 1,867
Language: English
Hacker News Points: -
Summary

FutureBench is a benchmarking framework that evaluates AI agents on their ability to predict future events, rather than on recall of past information or static datasets. Because the outcomes are genuinely unknown at prediction time, the approach rewards sophisticated reasoning, synthesis, and real understanding over pattern matching: it draws on real-world prediction markets and live news to generate meaningful prediction tasks across domains such as science, economics, and geopolitics. Focusing on forecasting also sidesteps the data-contamination problems common in traditional benchmarks and yields an objective, verifiable measure of performance, since each question eventually resolves to a concrete outcome. The framework operates on three levels, comparing agentic frameworks, tool performance, and model capabilities, allowing a comprehensive analysis of how models gather and synthesize information to make predictions. Initial results show distinct strategies and reasoning patterns across models, offering insight into their information-gathering behaviors and decision-making processes. The benchmark is intended to be dynamic, evolving with community feedback to refine its question sourcing and experimental methods, though it faces practical challenges such as high cost from heavy input-token usage.
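The core mechanic described in the summary, posing questions whose answers are unknown at prediction time and scoring them once events resolve, can be illustrated with a short sketch. This is a hypothetical illustration of forecast scoring, not FutureBench's actual implementation; the `PredictionTask` class and Brier-score evaluation here are assumptions for demonstration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PredictionTask:
    """A hypothetical forecasting task: the model commits to a probability
    before the event resolves; the outcome is filled in later."""
    question: str
    domain: str
    prediction: float          # model's probability that the event occurs
    outcome: Optional[bool] = None  # None until the real-world event resolves

def brier_score(tasks: list[PredictionTask]) -> float:
    """Mean squared error between predicted probabilities and resolved
    outcomes (lower is better; 0.0 is a perfect forecaster)."""
    resolved = [t for t in tasks if t.outcome is not None]
    if not resolved:
        raise ValueError("no resolved tasks to score")
    return sum((t.prediction - float(t.outcome)) ** 2 for t in resolved) / len(resolved)

tasks = [
    PredictionTask("Will event A occur by June?", "geopolitics", prediction=0.8, outcome=True),
    PredictionTask("Will indicator B rise this quarter?", "economics", prediction=0.3, outcome=False),
]
print(brier_score(tasks))  # (0.2**2 + 0.3**2) / 2 = 0.065
```

Because each question eventually resolves to a verifiable yes/no outcome, this kind of score is contamination-proof by construction: no training corpus can contain the answer before the event happens.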