
Back to The Future: Evaluating AI Agents on Predicting Future Events

Blog post from Together AI

Post Details
Company: Together AI
Date Published: -
Author: Federico Bianchi, Junlin Wang, Zain Hasan, Shang Zhu, Roy Yuan, Clémentine Fourrier, James Zou
Word Count: 1,867
Language: English
Hacker News Points: -
Summary

FutureBench is a benchmarking framework that evaluates AI agents on their ability to predict future events, rather than on recall of past information or static datasets. Because the outcomes are genuinely unknown at prediction time, the approach rewards sophisticated reasoning, synthesis, and real understanding over pattern matching: it draws on real-world prediction markets and live news to generate meaningful prediction tasks across domains such as science, economics, and geopolitics. Focusing on forecasting also sidesteps the data-contamination problems common in traditional benchmarks and yields an objective, verifiable measure of performance, since each question eventually resolves to a concrete outcome. The framework operates on three levels, comparing agentic frameworks, tool performance, and model capabilities, allowing a comprehensive analysis of how models gather and synthesize information to make predictions. Initial results show distinct strategies and reasoning patterns across models, offering insight into their information-gathering behaviors and decision-making processes. The benchmark is intended to be dynamic, evolving with community feedback to refine its question sourcing and experimental methods, though it faces practical challenges such as high cost from heavy input-token usage.
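The core mechanic described in the summary, posing questions whose answers are unknown at prediction time and scoring them once events resolve, can be illustrated with a short sketch. This is a hypothetical illustration of forecast scoring, not FutureBench's actual implementation; the `PredictionTask` class and Brier-score evaluation here are assumptions for demonstration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PredictionTask:
    """A hypothetical forecasting task: the model commits to a probability
    before the event resolves; the outcome is filled in later."""
    question: str
    domain: str
    prediction: float          # model's probability that the event occurs
    outcome: Optional[bool] = None  # None until the real-world event resolves

def brier_score(tasks: list[PredictionTask]) -> float:
    """Mean squared error between predicted probabilities and resolved
    outcomes (lower is better; 0.0 is a perfect forecaster)."""
    resolved = [t for t in tasks if t.outcome is not None]
    if not resolved:
        raise ValueError("no resolved tasks to score")
    return sum((t.prediction - float(t.outcome)) ** 2 for t in resolved) / len(resolved)

tasks = [
    PredictionTask("Will event A occur by June?", "geopolitics", prediction=0.8, outcome=True),
    PredictionTask("Will indicator B rise this quarter?", "economics", prediction=0.3, outcome=False),
]
print(brier_score(tasks))  # (0.2**2 + 0.3**2) / 2 = 0.065
```

Because each question eventually resolves to a verifiable yes/no outcome, this kind of score is contamination-proof by construction: no training corpus can contain the answer before the event happens.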