TruthTensor: LLM Evaluation in Prediction Markets Under Drift and Market Baselines
Blog post from HuggingFace
TruthTensor is a framework for evaluating large language models (LLMs) on their ability to follow instructions in dynamic environments, specifically prediction markets where conditions shift constantly. Unlike traditional evaluations that test models against static inputs, TruthTensor measures whether a model stays faithful to its instructions or drifts as conditions change. Using platforms such as Polymarket, the framework locks a model's instructions on questions across domains like politics and economics, then observes how the model's reasoning strategies adapt as the market moves.

Two design choices distinguish the approach. First, evaluations are triggered by market price changes, which keeps the setup contamination-free, since the outcomes of live markets cannot already appear in a model's training data. Second, each model's performance is compared against a human-finetuned baseline rather than raw market outcomes. The goal is to measure reasoning consistency rather than mere forecasting accuracy, yielding insight into how models adjust their internal beliefs and whether they maintain instruction adherence under drift.
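The trigger-and-compare loop described above can be sketched in a few lines. This is a minimal illustration, not the actual TruthTensor implementation: the function names (`should_trigger`, `evaluate_drift`, `model_estimate`), the trigger threshold, and the simulated price path are all hypothetical, chosen only to show how price-change-triggered re-evaluation and drift measurement fit together.

```python
def should_trigger(prev_price: float, new_price: float, threshold: float = 0.05) -> bool:
    """Re-evaluate the model only when the market price moves by at least `threshold`."""
    return abs(new_price - prev_price) >= threshold

def evaluate_drift(locked_answer: float, re_evaluations: list[float]) -> float:
    """Mean absolute deviation of the model's later answers from its initial
    answer under the locked instructions: a rough proxy for belief drift."""
    return sum(abs(a - locked_answer) for a in re_evaluations) / len(re_evaluations)

# Simulated price path for a single Polymarket-style binary question (hypothetical data).
prices = [0.50, 0.52, 0.58, 0.57, 0.66, 0.64]

def model_estimate(market_price: float) -> float:
    """Stand-in for an LLM query: a model that partially anchors to the
    current market price will drift as the price moves."""
    prior = 0.55  # the model's own (hypothetical) belief
    return 0.5 * prior + 0.5 * market_price

locked = model_estimate(prices[0])  # answer recorded at t=0 under locked instructions

re_evals = []
for prev, new in zip(prices, prices[1:]):
    if should_trigger(prev, new):          # evaluation fires on large price moves only
        re_evals.append(model_estimate(new))

drift = evaluate_drift(locked, re_evals)
```

A drift score near zero would indicate the model holds its reasoning steady despite market fluctuations; a score that tracks the size of price moves would indicate the model is anchoring to the market rather than adhering to its instructions. A real harness would also compare `drift` against the same quantity computed for a human-finetuned baseline.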