Getting More from Your Test-Time Compute Budget with Portfolio Beam Search
Blog post from HuggingFace
Portfolio Beam Search (PBS) is a test-time inference method that applies financial portfolio theory to large language model (LLM) decoding: candidate solutions are treated like assets in a portfolio, and the compute budget is spread across a diversified set of them. Where traditional beam search greedily keeps only the highest-scoring paths, PBS selects candidates by their risk-adjusted potential, which helps it escape reasoning ruts and improves both accuracy and robustness.

This reflects a broader shift in AI scaling, away from relying solely on ever-larger pretraining runs and toward test-time compute scaling, where inference-time computation is deliberately increased to tackle harder problems. PBS frames decoding as an optimization problem that balances expected output quality against model uncertainty and semantic diversity among candidates, explicitly trading off exploration and exploitation.

On the MATH-500 benchmark, PBS significantly improves sample efficiency and compute-budget utilization, allowing smaller models to reach accuracy comparable to much larger architectures. These results open new possibilities for scaling test-time compute across domains, and ongoing research is probing the limits of the approach.
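To make the idea concrete, here is a minimal sketch of portfolio-style candidate selection. It is not the post's actual implementation: the function name, the simple mean-variance utility (expected score minus a risk-aversion penalty on uncertainty), and the greedy diversity penalty based on pairwise similarity are all illustrative assumptions standing in for the full method.

```python
import numpy as np

def portfolio_select(mean_scores, uncertainties, similarity, k,
                     risk_aversion=0.5, diversity_weight=0.5):
    """Greedily pick k candidates maximizing a risk-adjusted,
    diversity-penalized utility (a toy mean-variance stand-in).

    mean_scores:   expected quality per candidate (e.g. a verifier score)
    uncertainties: per-candidate variance proxy for that score
    similarity:    pairwise semantic similarity matrix in [0, 1]
    """
    selected = []
    remaining = set(range(len(mean_scores)))
    for _ in range(k):
        best, best_val = None, -np.inf
        for i in remaining:
            # Mean-variance utility: reward expected quality, penalize risk.
            val = mean_scores[i] - risk_aversion * uncertainties[i]
            # Penalize redundancy with candidates already in the portfolio.
            if selected:
                val -= diversity_weight * max(similarity[i, j] for j in selected)
            if val > best_val:
                best, best_val = i, val
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: candidates 0 and 1 are near-duplicates; candidate 2 is
# weaker but dissimilar. Greedy top-2 by mean score would keep [0, 1];
# the portfolio criterion keeps one strong candidate plus a diverse one.
scores = np.array([0.90, 0.85, 0.50])
uncert = np.array([0.40, 0.10, 0.05])
sim = np.array([[1.00, 0.95, 0.10],
                [0.95, 1.00, 0.10],
                [0.10, 0.10, 1.00]])
print(portfolio_select(scores, uncert, sim, k=2))  # [1, 2]
```

The diversity penalty is what distinguishes this from plain top-k selection: a high-scoring but redundant candidate loses utility once a similar path is already in the portfolio, mirroring how a portfolio manager avoids concentrating in correlated assets.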