Author: Jack Parker-Holder, Amog Kamsetty
Word count: 1314
Language: English
Hacker News points: None

Summary

Reinforcement learning (RL) algorithms are often brittle to hyperparameter choices, which makes automating hyperparameter discovery essential. To address this, researchers have developed Population-Based Bandits (PB2), a new method that tunes the hyperparameters of neural networks during training by using a probabilistic model to guide the search efficiently. PB2 builds on ideas from the GP-bandit optimization literature and combines the benefits of Population-Based Training (PBT) and Bayesian Optimization: it leverages a Gaussian Process model to adapt hyperparameters on the fly, allowing it to find high-performing hyperparameter configurations with far fewer agents than prior work requires. PB2 has been open-sourced in Ray Tune v1.0.1, so users can try it out themselves and explore efficient hyperparameter tuning for reinforcement learning tasks.
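To make the core idea concrete, here is a minimal, self-contained sketch of the PB2-style loop: a population of agents trains in parallel, and at each interval the worst performer copies the best and receives a new hyperparameter chosen by a GP-bandit (GP-UCB) acquisition rather than the random perturbation used in vanilla PBT. This is an illustrative toy with a synthetic reward function, not the Ray Tune implementation; all names (`gp_ucb_suggest`, `reward`) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.2):
    # Squared-exponential kernel between rows of A and rows of B.
    d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d / length_scale**2)

def gp_ucb_suggest(X, y, candidates, beta=2.0, noise=1e-3):
    # Fit a GP posterior to observed (hyperparameter, reward) pairs and
    # return the candidate maximizing the UCB acquisition mu + beta * sigma.
    K_inv = np.linalg.inv(rbf_kernel(X, X) + noise * np.eye(len(X)))
    k_star = rbf_kernel(candidates, X)
    mu = k_star @ K_inv @ y
    var = np.maximum(1.0 - np.sum((k_star @ K_inv) * k_star, axis=1), 1e-12)
    return candidates[np.argmax(mu + beta * np.sqrt(var))]

def reward(h):
    # Toy stand-in for an RL agent's return as a function of a single
    # hyperparameter (e.g. learning rate) scaled to [0, 1]; peaks at 0.7.
    return float(np.exp(-((h - 0.7) ** 2) / 0.02))

rng = np.random.default_rng(0)
population = rng.uniform(0, 1, size=4)   # hyperparameters of 4 agents
history_h, history_r = [], []

for step in range(10):
    rewards = np.array([reward(h) for h in population])
    history_h.extend(population.tolist())
    history_r.extend(rewards.tolist())
    # Exploit/explore step: the worst agent would copy the best agent's
    # weights (implicit here) and gets its next hyperparameter from the
    # GP-bandit -- the key difference from vanilla PBT's random perturbation.
    X = np.array(history_h)[:, None]
    y = np.array(history_r)
    candidates = rng.uniform(0, 1, size=(64, 1))
    population[np.argmin(rewards)] = gp_ucb_suggest(X, y, candidates)[0]

best = max(history_r)
```

In Ray Tune itself, the same loop is driven by the `PB2` scheduler (in `ray.tune.schedulers.pb2`), which handles checkpoint copying and GP-based hyperparameter suggestion for you.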