Author: Jack Parker-Holder, Amog Kamsetty
Word count: 1314
Language: English
Hacker News points: None

Summary

Reinforcement learning (RL) algorithms are often brittle to hyperparameter choices, which makes automating hyperparameter discovery essential. To address this, researchers have developed Population-Based Bandits (PB2), a new method that tunes the hyperparameters of neural networks during training by using a probabilistic model to guide the search efficiently. PB2 builds on ideas from the GP-bandit optimization literature and combines the benefits of Population-Based Training (PBT) and Bayesian Optimization: it leverages a Gaussian Process model to adapt hyperparameters on the fly, allowing it to find high-performing hyperparameter configurations with far fewer agents than prior work requires. PB2 has been open-sourced in Ray Tune v1.0.1, so users can try it out themselves and explore efficient hyperparameter tuning for reinforcement learning tasks.
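To make the core idea concrete, here is a minimal, self-contained sketch of the PB2-style loop: a population of agents trains in parallel, and at each interval the worst performer copies the best and receives a new hyperparameter chosen by a GP-bandit (GP-UCB) acquisition rather than the random perturbation used in vanilla PBT. This is an illustrative toy with a synthetic reward function, not the Ray Tune implementation; all names (`gp_ucb_suggest`, `reward`) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.2):
    # Squared-exponential kernel between rows of A and rows of B.
    d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d / length_scale**2)

def gp_ucb_suggest(X, y, candidates, beta=2.0, noise=1e-3):
    # Fit a GP posterior to observed (hyperparameter, reward) pairs and
    # return the candidate maximizing the UCB acquisition mu + beta * sigma.
    K_inv = np.linalg.inv(rbf_kernel(X, X) + noise * np.eye(len(X)))
    k_star = rbf_kernel(candidates, X)
    mu = k_star @ K_inv @ y
    var = np.maximum(1.0 - np.sum((k_star @ K_inv) * k_star, axis=1), 1e-12)
    return candidates[np.argmax(mu + beta * np.sqrt(var))]

def reward(h):
    # Toy stand-in for an RL agent's return as a function of a single
    # hyperparameter (e.g. learning rate) scaled to [0, 1]; peaks at 0.7.
    return float(np.exp(-((h - 0.7) ** 2) / 0.02))

rng = np.random.default_rng(0)
population = rng.uniform(0, 1, size=4)   # hyperparameters of 4 agents
history_h, history_r = [], []

for step in range(10):
    rewards = np.array([reward(h) for h in population])
    history_h.extend(population.tolist())
    history_r.extend(rewards.tolist())
    # Exploit/explore step: the worst agent would copy the best agent's
    # weights (implicit here) and gets its next hyperparameter from the
    # GP-bandit -- the key difference from vanilla PBT's random perturbation.
    X = np.array(history_h)[:, None]
    y = np.array(history_r)
    candidates = rng.uniform(0, 1, size=(64, 1))
    population[np.argmin(rewards)] = gp_ucb_suggest(X, y, candidates)[0]

best = max(history_r)
```

In Ray Tune itself, the same loop is driven by the `PB2` scheduler (in `ray.tune.schedulers.pb2`), which handles checkpoint copying and GP-based hyperparameter suggestion for you.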