Reinforcement learning (RL) algorithms are often brittle to hyperparameter choices, making automated hyperparameter discovery essential. To address this, researchers have developed Population-Based Bandits (PB2), a method that uses a probabilistic model to guide the hyperparameter search efficiently. PB2 draws on ideas from the GP-bandit optimization literature and combines the strengths of Population-Based Training (PBT) and Bayesian Optimization: it fits a Gaussian Process model to decide how to adapt the agents' hyperparameters online during training, allowing it to find high-performing configurations with far fewer agents than prior work requires. PB2 has been open-sourced in Ray Tune as of v1.0.1, so users can try it out themselves for efficient hyperparameter tuning in RL tasks.
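As a minimal sketch of what trying PB2 through Ray Tune's scheduler API might look like: the environment, metric, bounds, and population size below are illustrative assumptions, not values from the PB2 paper or the Ray release notes.

```python
# Minimal sketch: tuning two PPO hyperparameters with Ray Tune's PB2 scheduler.
# Assumes `pip install "ray[tune,rllib]" GPy scikit-learn`; all specific
# values here are illustrative.
from ray import tune
from ray.tune.schedulers.pb2 import PB2

# PB2 fits a Gaussian Process over recent (hyperparameter, reward-change)
# observations and uses a bandit acquisition rule to pick new hyperparameter
# values whenever an underperforming agent copies a stronger agent's weights.
pb2 = PB2(
    time_attr="timesteps_total",
    metric="episode_reward_mean",
    mode="max",
    perturbation_interval=50_000,  # re-select hyperparameters every 50k steps
    hyperparam_bounds={            # continuous search ranges (assumed)
        "lambda": [0.9, 1.0],
        "lr": [1e-5, 1e-3],
    },
)

analysis = tune.run(
    "PPO",                         # RLlib's PPO trainable
    scheduler=pb2,
    num_samples=4,                 # PB2 targets small populations
    stop={"timesteps_total": 1_000_000},
    config={
        "env": "BipedalWalker-v3",
        "lambda": 0.95,            # initial values; PB2 adapts them online
        "lr": 1e-4,
    },
)

print("Best config found:",
      analysis.get_best_config(metric="episode_reward_mean", mode="max"))
```

Note the small `num_samples`: because the GP model shares information across the population, PB2 is meant to work with far fewer parallel agents than PBT's random perturbations typically need.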