When sleeping in saves you money: dynamic data snoozing for efficient online RL
Blog post from AI21 Labs
Dynamic difficulty snoozing is introduced as a method for improving compute efficiency in reinforcement learning (RL) training. It addresses an inefficiency of dynamic sampling, which slows training by spending rollouts on overly easy or overly hard examples that contribute little or no learning signal. By temporarily filtering out examples that are currently too easy, snoozing preserves training stability and efficiency without compromising the quality of the results.

Whereas permanently filtering such examples can skew the data mix and risks task starvation in multi-task settings, dynamic snoozing strikes a balance: snoozed examples are periodically reintroduced, keeping the training distribution well rounded. Together with related techniques such as probabilistic snoozing and dynamic example weighting, it shows substantial potential for improving data efficiency, reducing wasted compute, and maintaining training balance, particularly in complex multi-task learning scenarios.
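To make the mechanics concrete, here is a minimal sketch of a snoozing scheduler. It assumes a GRPO-style setup in which an example whose rollouts are all correct (or all incorrect) yields zero advantage and therefore no gradient; the class name `DynamicSnoozer`, the thresholds, and the `snooze_steps` / `wake_prob` parameters are illustrative choices for this sketch, not AI21's actual implementation.

```python
import random


class DynamicSnoozer:
    """Temporarily removes ("snoozes") examples that currently provide no
    learning signal and reintroduces them after a fixed number of steps.

    Illustrative sketch only: thresholds, snooze duration, and the
    probabilistic early wake-up are assumptions, not AI21's exact method.
    """

    def __init__(self, snooze_steps=100, easy_threshold=1.0,
                 hard_threshold=0.0, wake_prob=0.05):
        self.snooze_steps = snooze_steps      # how long an example sleeps
        self.easy_threshold = easy_threshold  # pass rate at/above which it is "too easy"
        self.hard_threshold = hard_threshold  # pass rate at/below which it is "too hard"
        self.wake_prob = wake_prob            # chance of waking a snoozed example early
        self.wake_at = {}                     # example_id -> step at which it wakes

    def update(self, example_id, pass_rate, step):
        """After rolling out an example, snooze it if its pass rate implies
        zero advantage (all rollouts correct, or all incorrect)."""
        if pass_rate >= self.easy_threshold or pass_rate <= self.hard_threshold:
            self.wake_at[example_id] = step + self.snooze_steps

    def is_active(self, example_id, step):
        """True if the example should be sampled at this step."""
        wake = self.wake_at.get(example_id)
        if wake is None:
            return True
        if step >= wake or random.random() < self.wake_prob:
            # Wake the example; update() re-snoozes it on the next rollout
            # if it is still too easy or too hard.
            del self.wake_at[example_id]
            return True
        return False


# Toy usage: integer ids stand in for examples, and a random choice over
# {0, 0.25, 0.5, 0.75, 1.0} stands in for the pass rate over four rollouts.
snoozer = DynamicSnoozer(snooze_steps=50)
examples = list(range(1000))
for step in range(200):
    batch = [ex for ex in random.sample(examples, 32)
             if snoozer.is_active(ex, step)]
    for ex in batch:
        pass_rate = random.choice([0.0, 0.25, 0.5, 0.75, 1.0])
        snoozer.update(ex, pass_rate, step)
```

The probabilistic wake-up in `is_active` reflects the balance the post describes: a snoozed example occasionally gets re-checked before its timer expires, so an example the model has regressed on (or that a multi-task mixture would otherwise starve) is not locked out for the full snooze window.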