
Biggest takeaways from our RL tutorial: Long-term rewards, offline RL, and more

Blog post from Anyscale

Post Details
Company: Anyscale
Date Published: -
Author: Christy Bergman
Word Count: 1,114
Language: English
Hacker News Points: -
Summary

The Production RL Summit, hosted by Anyscale, closed with a hands-on Ray RLlib tutorial on building recommender systems. The tutorial was led by Sven Mika, the lead maintainer of RLlib, and covered reinforcement learning (RL) fundamentals, contextual bandits, and deep RL algorithms. Participants built their own slate recommender system on the RecSim environment using contextual bandit algorithms, including Thompson sampling and Linear UCB. The tutorial also explored offline RL algorithms, including behavioral cloning, CQL, MARWIL, and DQN, and showed how to deploy trained RL models with Ray Serve, a framework for serving machine learning models in production. The event was followed by a presentation at the ODSC-East conference in Boston, and attendees are invited to engage with the Ray RLlib team on the forums, Slack, and GitHub.
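To make the bandit portion concrete: Thompson sampling, one of the algorithms named in the tutorial, keeps a posterior over each arm's reward rate, samples from every posterior, and plays the arm with the highest sample. The sketch below is a minimal stdlib-only illustration of that idea for Bernoulli rewards (e.g. click/no-click on a recommended item); it is not RLlib's implementation, and the class name and simulated click rates are invented for the example.

```python
import random


class BernoulliThompsonSampler:
    """Thompson sampling over K arms (e.g. candidate items to recommend).

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over its
    click-through rate; we sample from every posterior and play the argmax.
    """

    def __init__(self, n_arms):
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms

    def select_arm(self):
        # Draw one sample from each arm's Beta posterior, pick the best.
        samples = [random.betavariate(s + 1, f + 1)
                   for s, f in zip(self.successes, self.failures)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Binary reward updates the chosen arm's posterior counts.
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


# Simulated environment: hypothetical per-item click probabilities,
# with arm 2 the best. Real deployments would use logged user feedback.
random.seed(0)
true_ctr = [0.05, 0.10, 0.30]
sampler = BernoulliThompsonSampler(len(true_ctr))
for _ in range(2000):
    arm = sampler.select_arm()
    sampler.update(arm, random.random() < true_ctr[arm])

pulls = [s + f for s, f in zip(sampler.successes, sampler.failures)]
```

After a few thousand interactions the pull counts concentrate on the highest-reward arm, which is the explore/exploit behavior the tutorial's contextual-bandit exercises build on (contextual variants such as Linear UCB additionally condition the estimate on user features).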
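On the offline RL side, the simplest baseline mentioned above is behavioral cloning: fit a policy directly to logged (state, action) pairs with no reward signal. A minimal tabular sketch, assuming discrete states and actions (the function name and log data are invented for illustration; RLlib's version learns a neural-network policy instead):

```python
from collections import Counter, defaultdict


def behavioral_clone(logged_pairs):
    """Tabular behavioral cloning: for each observed state, the cloned
    policy replays the action the logging policy chose most often."""
    counts = defaultdict(Counter)
    for state, action in logged_pairs:
        counts[state][action] += 1
    # most_common(1) returns [(action, count)] for the modal action.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


# Hypothetical logged (state, action) pairs from a logging policy.
logs = [("home", "show_A"), ("home", "show_A"), ("home", "show_B"),
        ("cart", "show_C")]
policy = behavioral_clone(logs)
```

Cloning simply imitates the logging policy, which is why the tutorial pairs it with algorithms like CQL and MARWIL that can improve on the logged behavior rather than just reproduce it.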