
Biggest takeaways from our RL tutorial: Long-term rewards, offline RL, and more

Blog post from Anyscale

Post Details
Company: Anyscale
Date Published: -
Author: Christy Bergman
Word Count: 1,114
Language: English
Hacker News Points: -
Summary

The Production RL Summit, hosted by Anyscale, closed with a hands-on Ray RLlib tutorial on building recommender systems. The tutorial was led by Sven Mika, the lead maintainer of RLlib, and covered reinforcement learning (RL) fundamentals, contextual bandits, and deep RL algorithms. Participants built their own slate recommender system on the RecSim environment using contextual bandit algorithms, including Thompson sampling and Linear UCB. The tutorial also explored offline RL algorithms, including behavioral cloning, CQL, MARWIL, and DQN, and showed how to deploy trained RL models with Ray Serve, a framework for serving machine learning models in production. The event was followed by a presentation at the ODSC-East conference in Boston, and attendees are invited to engage with the Ray RLlib team on the forums, Slack, and GitHub.
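To make the bandit portion concrete: Thompson sampling, one of the algorithms named in the tutorial, keeps a posterior over each arm's reward rate, samples from every posterior, and plays the arm with the highest sample. The sketch below is a minimal stdlib-only illustration of that idea for Bernoulli rewards (e.g. click/no-click on a recommended item); it is not RLlib's implementation, and the class name and simulated click rates are invented for the example.

```python
import random


class BernoulliThompsonSampler:
    """Thompson sampling over K arms (e.g. candidate items to recommend).

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over its
    click-through rate; we sample from every posterior and play the argmax.
    """

    def __init__(self, n_arms):
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms

    def select_arm(self):
        # Draw one sample from each arm's Beta posterior, pick the best.
        samples = [random.betavariate(s + 1, f + 1)
                   for s, f in zip(self.successes, self.failures)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Binary reward updates the chosen arm's posterior counts.
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


# Simulated environment: hypothetical per-item click probabilities,
# with arm 2 the best. Real deployments would use logged user feedback.
random.seed(0)
true_ctr = [0.05, 0.10, 0.30]
sampler = BernoulliThompsonSampler(len(true_ctr))
for _ in range(2000):
    arm = sampler.select_arm()
    sampler.update(arm, random.random() < true_ctr[arm])

pulls = [s + f for s, f in zip(sampler.successes, sampler.failures)]
```

After a few thousand interactions the pull counts concentrate on the highest-reward arm, which is the explore/exploit behavior the tutorial's contextual-bandit exercises build on (contextual variants such as Linear UCB additionally condition the estimate on user features).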
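On the offline RL side, the simplest baseline mentioned above is behavioral cloning: fit a policy directly to logged (state, action) pairs with no reward signal. A minimal tabular sketch, assuming discrete states and actions (the function name and log data are invented for illustration; RLlib's version learns a neural-network policy instead):

```python
from collections import Counter, defaultdict


def behavioral_clone(logged_pairs):
    """Tabular behavioral cloning: for each observed state, the cloned
    policy replays the action the logging policy chose most often."""
    counts = defaultdict(Counter)
    for state, action in logged_pairs:
        counts[state][action] += 1
    # most_common(1) returns [(action, count)] for the modal action.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


# Hypothetical logged (state, action) pairs from a logging policy.
logs = [("home", "show_A"), ("home", "show_A"), ("home", "show_B"),
        ("cart", "show_C")]
policy = behavioral_clone(logs)
```

Cloning simply imitates the logging policy, which is why the tutorial pairs it with algorithms like CQL and MARWIL that can improve on the logged behavior rather than just reproduce it.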