Online RL for Cursor Tab

Company

Cursor

Date Published

Sept. 11, 2025

Author

Jacob

Word count

1357

Language

English

Hacker News points

None

URL

cursor.com/en/blog/tab-rl

Summary

The blog post describes how Cursor developed and deployed a new Tab model, which predicts the user's next action in the code editor to make developers more productive. The new model makes 21% fewer suggestions than its predecessor while achieving a 28% higher accept rate. It is trained with online reinforcement learning on real-time user feedback, in contrast to the static datasets that other large language model providers typically train on. By frequently deploying new models and collecting data on how users interact with them, Cursor optimizes the model's "policy" to make accepted suggestions more likely, using policy gradient methods to reinforce actions that lead to higher reward. Accepted and rejected suggestions receive different rewards, which teaches the model not only what to suggest but also when to suggest anything at all: it should show a suggestion only when the estimated probability of acceptance is at least 25%. The supporting infrastructure rolls out a new model and collects its interaction data in 1.5 to 2 hours, enabling continuous improvement. The result of this iterative process is a model that shows far fewer unnecessary suggestions while making the ones it does show more accurate and useful, improving the coding experience for users.
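The reward design and policy gradient step summarized above can be illustrated with a toy example. This is a minimal sketch, not Cursor's implementation. It assumes rewards of +0.75 for an accepted suggestion, -0.25 for a rejected one, and 0 for showing nothing. With those assumed values, showing a suggestion that is accepted with probability p has expected reward 0.75p - 0.25(1 - p) = p - 0.25, which is positive exactly when p exceeds 25%. The policy is a hypothetical logistic "show or skip" rule over a single made-up context feature, trained with plain REINFORCE (no baseline), and the acceptance probabilities in `CONTEXTS` are simulated stand-ins for real user feedback.

```python
import math
import random

# Assumed reward values (not confirmed by the post): +0.75 for an accepted
# suggestion, -0.25 for a rejected one, 0 when no suggestion is shown.
R_ACCEPT = 0.75
R_REJECT = -0.25


def expected_reward_if_shown(p_accept: float) -> float:
    """Expected reward of showing a suggestion accepted with prob. p_accept.

    With the assumed rewards this simplifies to p_accept - 0.25.
    """
    return p_accept * R_ACCEPT + (1.0 - p_accept) * R_REJECT


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_show_policy(contexts, steps=20_000, lr=0.1, seed=0):
    """Plain REINFORCE (no baseline) on a logistic show/skip policy.

    The policy shows a suggestion with probability sigmoid(w * x + b), where x
    is a single hypothetical context feature. Each entry of `contexts` is a
    pair (x, true_accept_prob); the true probability simulates user feedback.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        x, p_true = rng.choice(contexts)
        pi_show = sigmoid(w * x + b)

        # Sample an action from the *current* policy (on-policy data).
        show = rng.random() < pi_show
        if show:
            accepted = rng.random() < p_true
            reward = R_ACCEPT if accepted else R_REJECT
        else:
            reward = 0.0  # skipping earns nothing, so it adds no gradient

        # Gradient of log pi(action) w.r.t. the logit z = w * x + b:
        #   log sigmoid(z)       -> 1 - sigmoid(z)   (action = show)
        #   log (1 - sigmoid(z)) -> -sigmoid(z)      (action = skip)
        dlogp_dz = (1.0 - pi_show) if show else -pi_show

        # Policy gradient ascent: reinforce actions in proportion to reward.
        w += lr * reward * dlogp_dz * x
        b += lr * reward * dlogp_dz
    return w, b


# Hypothetical contexts: (feature value, simulated acceptance probability).
CONTEXTS = [(-2.0, 0.02), (-1.0, 0.10), (0.0, 0.20), (1.0, 0.60), (2.0, 0.90)]

w, b = train_show_policy(CONTEXTS)
for x, p in CONTEXTS:
    print(
        f"x={x:+.0f}  p_accept={p:.2f}  "
        f"E[reward | show]={expected_reward_if_shown(p):+.2f}  "
        f"P(show)={sigmoid(w * x + b):.2f}"
    )
```

After training, the toy policy should learn to show suggestions in the contexts whose acceptance probability is above 25% and to stay silent in the others. Because the REINFORCE estimate only uses actions sampled from the current policy, the feedback has to come from the model that is actually deployed, which is one reason a fast deploy-and-collect loop matters for this style of training.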