Agentic RL: Token-In, Token-Out Done Right

Post Details

Company

Hugging Face

Date Published

May 29, 2026

Author

Quentin Gallouédec and Kashif Rasul

Word Count

3,670

Company Posts That Month

55

Language

-

Hacker News Points

-

Post removed?

No

Source URL

huggingface.co/blog/huggingface/tito

Summary

The article examines the challenges of training large language models (LLMs) with reinforcement learning (RL) and argues for maintaining the Token-In, Token-Out (TITO) invariant. It highlights the pitfall of re-tokenizing model outputs: tokenization is not reversible, so decoding sampled tokens to text and encoding that text again can yield different token IDs, which makes the gradient signal unreliable. The recommended solution is to never re-encode decoded tokens and instead keep the model's sampled tokens in a buffer, which preserves the structure of the sequence and prevents token drift. The article also discusses how to make chat templates prefix-preserving for tool messages, so that appending a tool message leaves the previously rendered tokens unchanged and the training loop stays consistent. It contrasts two approaches, a lighter, more generic TITO loop and a heavier, model-specific renderer, and weighs the advantages of each. The piece concludes that practitioners need to understand and verify the prefix-preservation property of their chat templates, which allows effective training without re-implementing templating logic.
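The two properties named in the summary can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the toy vocabulary, the greedy `encode`, the `TokenBuffer` class, and the `is_prefix_preserving` helper are hypothetical names, not the article's code and not the API of any real library.

```python
# Hypothetical toy vocabulary; real tokenizers are much larger but can
# show the same non-reversibility.
VOCAB = {"a": 0, "b": 1, "ab": 2, " ": 3}
ID_TO_TEXT = {i: t for t, i in VOCAB.items()}


def decode(ids):
    """Concatenate the text of each token ID."""
    return "".join(ID_TO_TEXT[i] for i in ids)


def encode(text):
    """Greedy longest-match encoding over the toy vocabulary."""
    pieces = sorted(VOCAB, key=len, reverse=True)
    ids, pos = [], 0
    while pos < len(text):
        for piece in pieces:
            if text.startswith(piece, pos):
                ids.append(VOCAB[piece])
                pos += len(piece)
                break
        else:
            raise ValueError(f"cannot encode {text[pos:]!r}")
    return ids


class TokenBuffer:
    """Stores token IDs directly, so sampled tokens are never re-encoded."""

    def __init__(self, prompt_ids):
        self.ids = list(prompt_ids)
        self.sampled_mask = [False] * len(self.ids)

    def append_sampled(self, ids):
        # Tokens produced by the model: kept exactly as sampled.
        self.ids.extend(ids)
        self.sampled_mask.extend([True] * len(ids))

    def append_context(self, ids):
        # Tokens from the environment (for example, tool output).
        self.ids.extend(ids)
        self.sampled_mask.extend([False] * len(ids))


def is_prefix_preserving(render, messages, extra):
    """True if rendering with one more message only appends tokens."""
    before = render(messages)
    after = render(messages + [extra])
    return after[: len(before)] == before


def render_per_message(messages):
    # Encodes each message on its own, then adds a separator token.
    ids = []
    for content in messages:
        ids += encode(content) + [VOCAB[" "]]
    return ids


def render_joined(messages):
    # Encodes the concatenated text, so tokens can merge across messages.
    return encode("".join(messages))


sampled = [0, 1]                    # the model sampled "a" then "b"
reencoded = encode(decode(sampled))  # same text "ab", but IDs become [2]

buffer = TokenBuffer(prompt_ids=[3])
buffer.append_sampled(sampled)       # buffer.ids is [3, 0, 1]
```

In this sketch, `sampled` and `reencoded` decode to the same text yet differ as token sequences, which is the drift a token buffer avoids. Likewise, `is_prefix_preserving(render_per_message, ["a"], "b")` holds, while `is_prefix_preserving(render_joined, ["a"], "b")` does not, because the joined rendering merges tokens across the message boundary.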

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	1	9,074	1,640	224	+53%
Reinforcement learning	1	90	44	24	-13%
