How the community trained Gemma to "Think" with Tunix and TPUs

Blog post from Google Cloud

Post Details
Company: Google Cloud
Author: Wei Wei, Weiren Yu, Tianshu Bao, Lance Wang, and Chris Achard
Word Count: 1,141
Language: English
Summary

Large Language Models (LLMs) can benefit from structured reasoning training, as demonstrated by the "Google Tunix Hack: Train a model to show its work" hackathon on Kaggle. The event challenged developers to turn non-reasoning models into general reasoning ones using limited computational resources, and it drew over 11,000 participants and 300 high-quality submissions. The winning techniques combined supervised learning, preference optimization, and reinforcement learning. One example is G-RaR (Rubric-Based Reinforcement Learning), which trains models to produce structured reasoning by using a rubric-based reward system. Other notable approaches included Pinocchio-1B and IDEA-E, which trained structured reasoning through stages of fine-tuning and reinforcement learning and used various reward systems to strengthen logical deduction and discourage premature guessing. These efforts showed that structured reasoning can significantly improve LLM performance across domains including medicine, law, and robotics, even with limited computational resources. Because the recipes are public and tools like Tunix and Kaggle TPUs are available to participants, training such models is within reach of a much wider group of developers.
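
To make the idea of a rubric-based reward concrete, here is a minimal sketch in plain Python. It is not the G-RaR implementation and does not use the Tunix API. The `<reasoning>`/`<answer>` tag format, the `RubricItem` type, the three rubric checks, and their weights are all assumptions chosen for illustration. The sketch scores a completion as the weighted fraction of rubric items it passes, and gives zero reward when the reasoning section is missing, which is one simple way to discourage premature guessing.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Assumed output format (hypothetical): reasoning first, then the final answer.
_PATTERN = re.compile(
    r"<reasoning>(.*?)</reasoning>\s*<answer>(.*?)</answer>", re.DOTALL
)


@dataclass(frozen=True)
class RubricItem:
    """One rubric criterion: a name, a weight, and a pass/fail check."""
    name: str
    weight: float
    # check(reasoning, answer, reference) -> True if the criterion is met
    check: Callable[[str, str, str], bool]


def parse(completion: str) -> Optional[Tuple[str, str]]:
    """Return (reasoning, answer), or None if the format is not followed."""
    match = _PATTERN.search(completion)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


# Illustrative rubric; real rubrics would be task-specific and far richer.
RUBRIC = (
    RubricItem(
        "shows_multiple_steps", 0.3,
        lambda r, a, ref: len([ln for ln in r.splitlines() if ln.strip()]) >= 2,
    ),
    RubricItem("answer_is_concise", 0.2, lambda r, a, ref: 0 < len(a) <= 50),
    RubricItem("answer_matches_reference", 0.5, lambda r, a, ref: a == ref),
)


def rubric_reward(
    completion: str, reference: str, rubric: Sequence[RubricItem] = RUBRIC
) -> float:
    """Weighted fraction of rubric items passed, in the range [0, 1]."""
    parsed = parse(completion)
    if parsed is None:
        # No reasoning section: treat as a premature guess and give no reward.
        return 0.0
    reasoning, answer = parsed
    total = sum(item.weight for item in rubric)
    earned = sum(
        item.weight for item in rubric if item.check(reasoning, answer, reference)
    )
    return earned / total
```

In a reinforcement learning loop, a scalar like this would be computed for each sampled completion and passed to the policy update as its reward. A bare `<answer>` with no reasoning scores zero even when the answer is correct, while a well-formed but wrong completion still earns partial credit for its structure.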