
OlmoLogic: Boosting Reasoning via RLVR with Inductive Logic Programming

Blog post from HuggingFace

Post Details
Company: HuggingFace
Date Published: -
Author: Lukas Helff, Sebastian Sztwiertnia, Felix Friedrich, Hikaru Shindo, and Ahmad Omar
Word Count: 2,702
Company Posts That Month: 90
Language: -
Hacker News Points: -
-
Summary

OlmoLogic is a model that targets logical reasoning, an area often neglected in favor of math and code, by integrating Inductive Logic Programming (ILP) into the Olmo-3 Reinforcement Learning with Verifiable Rewards (RLVR) framework. It was trained on 56 H100 GPUs over six days using tasks from the Scalable Logical Reasoning (SLR) suite, a set of 19,000 tasks of varying complexity. During training, a Prolog interpreter executes the logic programs the model proposes, and the outcome is fed back directly as the RLVR reward; the reward structure emphasizes both rule correctness and rule simplicity. OlmoLogic triples accuracy on SLR-Bench and shows gains across other logic benchmarks while maintaining performance on math, code, and instruction-following tasks. The work also includes Olmo 3.1 7B Think, a variant trained without SLR tasks, which serves as a comparison point for isolating the impact of SLR on logical reasoning. Because the approach leaves the underlying training infrastructure unchanged, it offers a way to add logical reasoning to an existing RLVR pipeline.
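
To make the reward idea concrete, here is a minimal sketch of a verifier-style reward that scores a proposed rule on correctness and simplicity. Everything in it is an assumption for illustration: the summary does not give the actual reward formula, so the `toy_reward` function, its weights, the `Hypothesis` class, and the toy examples are hypothetical, and a plain Python predicate stands in for the Prolog interpreter that the post describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Example = Dict[str, str]


@dataclass
class Hypothesis:
    """A candidate rule (hypothetical stand-in for a Prolog clause)."""
    covers: Callable[[Example], bool]  # True if the rule fires on an example
    num_literals: int                  # body size, used as a simplicity proxy


def toy_reward(
    hyp: Hypothesis,
    positives: List[Example],
    negatives: List[Example],
    max_literals: int = 4,
    simplicity_weight: float = 0.1,
) -> float:
    """Score a rule in [0, 1]: mostly accuracy, plus a small simplicity bonus.

    Assumed scheme, not the one from the post: the simplicity bonus is
    only granted when the rule classifies every example correctly.
    """
    total = len(positives) + len(negatives)
    if total == 0:
        return 0.0

    # A rule is correct on a positive if it covers it,
    # and correct on a negative if it does not cover it.
    correct = sum(hyp.covers(e) for e in positives)
    correct += sum(not hyp.covers(e) for e in negatives)
    accuracy = correct / total

    base = accuracy * (1.0 - simplicity_weight)
    if accuracy < 1.0:
        return base

    # Shrink the bonus linearly as the rule grows beyond one literal.
    excess = max(0, hyp.num_literals - 1)
    bonus = simplicity_weight * max(0.0, 1.0 - excess / max_literals)
    return base + bonus


# Toy task: the hidden concept is "color is red".
positives = [{"color": "red", "length": "short"},
             {"color": "red", "length": "long"}]
negatives = [{"color": "blue", "length": "short"},
             {"color": "green", "length": "long"}]

simple = Hypothesis(lambda e: e["color"] == "red", num_literals=1)
verbose = Hypothesis(
    lambda e: e["color"] == "red" and e["length"] in ("short", "long"),
    num_literals=3,
)
wrong = Hypothesis(lambda e: e["length"] == "short", num_literals=1)

r_simple = toy_reward(simple, positives, negatives)
r_verbose = toy_reward(verbose, positives, negatives)
r_wrong = toy_reward(wrong, positives, negatives)
```

Under these assumed weights, the short correct rule scores highest, the equally correct but longer rule scores slightly lower, and the incorrect rule scores lowest. That ordering is the kind of signal a reward emphasizing correctness and simplicity is meant to provide.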
