DeepMath: A lightweight math reasoning Agent with SmolAgents
Blog post from HuggingFace
DeepMath is a math reasoning agent developed by the Intel AI Software Group to make mathematical problem-solving with large language models (LLMs) more accurate and more efficient. It is built on the Qwen3-4B Thinking model and fine-tuned with Group Relative Policy Optimization (GRPO). For intermediate steps, DeepMath emits concise Python code snippets that are executed in a secure sandbox, which reduces output length by up to 66% and often improves accuracy. Training focuses on offloading deterministic computation to the executor and on encouraging concise, computation-driven reasoning, with GRPO rewarding both correctness and brevity. Evaluated on datasets such as MATH500, AIME, HMMT, and HLE, DeepMath shows the benefit of pairing an LLM with a small code executor: a more interpretable and accurate math-solving agent that needs neither a massive model nor extensive external tools.
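To make the "rewarding correctness and brevity" idea concrete, here is a minimal sketch of how such a reward could feed into GRPO's group-relative advantages. The reward shape, the `length_weight` and `max_tokens` values, and both function names are assumptions for illustration; the post does not specify DeepMath's actual reward. Only the standardization of rewards within a group of sampled answers reflects the usual GRPO formulation.

```python
import statistics


def reward(is_correct: bool, num_tokens: int,
           max_tokens: int = 4096, length_weight: float = 0.2) -> float:
    """Hypothetical reward: 1.0 for a correct answer, minus a length penalty.

    The penalty grows linearly with output length and is capped at
    `length_weight`, so a correct answer always outscores a wrong one.
    """
    correctness = 1.0 if is_correct else 0.0
    length_penalty = length_weight * min(num_tokens, max_tokens) / max_tokens
    return correctness - length_penalty


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: standardize each reward within its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # All samples scored the same, so none is preferred over the others.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Three sampled answers to the same problem (illustrative values only):
# short and correct, long and correct, short but wrong.
group = [reward(True, 512), reward(True, 2048), reward(False, 512)]
advantages = group_relative_advantages(group)
```

Under these assumptions the short correct answer receives the largest advantage, the long correct answer a smaller one, and the wrong answer a negative one, so the policy is nudged toward answers that are both right and concise.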