This is an investigation into using Group Relative Policy Optimization (GRPO) to train smaller, open-weight language models on complex deduction tasks. By applying reinforcement learning to Qwen 14B and 32B models on challenging Temporal Clue puzzles, with carefully chosen hyperparameters, the authors raised these open-weight models to near frontier-level reasoning accuracy at a fraction of the usual cost, improving the cost-accuracy trade-off for logical deduction.
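As a rough illustration of the core idea behind GRPO, here is a minimal sketch of the group-relative advantage computation the algorithm is named for: several completions are sampled per prompt, and each completion's reward is normalized against its group's statistics, removing the need for a learned value network. The function name and the accuracy-style rewards are illustrative assumptions, not the authors' code.

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages for one group of completions sampled
    from the same prompt: reward minus group mean, scaled by the
    group standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All completions scored identically: no relative signal.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Hypothetical example: four completions for one puzzle,
# rewarded by answer accuracy.
rewards = [1.0, 0.0, 0.5, 1.0]
print(group_relative_advantages(rewards))
```

Because the baseline is the group mean rather than a critic's estimate, this setup is cheaper to train, which is part of why GRPO suits smaller open-weight models.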