| HN Points | HN Title | Submitted Date |
|---|---|---|
| 217 | Using reinforcement learning and $4.80 of GPU time to find the best HN post | 2024-10-28 |
| 199 | Using GRPO to Beat o1, o3-mini and R1 at “Temporal Clue” | 2025-03-06 |
| 81 | Show HN: RULER – Easily apply RL to any agent | 2025-07-11 |