Reviewer Two (but it's an OpenEnv)
Blog post from Hugging Face
Reviewer Two is a reinforcement learning environment built on Meta's OpenEnv framework that emulates a critical yet constructive peer reviewer in research settings. It trains AI agents, referred to as Purple Agents, to refine research plans iteratively through guided feedback, simulating real-world academic collaboration. Unlike traditional benchmarks, Reviewer Two uses a dynamic, multi-turn evaluation protocol built on Berkeley's AgentBeats platform: agents receive feedback each turn and adapt their strategies based on vague hints derived from hidden rubric criteria. Its distinguishing feature is multi-turn, adaptively penalized disclosure of guidance. Agents get two initial attempts to submit a plan without penalty; later attempts are penalized when the agent ignores feedback or incorporates guidance inefficiently. The environment combines rubric coverage, length, and format scores into its evaluation, which rewards coherent, concise, and well-structured research plans. The post presents this approach as a step toward AI agents capable of meaningful research collaboration, with an emphasis on iterative refinement, feedback incorporation, and constraint-based problem solving.
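To make the scoring idea concrete, here is a minimal Python sketch of how a reward combining rubric coverage, length, and format scores with a post-grace-period penalty might look. This is not the environment's actual implementation. The names (`Weights`, `attempt_penalty`, `plan_reward`), the weights, and the penalty size are all hypothetical, and the sketch simplifies the penalty to a flat per-attempt deduction rather than modelling whether feedback was ignored. Only the two penalty-free attempts and the three score components come from the description above.

```python
from dataclasses import dataclass

FREE_ATTEMPTS = 2           # from the description: first two submissions are penalty-free
PENALTY_PER_ATTEMPT = 0.1   # assumption: illustrative value, not taken from the post


@dataclass(frozen=True)
class Weights:
    """Hypothetical weights for the three score components (sum to 1)."""
    rubric: float = 0.7
    length: float = 0.15
    format: float = 0.15


def attempt_penalty(attempt: int) -> float:
    """Penalty for a 1-indexed attempt; zero during the free attempts."""
    return PENALTY_PER_ATTEMPT * max(0, attempt - FREE_ATTEMPTS)


def plan_reward(
    rubric_coverage: float,
    length_score: float,
    format_score: float,
    attempt: int,
    weights: Weights = Weights(),
) -> float:
    """Weighted sum of component scores in [0, 1], minus the attempt penalty."""
    base = (
        weights.rubric * rubric_coverage
        + weights.length * length_score
        + weights.format * format_score
    )
    # Clamp at zero so late attempts never yield a negative reward.
    return max(0.0, base - attempt_penalty(attempt))


if __name__ == "__main__":
    # The same plan quality is worth less once the free attempts are used up.
    for attempt in range(1, 5):
        print(attempt, round(plan_reward(0.8, 1.0, 1.0, attempt), 3))
```

Under these assumed values, an agent that needs a third or fourth attempt to reach the same plan quality earns less than one that gets there within the first two, which is the incentive the environment is described as creating.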