ShopRLVE-GYM: Adaptive Verifiable Environments for E-Commerce Conversational Agents
Blog post from Hugging Face
ShopRLVE-GYM extends the RLVE framework to e-commerce conversational agents. It introduces eight multi-turn, tool-augmented environments, including product discovery, cart building, and order tracking, that target the kinds of tasks a shopping assistant must complete in practice. Each environment generates its problems procedurally and exposes a 12-axis difficulty curriculum, so task difficulty can be scaled up or down to match the agent's current capability.

The framework tackles the central challenge of constructing reward functions that can be verified algorithmically, so that agents are optimized for task outcomes rather than for imitating demonstrations. To do this, it combines persona-driven user simulation with a composite reward. Early results from training a Qwen3-1.7B model with DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) are promising in terms of scalability and adaptability on e-commerce tasks. Together, these pieces make ShopRLVE-GYM a testbed for training large language models (LLMs) on complex, realistic e-commerce workflows, addressing a gap identified in prior RLVE research.
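To make "procedural problem generation" with an algorithmically verifiable reward concrete, here is a minimal sketch for an order-tracking task. Everything in it is hypothetical: the status list, `OrderTrackingProblem`, `generate_problem`, `verify`, and the choice of "number of orders in the user's history" as the single difficulty knob are illustrative assumptions, not ShopRLVE-GYM's actual code or one of its 12 axes. The point is the shape: a seed plus a difficulty level deterministically produce a task whose ground truth the environment knows, so the reward is a check rather than a judgment.

```python
import random
from dataclasses import dataclass

# Hypothetical status vocabulary for this sketch; not taken from ShopRLVE-GYM.
STATUSES = ["processing", "shipped", "out_for_delivery", "delivered", "returned"]


@dataclass
class OrderTrackingProblem:
    orders: dict[str, str]  # order_id -> status; stands in for the order-lookup tool backend
    target_id: str          # the order the simulated user asks about
    answer: str             # ground-truth status, never shown to the agent


def generate_problem(seed: int, difficulty: int) -> OrderTrackingProblem:
    """Deterministically build one task; higher difficulty means more orders to search."""
    rng = random.Random(seed)
    num_orders = 1 + 2 * difficulty
    ids = rng.sample(range(100_000, 1_000_000), num_orders)  # distinct order numbers
    orders = {f"ORD-{i}": rng.choice(STATUSES) for i in ids}
    target_id = rng.choice(sorted(orders))
    return OrderTrackingProblem(orders, target_id, orders[target_id])


def verify(problem: OrderTrackingProblem, agent_answer: str) -> float:
    """Binary, algorithmically checkable reward: 1.0 iff the reported status is correct."""
    return 1.0 if agent_answer.strip().lower() == problem.answer else 0.0
```

Because the generator is seeded, the same `(seed, difficulty)` pair always reproduces the same task, which makes failed episodes easy to replay and debug.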
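The paragraph above says difficulty scales adaptively with agent capability. One simple way to realize that, loosely in the spirit of RLVE's adaptive environments, is a per-environment controller that samples levels from a range `[low, high]`, measures success at the frontier level `high`, and promotes the frontier once the agent is reliably solving it. This is a sketch under stated assumptions: `AdaptiveDifficulty`, its thresholds, and the collapsing of the 12 axes into one scalar level are hypothetical simplifications, not the published ShopRLVE-GYM controller.

```python
import random
from dataclasses import dataclass


@dataclass
class AdaptiveDifficulty:
    """Hypothetical per-environment curriculum controller (scalar difficulty level).

    Problems are sampled uniformly from [low, high]. Only rollouts at the frontier
    level `high` are measured; after every `min_samples` such rollouts, the frontier
    moves up one level if accuracy reached `promote_at`. `window` caps how far `low`
    may trail `high`, so levels the agent has long mastered are retired.
    """
    promote_at: float = 0.9
    min_samples: int = 32
    window: int = 4
    low: int = 0
    high: int = 0
    _correct: int = 0
    _seen: int = 0

    def sample_level(self, rng: random.Random) -> int:
        return rng.randint(self.low, self.high)  # inclusive on both ends

    def update(self, level: int, reward: float) -> None:
        if level != self.high:
            return  # only frontier rollouts drive promotion
        self._seen += 1
        self._correct += int(reward >= 1.0)
        if self._seen >= self.min_samples:
            if self._correct / self._seen >= self.promote_at:
                self.high += 1
                self.low = max(self.low, self.high - self.window)
            self._seen = self._correct = 0  # start a fresh measurement window
```

Resetting the counters after each window, whether or not the agent is promoted, keeps the accuracy estimate focused on recent behavior instead of averaging in early failures.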
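The training run mentioned above uses DAPO, which builds on group-relative policy optimization: several rollouts are sampled per prompt and each rollout's advantage is its reward standardized within that group. Two DAPO ideas are easy to show in isolation. Dynamic sampling drops prompt groups whose rollouts all received the same reward, since their advantages are all zero and contribute no gradient; and decoupled clipping ("clip-higher") uses a wider upper bound than lower bound on the probability ratio. The plain-Python sketch below illustrates only these pieces (it omits DAPO's other components) and is neither the DAPO reference implementation nor ShopRLVE-GYM's trainer; the epsilon values are illustrative defaults.

```python
import statistics


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: standardize each reward within its prompt group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def dynamic_sampling_filter(groups: list[list[float]]) -> list[list[float]]:
    """Keep only prompt groups whose rewards are not all identical.

    An all-correct or all-wrong group has zero advantage for every rollout, so it
    carries no learning signal; the trainer keeps sampling until the batch is filled
    with informative groups instead.
    """
    return [g for g in groups if max(g) > min(g)]


def clipped_token_objective(ratio: float, advantage: float,
                            eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    """PPO-style clipped surrogate for one token, with decoupled lower/upper clip bounds."""
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped_ratio * advantage)
```

For example, of the reward groups `[1, 1, 1, 1]`, `[0, 0, 0, 0]`, and `[1, 0, 0, 1]`, only the last survives the filter. The larger `eps_high` lets low-probability tokens with positive advantage grow more per update, which is the motivation DAPO gives for clip-higher.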
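The post describes a composite reward without listing its terms here, so the following is only an illustration of how several verifiable signals might be combined. The components (`task_success`, tool-call validity, turn efficiency), the weights, and the rule that the efficiency bonus only counts on successful episodes are all hypothetical assumptions, not ShopRLVE-GYM's reward definition. Gating efficiency on success is one common way to avoid rewarding an agent for ending conversations early without finishing the task.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    task_success: bool     # outcome check from the environment's verifier
    valid_tool_calls: int  # tool calls that parsed and matched their schema
    total_tool_calls: int
    turns: int             # assistant turns used in the episode
    max_turns: int


def composite_reward(ep: EpisodeResult,
                     w_success: float = 1.0,
                     w_tools: float = 0.2,
                     w_efficiency: float = 0.1) -> float:
    """Hypothetical weighted sum of verifiable signals, normalized to [0, 1]."""
    success = 1.0 if ep.task_success else 0.0
    # Vacuously 1.0 when no tools were called: nothing was malformed.
    tool_validity = (ep.valid_tool_calls / ep.total_tool_calls
                     if ep.total_tool_calls else 1.0)
    # Efficiency only counts on success, so quitting early is never rewarded.
    efficiency = (1.0 - ep.turns / ep.max_turns) if ep.task_success else 0.0
    total = w_success * success + w_tools * tool_validity + w_efficiency * efficiency
    return total / (w_success + w_tools + w_efficiency)
```

Keeping the outcome weight dominant means no amount of well-formed tool calls on a failed episode can outscore a successful one, which keeps the agent optimizing for the task outcome.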