
ShopRLVE-GYM: Adaptive Verifiable Environments for E-Commerce Conversational Agents

Blog post from HuggingFace

Post Details
Company: HuggingFace
Author: Rahul Bajaj and Jaya Nupur
Word Count: 4,976
Summary

ShopRLVE-GYM extends the RLVE framework with eight multi-turn, tool-augmented environments for e-commerce conversational agents, aimed at improving real-world task completion. Each environment (product discovery, cart building, and order tracking are among them) pairs procedural problem generation with a 12-axis difficulty curriculum, so task difficulty scales adaptively with the agent's capabilities. The authors train a Qwen3-1.7B model with DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) and report early results that suggest the approach scales and adapts to e-commerce tasks. The framework tackles the challenge of constructing algorithmically verifiable reward functions, so that agents optimize for task outcomes rather than merely imitating demonstrations. By combining persona-driven user simulation with a composite reward, ShopRLVE-GYM offers a testbed for training large language models (LLMs) on complex, real-world e-commerce tasks, addressing a gap identified in prior RLVE research.
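
To make the two central ideas in the summary concrete (a reward that can be checked algorithmically, and difficulty that rises with agent capability), here is a minimal Python sketch. It is not the authors' implementation: `CartTask`, `cart_reward`, `next_difficulty`, the F1-style scoring, and the 0.8 promotion threshold are all hypothetical names and choices made for illustration, and the single integer difficulty level is a deliberate simplification of the 12-axis curriculum the post describes.

```python
from dataclasses import dataclass


@dataclass
class CartTask:
    """Hypothetical procedurally generated cart-building problem."""
    required: dict[str, int]  # product_id -> quantity the simulated user wants


def cart_reward(task: CartTask, final_cart: dict[str, int]) -> float:
    """Score the final cart by F1 overlap with the required items (0.0 to 1.0).

    The check is purely algorithmic: no human or model judge is involved.
    """
    matched = sum(
        min(qty, final_cart.get(pid, 0)) for pid, qty in task.required.items()
    )
    total_required = sum(task.required.values())
    total_in_cart = sum(final_cart.values())
    if matched == 0 or total_required == 0 or total_in_cart == 0:
        return 0.0
    precision = matched / total_in_cart   # penalizes extra items in the cart
    recall = matched / total_required     # penalizes missing items
    return 2 * precision * recall / (precision + recall)


def next_difficulty(level: int, recent_rewards: list[float],
                    promote_at: float = 0.8, max_level: int = 10) -> int:
    """Raise difficulty by one once the mean recent reward clears the threshold.

    Assumption: one scalar level stands in for the post's 12 difficulty axes.
    """
    if recent_rewards and sum(recent_rewards) / len(recent_rewards) >= promote_at:
        return min(level + 1, max_level)
    return level


if __name__ == "__main__":
    task = CartTask(required={"sku-1": 2, "sku-2": 1})
    print(cart_reward(task, {"sku-1": 2, "sku-2": 1}))  # exact match
    print(cart_reward(task, {"sku-1": 2}))              # one item missing
    print(next_difficulty(3, [0.9, 0.8]))               # promoted one level
```

Because the reward depends only on the final cart state, an agent cannot score well by producing fluent dialogue that resembles a demonstration; it has to complete the task. The promotion rule keeps problems near the edge of what the agent can currently solve, which is the intuition behind an adaptive curriculum.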