DukaanBench: Can AI Run an Indian Grocery Store for 30 Days?
Blog post from HuggingFace
DukaanBench is an AI benchmark that asks a language model to run a simulated Indian kirana (neighborhood grocery) store for 30 days, managing inventory, cash, customer trust, and marketing.
Each simulated day, the model receives a full snapshot of the shop's state and must return an executable JSON action. The backend then simulates customer visits and updates state variables such as trust and inventory.
The benchmark measures more than profit. It also evaluates whether the model keeps the shop operationally stable and preserves customer relationships, using metrics that include service rate, trust, and marketing effectiveness.
Initial findings highlight three areas that matter: keeping actions consistent with the model's stated rationale, managing customer trust, and making marketing decisions with awareness of current inventory.
Part 1 introduces the environment and the evaluation loop. Part 2 will explore training a smaller, more specialized model to improve on these tasks, with the aim of a practical tool that supports shopkeepers rather than replacing them.
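To make the daily loop concrete, here is a minimal Python sketch of a state-to-JSON-action-to-simulated-day cycle, with service rate computed as the share of requested units actually sold. It is not DukaanBench's implementation: the `ShopState` fields, the JSON keys (`restock`, `marketing_spend`, `rationale`), the unit-cost ratio, the demand boost from marketing, the trust update rule, and the rule-based `model_policy` standing in for a language model are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict

# All names, JSON keys, and update rules here are hypothetical, not DukaanBench's.
UNIT_COST_RATIO = 0.7  # hypothetical: wholesale cost as a fraction of shelf price


@dataclass
class ShopState:
    day: int
    cash: float
    trust: float                 # hypothetical 0..1 customer-trust score
    inventory: dict[str, int]
    prices: dict[str, float]
    requested: int = 0           # cumulative units customers asked for
    served: int = 0              # cumulative units actually sold


def model_policy(state_json: str) -> str:
    """Rule-based stand-in for the language model: state JSON in, action JSON out."""
    state = json.loads(state_json)
    restock = {item: 10 for item, qty in state["inventory"].items() if qty < 5}
    return json.dumps({
        "restock": restock,
        "marketing_spend": 0.0,
        "rationale": "Restock every item with fewer than 5 units.",
    })


def step(state: ShopState, action_json: str, demand: dict[str, int]) -> ShopState:
    """Apply one JSON action, then simulate one day of deterministic customers."""
    action = json.loads(action_json)  # raises ValueError if the action is not valid JSON
    for item, qty in action.get("restock", {}).items():
        if item not in state.prices or qty <= 0:
            continue                  # skip orders the shop cannot execute
        cost = qty * state.prices[item] * UNIT_COST_RATIO
        if cost <= state.cash:        # no buying on credit in this sketch
            state.cash -= cost
            state.inventory[item] = state.inventory.get(item, 0) + qty
    spend = min(float(action.get("marketing_spend", 0.0)), state.cash)
    state.cash -= spend
    boost = 1 + spend / 100           # hypothetical: marketing scales demand

    day_wanted = day_sold = 0
    for item, base in demand.items():
        wanted = round(base * boost)
        sold = min(wanted, state.inventory.get(item, 0))
        state.inventory[item] = state.inventory.get(item, 0) - sold
        state.cash += sold * state.prices[item]
        day_wanted += wanted
        day_sold += sold

    # Hypothetical trust rule: stockouts cost more trust than full service earns.
    fill = day_sold / day_wanted if day_wanted else 1.0
    delta = 0.01 if fill == 1.0 else -0.1 * (1 - fill)
    state.trust = min(1.0, max(0.0, state.trust + delta))
    state.requested += day_wanted
    state.served += day_sold
    state.day += 1
    return state


def service_rate(state: ShopState) -> float:
    """Share of requested units that were actually sold."""
    return state.served / state.requested if state.requested else 1.0


def run(days: int = 30) -> ShopState:
    state = ShopState(day=1, cash=5000.0, trust=0.5,
                      inventory={"rice": 20, "atta": 8, "dal": 6},
                      prices={"rice": 50.0, "atta": 40.0, "dal": 90.0})
    demand = {"rice": 3, "atta": 2, "dal": 1}  # fixed daily demand keeps the run deterministic
    for _ in range(days):
        action = model_policy(json.dumps(asdict(state)))
        state = step(state, action, demand)
    return state


final = run(30)
print(f"cash={final.cash:.2f} trust={final.trust:.2f} "
      f"service_rate={service_rate(final):.2f}")
```

Because the hypothetical marketing boost raises demand without adding stock, a policy in this sketch that spends on marketing while shelves are nearly empty causes stockouts, lowers the fill rate, and loses trust. That is one simple way to encode the inventory-awareness point in the findings.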