Daniel Botha discusses the limitations of traditional model evaluation methods and proposes games as a more effective way to test AI models. He argues that static benchmarks offer only limited insight into a model's real-world performance, and that gamified evaluation is both more dynamic and more engaging. The article highlights Google's Kaggle Game Arena, a platform for observing AI models competing in classic games, and emphasizes that games provide an unambiguous measure of a model's strategic reasoning, long-term planning, and adaptability. Botha also cites AI Town, a project by a16z-infra, as an innovative approach to evaluation: AI characters interacting within a simulated environment reveal their strengths and weaknesses. The piece concludes that such interactive environments offer valuable insight into a model's personality and behavior, which can in turn inform user experience design.