Modern banking is evolving beyond traditional transactions toward seamless, personalized customer experiences, and AI agents powered by large language models (LLMs) are playing an increasingly pivotal role. These agents automate complex workflows and deliver consistent, efficient customer support while operating within stringent regulatory environments. Several banks, including Wells Fargo, Bank of America, and the Commonwealth Bank, have deployed AI-driven virtual assistants that have handled billions of interactions, improving user engagement and operational efficiency. Deploying AI in banking also brings challenges: regulatory compliance, multifaceted customer queries, and data security. The article examines notable failures of AI support systems, underscoring the need for robust testing and human oversight.

The Agent Leaderboard v2 evaluates 17 LLMs across industries, using a synthetic dataset that simulates real-world banking scenarios and scoring models on action completion, tool selection quality, cost efficiency, and session duration. This comprehensive approach aims to help banks select the most suitable LLM by balancing capability, cost, and compliance against evolving customer expectations and regulatory demands.
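To make the trade-off concrete, here is a minimal, hypothetical Python sketch of how a bank might shortlist models by blending the four leaderboard dimensions into a single score. The field names, weights, and normalization caps are illustrative assumptions, not the leaderboard's actual methodology or results:

```python
from dataclasses import dataclass

@dataclass
class ModelResult:
    name: str
    action_completion: float  # fraction of user goals fully achieved (0-1)
    tool_selection: float     # fraction of correct tool calls (0-1)
    cost_per_session: float   # USD per completed session
    avg_turns: float          # average turns per session (proxy for duration)

def composite_score(r: ModelResult,
                    w_ac: float = 0.5, w_tsq: float = 0.3,
                    w_cost: float = 0.1, w_speed: float = 0.1,
                    max_cost: float = 0.50, max_turns: float = 10.0) -> float:
    """Blend capability with cost and session length; higher is better.

    Cost and duration are normalized against assumed budget caps so that
    cheaper, shorter sessions score closer to 1.0.
    """
    cost_eff = max(0.0, 1.0 - r.cost_per_session / max_cost)
    speed = max(0.0, 1.0 - r.avg_turns / max_turns)
    return (w_ac * r.action_completion + w_tsq * r.tool_selection
            + w_cost * cost_eff + w_speed * speed)

# Illustrative numbers only, not actual leaderboard figures.
candidates = [
    ModelResult("model-a", 0.62, 0.88, 0.12, 4.5),
    ModelResult("model-b", 0.58, 0.91, 0.03, 6.0),
]
for r in sorted(candidates, key=composite_score, reverse=True):
    print(f"{r.name}: {composite_score(r):.3f}")
```

A compliance-sensitive deployment might weight action completion and tool selection even more heavily, since a cheap model that mishandles a regulated workflow costs far more than it saves.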