SAP is collaborating with Cohere to integrate the Command A Reasoning model into SAP Business Technology Platform, with the aim of strengthening generative AI capabilities for enterprise reasoning tasks. The integration is intended to let customers, partners, and developers build innovative, secure applications tailored to their specific environments. Model performance is evaluated on benchmarks such as BFCL and Taubench; SAP reports scores obtained in function-calling settings and averaged across multiple runs and languages. SAP also employs a hierarchical deep research agent that breaks complex tasks into subproblems and applies iterative refinement to ensure high-quality outputs. Evaluation of these models combines satisfaction scores from human annotators with an automatic correctness score computed using LlamaIndex, and both contribute to the overall assessment of response quality and accuracy.
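
The text says benchmark scores are averaged across multiple runs and languages but does not give the exact aggregation rule. The following is a minimal Python sketch of one plausible reading, assuming each language's runs are averaged first and the per-language means are then averaged with equal weight. The function name `aggregate_scores`, the language codes, and the numbers are hypothetical placeholders for illustration, not reported results or a documented procedure.

```python
from statistics import mean


def aggregate_scores(scores_by_language):
    """Average each language's runs, then average the per-language means.

    Assumption: languages are weighted equally (a macro-average); the
    source does not specify how runs and languages are combined.
    """
    if not scores_by_language:
        raise ValueError("no scores provided")

    per_language = {}
    for language, runs in scores_by_language.items():
        if not runs:
            raise ValueError(f"no runs recorded for {language!r}")
        # Mean over repeated runs for this language.
        per_language[language] = mean(runs)

    # Unweighted mean over languages.
    overall = mean(list(per_language.values()))
    return overall, per_language


# Placeholder numbers for illustration only; these are not benchmark results.
example_scores = {
    "en": [0.5, 0.75, 1.0],
    "de": [0.5, 0.5],
}

overall, per_language = aggregate_scores(example_scores)
print(per_language)
print(f"overall: {overall:.3f}")
```

Averaging per language first keeps a language with more runs from dominating the overall figure. Pooling all runs into a single mean would be an equally valid choice if runs are meant to be weighted equally instead.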