Cohere is releasing the weights of a new model to broaden access for the AI research community, targeting capabilities such as conversational tasks, data analysis, and numerical manipulation in financial contexts. The evaluation methodology uses a PoLL judge ensemble, which increases agreement with human annotators. Performance is measured on benchmarks such as ChatRAGBench and BFCL-v3, which score tool use in realistic scenarios as well as the model's ability to avoid unnecessary tool calls. LangChain ReAct agents are used to test the model's ability to break down complex questions and formulate research plans, evaluated on datasets such as Bamboogle and StrategyQA. The ToolTalk challenge further probes complex reasoning and user interaction, but it requires a function-calling API that some models, such as Gemma 2 9B, do not provide.
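To make the agent evaluation concrete, the ReAct pattern mentioned above alternates model reasoning steps with tool calls until the model emits a final answer. The sketch below is a minimal, dependency-free illustration of that loop, not the actual LangChain or Cohere implementation; the tool names, the scripted stub model, and the `Action: tool[input]` trace format are all illustrative assumptions.

```python
import re

# Hypothetical tool registry; names and behavior are illustrative only.
TOOLS = {
    "lookup": lambda q: {"capital of France": "Paris"}.get(q, "unknown"),
}

def stub_model(prompt: str) -> str:
    """Stand-in for an LLM: replays a scripted ReAct trace for the demo question."""
    if "Observation: Paris" in prompt:
        return "Final Answer: Paris"
    return "Thought: I should look this up.\nAction: lookup[capital of France]"

def react_agent(question: str, model=stub_model, max_steps: int = 5) -> str:
    """Minimal ReAct loop: alternate model calls and tool calls until a final answer."""
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = model(prompt)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = re.search(r"Action: (\w+)\[(.*?)\]", reply)
        if not match:
            break  # model produced neither an action nor an answer
        tool, arg = match.groups()
        observation = TOOLS[tool](arg)
        prompt += f"{reply}\nObservation: {observation}\n"
    return "no answer"

print(react_agent("What is the capital of France?"))  # → Paris
```

A real evaluation harness would replace `stub_model` with calls to the model under test and score whether the agent reaches the correct answer, and with how many tool calls.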