The rise of AI agents has driven a proliferation of open-source models that support tool calling, a mechanism that lets large language models (LLMs) invoke external applications rather than handle every task themselves. It has also exposed wide variation in tool-calling quality across inference providers, making benchmarks essential for comparing them. By offloading work such as retrieving live data or executing code to external tools, a model stays useful and current without frequent retraining. Tool calling spans single-turn and multi-turn interactions, and multi-turn exchanges add complexity that can degrade output quality. Inference providers are responsible for the pre-processing, model execution, and post-processing that make tool calls succeed, relying on techniques such as structured outputs, careful quantization, and robust parsing of model responses. Baseten stands out in this space, emphasizing reliability and performance in tool calling through end-to-end attention to pre-processing, model execution, and post-processing, as reflected in its results on recent benchmarks.
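
To make the mechanics concrete, here is a minimal sketch of a single-turn tool call against an OpenAI-compatible chat completions endpoint. The `base_url`, model name, and `get_weather` function are illustrative placeholders, not specific to any provider discussed here; the point is that the client declares a tool schema, the model may emit a structured call to it, and the provider must parse that call into valid JSON arguments.

```python
import json
from openai import OpenAI

# Any OpenAI-compatible endpoint works here; the base_url and model name
# are placeholders for illustration only.
client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_API_KEY")

# The tool schema tells the model which external function it may call
# and which arguments that function expects.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="my-open-source-model",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decides to call the tool, the provider's post-processing must
# surface a structured tool call whose arguments parse as valid JSON.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```

In a multi-turn flow, the tool's result would be appended to `messages` as a `tool` role message and the model queried again, which is where much of the added complexity (and many of the benchmark failures) comes from.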