Your agent would rather write code
Blog post from Pydantic
While working on AI observability with Logfire, the Pydantic team initially built over 40 carefully designed MCP (Model Context Protocol) tools for tasks such as running SQL queries and managing dashboards, only to find that a single exec tool that runs Python code was more efficient. The shift came from the realization that large language models (LLMs) are better at writing code than at choosing from a long menu of tools, a point also borne out by Cloudflare's similar move to fewer tools. The exec tool, powered by a Python interpreter called Monty, lets the model collapse a multi-step task into a single script that runs server-side, which reduces API calls, execution time, and token usage. The approach is markedly more efficient, but challenges remain: models still make "vibe coding" errors, and Monty is still being developed to support more Python features. For security, the team tightly controls what Monty can interact with outside the interpreter, and it uses an evaluation framework to check that the exec tool behaves reliably in multi-step processes. As MCP evolves with features like interactive UIs and human-in-the-loop capabilities, the focus shifts toward fewer tools with more expressive execution, on the view that letting models write code plays to their strengths.
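
To make the "one script instead of many tool calls" idea concrete, here is a minimal sketch. It is not Monty and not Logfire's API: `query_rows` and `run_script` are hypothetical names invented for illustration, the data is canned, and Python's built-in `exec` stands in for the interpreter. Note that `exec` with a trimmed namespace is not a security boundary; it only shows the shape of the pattern, where the host decides which functions the script can see.

```python
from typing import Any, Callable


def query_rows(service: str) -> list[dict[str, Any]]:
    """Hypothetical host function: returns canned rows instead of querying a backend."""
    data = {
        "api": [{"status": 200, "ms": 12}, {"status": 500, "ms": 340}],
        "worker": [{"status": 200, "ms": 45}],
    }
    return data.get(service, [])


def run_script(source: str, host_functions: dict[str, Callable[..., Any]]) -> Any:
    """Run a model-written script with only the given functions in scope.

    Assumption for illustration: the script reports back by assigning `result`.
    Built-in exec() is NOT a sandbox; a real deployment needs a proper
    restricted interpreter.
    """
    namespace: dict[str, Any] = {
        "__builtins__": {"len": len},  # expose only what the script needs
        **host_functions,
    }
    exec(source, namespace)
    return namespace.get("result")


# What a model might write: one script that queries two services, filters the
# rows, and aggregates them, rather than one tool call per step.
SCRIPT = """
slow = []
for service in ["api", "worker"]:
    for row in query_rows(service):
        if row["ms"] > 100:
            slow.append((service, row["status"], row["ms"]))
result = {"slow_count": len(slow), "slow": slow}
"""

output = run_script(SCRIPT, {"query_rows": query_rows})
print(output)
```

With separate tools, the same task would take at least two query calls plus model turns to filter and combine the results; here the intermediate rows never leave the server, and only the small `result` dictionary goes back to the model.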