Coding agent tracing and evaluation: An open source tool to improve AI coding workflows
Blog post from Arize
Coding agent tracing and evaluation is an open-source tool that gives developers detailed visibility into what coding agents such as Claude Code, Cursor, Codex, GitHub Copilot, and Gemini CLI actually do. Developers can inspect each step of an agent's run, including file reads, tool calls, command executions, retries, and token usage, to understand agent behavior and see where a workflow is efficient or wasteful. By collecting and analyzing trace data, they can identify ineffective workflows, build reusable skills, and determine which coding models and prompts yield the best results, which makes improvements to AI coding practices systematic. Traces can be sent to platforms like Arize AX or Phoenix for inspection and evaluation, so developers can experiment with different configurations and track their impact over time. The approach encourages shared practices and reusable skills across teams, and treats coding agents as part of the broader software development stack, with a focus on observability, evaluation, and iterative improvement.
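
To make the trace data described above more concrete, here is a minimal Python sketch of how one agent run might be recorded and summarized. It is not the tool's actual schema or API: the `Step` and `Trace` classes, their field names, the `summarize` helper, and the sample values are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Step:
    """One recorded agent action (hypothetical schema, for illustration only)."""
    kind: str            # e.g. "file_read", "tool_call", "command"
    target: str          # file path, tool name, or command line
    tokens: int = 0      # tokens consumed by this step
    retry: bool = False  # True if this step repeats a failed earlier attempt


@dataclass
class Trace:
    """All steps from a single agent run, tagged with the model that produced them."""
    model: str
    steps: list[Step] = field(default_factory=list)


def summarize(trace: Trace) -> dict:
    """Aggregate a trace into per-kind step counts, a retry count, and a token total."""
    return {
        "model": trace.model,
        "steps_by_kind": dict(Counter(s.kind for s in trace.steps)),
        "retries": sum(1 for s in trace.steps if s.retry),
        "total_tokens": sum(s.tokens for s in trace.steps),
    }


# Made-up sample run: one file read, a test command that is retried, one tool call.
trace = Trace(
    model="example-model",
    steps=[
        Step("file_read", "src/app.py", tokens=1200),
        Step("command", "pytest -q", tokens=300),
        Step("command", "pytest -q", tokens=320, retry=True),
        Step("tool_call", "edit_file", tokens=450),
    ],
)

summary = summarize(trace)
print(summary)
```

Summaries like this, computed per run, could be compared across models or prompts to see which configuration needs fewer retries or fewer tokens for the same task. In practice the tool exports traces to a platform such as Arize AX or Phoenix instead of aggregating them by hand.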