
Fix AI like a professional eval engineer.

Blog post from Galileo

Post Details
Company: Galileo
Author: Pratik Bhavsar
Word Count: 3,611
Language: English
Summary

Eval Engineer is a skill bundle for Claude Code and OpenAI Codex that streamlines diagnosing and fixing issues in AI agents. It connects the coding agent to Galileo, which supplies evidence from production. By examining logs, metrics, and traces, Eval Engineer identifies the root cause of a problem, proposes a bounded fix, and writes a verification plan to confirm that the fix works. It does not replace existing systems; instead, it complements them by making the evaluation lifecycle part of the development workflow.

The tool is aimed at AI engineers, researchers, field debugging engineers, and site reliability engineers: they keep working in their existing environments, and every diagnosis and fix plan comes with a clear, verifiable artifact. Eval Engineer is open source and customizable, so teams can configure which evidence is relevant, which files may be edited, and which verification commands to run. It favors small, reviewable changes and leaves product decisions to humans, so automation supports expert judgment rather than replacing it.
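To make the idea of a configurable, bounded fix concrete, here is a minimal sketch of how a team might encode those three settings (relevant evidence, editable files, verification commands) plus a size limit, and check a proposed fix against them. Everything here is hypothetical: `EvalEngineerConfig`, `review_fix_plan`, the field names, the glob patterns, and the `max_changed_lines` limit are illustrative assumptions, not Eval Engineer's actual configuration format or API.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class EvalEngineerConfig:
    """Hypothetical team config; field names are illustrative, not Eval Engineer's real schema."""
    # Which kinds of production evidence the diagnosis should consult.
    evidence_sources: list[str] = field(default_factory=lambda: ["logs", "metrics", "traces"])
    # Glob patterns for files a proposed fix is allowed to touch.
    editable_paths: list[str] = field(default_factory=list)
    # Commands to run after a fix is applied, to verify it.
    verification_commands: list[str] = field(default_factory=list)
    # Upper bound on total changed lines, keeping fixes small and reviewable.
    max_changed_lines: int = 50


def review_fix_plan(config: EvalEngineerConfig, changes: dict[str, int]) -> list[str]:
    """Return the problems with a proposed fix (hypothetical helper).

    `changes` maps each file path to the number of lines the fix changes in it.
    An empty list means the plan stays within the configured bounds.
    """
    problems = []
    for path in changes:
        if not any(fnmatch(path, pattern) for pattern in config.editable_paths):
            problems.append(f"{path} is outside the editable paths")
    total = sum(changes.values())
    if total > config.max_changed_lines:
        problems.append(f"{total} changed lines exceeds the limit of {config.max_changed_lines}")
    if not config.verification_commands:
        problems.append("no verification command is configured")
    return problems


config = EvalEngineerConfig(
    editable_paths=["prompts/*.md", "agent/tools/*.py"],
    verification_commands=["pytest tests/evals"],
    max_changed_lines=40,
)
print(review_fix_plan(config, {"prompts/system.md": 12}))  # within bounds
print(review_fix_plan(config, {"infra/deploy.yaml": 5, "prompts/system.md": 60}))
```

The first call returns an empty list; the second flags both the out-of-scope file and the oversized change. Rejecting a plan outright, rather than trimming it automatically, mirrors the emphasis on keeping a human reviewer in the loop.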