How We Used Evals (and an AI Agent) to Iteratively Improve an AI Newsletter Generator
Blog post from Arize
An AI-powered tool was built to generate newsletters from recent tweets using Claude. Initial results read well but got details wrong, such as incorrect URLs and missing links. To address these issues, a coding agent iteratively improved the pipeline by running evaluations, fixing errors, and suggesting next steps autonomously. The process showed that data preprocessing improved output quality more than prompt engineering did. Evaluators measured dimensions such as faithfulness, structure adherence, and link accuracy, and their results drove changes to data handling and prompt instructions. Human judgment was still needed to redefine what the evaluations should measure, which led to a new evaluator focused on content coverage rather than link completeness. The iterative approach highlighted the importance of accurate evaluations and human decision-making in guiding AI agents: agents are efficient at optimizing tasks, but humans are essential for setting objectives and ensuring meaningful outcomes. The entire project, including the code and evaluation suite, is available as open source for further experimentation.
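
To make the idea of a link-accuracy evaluator concrete, here is a minimal sketch in Python. It is not the evaluator from the project: the function names (`extract_urls`, `link_accuracy`), the URL regex, and the scoring rule are all assumptions chosen for illustration, and the project's evaluators may work differently (for example, by using an LLM as a judge). The sketch treats a newsletter URL as accurate only if the same URL appears somewhere in the source tweets.

```python
import re

# Assumed URL pattern: stops at whitespace and common closing delimiters.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def extract_urls(text: str) -> set[str]:
    """Return the URLs found in text, with trailing punctuation stripped."""
    return {match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)}


def link_accuracy(newsletter: str, source_tweets: list[str]) -> dict:
    """Score the share of newsletter URLs that also appear in the source tweets.

    Hypothetical scoring rule: 1.0 means every link is backed by a source
    tweet; a newsletter with no links also scores 1.0.
    """
    allowed: set[str] = set()
    for tweet in source_tweets:
        allowed |= extract_urls(tweet)

    used = extract_urls(newsletter)
    unsupported = sorted(used - allowed)
    score = 1.0 if not used else (len(used) - len(unsupported)) / len(used)
    return {"score": score, "unsupported_urls": unsupported}


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    tweets = [
        "New release: https://example.com/release",
        "Docs at https://example.com/docs.",
    ]
    draft = (
        "Read the [release notes](https://example.com/release) "
        "and the [guide](https://example.com/guide)."
    )
    print(link_accuracy(draft, tweets))
```

A deterministic check like this only catches invented or altered URLs. It does not say whether the newsletter covers the right content, which is the kind of gap the summary describes humans stepping in to redefine.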