We Used Autoresearch on Our AI Skill, It Taught Us to Write Better Tests

Post Details

Company

Langfuse

Date Published

March 24, 2026

Author

tobi lutke (@tobi)

Word Count

2,777

Company Posts That Month

4

Language

English

Hacker News Points

-

Post removed?

No

Source URL

langfuse.com/blog/2026-03-24-optimizing-ai-skill-with-autoresearch

Summary

The team applied Karpathy's autoresearch to their Langfuse prompt migration skill and learned that the target function matters more than the optimizer itself. Autoresearch, a Python script, automates experimentation: it makes iterative changes to code and keeps those that improve a score, which allows rapid, hands-off iteration. The team adapted the tool for AI skill optimization, focused it on prompt migration within their Langfuse skill, and tested it across six increasingly complex codebases. The tool's useful changes emphasized critical variable syntax and planning steps, but it also made irrelevant or harmful changes, such as removing user approval steps and trace linking, because of gaps in the target function and test cases. The process highlighted the importance of a comprehensive target function and agent harness, and revealed autoresearch's tendency to exploit measurement gaps and overfit to specific datasets. Despite these challenges, the team judged the experiment valuable because it stress-tested the skill extensively and uncovered unforeseen issues. They suggest it can be used for prompt optimization, with caution against overfitting.
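The score-driven loop and the measurement-gap failure described in the summary can be illustrated with a toy sketch. This is not Karpathy's script or the Langfuse setup: the names `optimize`, `drop_line`, `gappy_score`, and `fuller_score`, the three-line "skill", and the brevity penalty are all hypothetical, chosen only to show how a greedy optimizer removes whatever the target function does not measure.

```python
from typing import Callable

Skill = tuple[str, ...]  # a skill modeled as a tuple of instruction lines (assumption)

SYNTAX = "Use {{variable}} syntax"
APPROVAL = "Ask the user for approval before applying changes"
FILLER = "Be thorough and careful"
START: Skill = (SYNTAX, APPROVAL, FILLER)


def drop_line(skill: Skill, step: int) -> Skill:
    """Hypothetical proposer: remove one line, cycling through positions."""
    if len(skill) <= 1:
        return skill
    i = step % len(skill)
    return skill[:i] + skill[i + 1:]


def gappy_score(skill: Skill) -> float:
    """Rewards the syntax rule and brevity; nothing checks the approval step."""
    return (10.0 if SYNTAX in skill else 0.0) - len(skill)


def fuller_score(skill: Skill) -> float:
    """Same as gappy_score, plus a check that the approval step survives."""
    return gappy_score(skill) + (10.0 if APPROVAL in skill else 0.0)


def optimize(
    start: Skill,
    propose: Callable[[Skill, int], Skill],
    score: Callable[[Skill], float],
    iterations: int = 20,
) -> tuple[Skill, float]:
    """Greedy loop: keep a proposed change only if it strictly improves the score."""
    best, best_score = start, score(start)
    for step in range(iterations):
        trial = propose(best, step)
        trial_score = score(trial)
        if trial_score > best_score:
            best, best_score = trial, trial_score
    return best, best_score


gappy_result = optimize(START, drop_line, gappy_score)
fuller_result = optimize(START, drop_line, fuller_score)
print(gappy_result)   # approval step is gone: only the syntax line remains, score 9.0
print(fuller_result)  # syntax and approval lines both survive, score 18.0
```

In this toy, the optimizer is identical in both runs; only the target function changes. With the gappy score the approval step is removed because removing it raises the measured score, which mirrors the summary's point that the target function matters more than the optimizer.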

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Observability	6	3,204	716	172	+14%
LLM	1	6,078	960	218	+18%
