How we optimized LLM use for cost, quality, and safety to facilitate writing postmortems

Post Details

Company

Datadog

Date Published

Sept. 23, 2024

Author

Tran Le, Till Pieper, Gillian McGarvey

Word Count

2,833

Company Posts That Month

17

Language

English

Hacker News Points

2

Post removed?

No

Source URL

www.datadoghq.com/blog/engineering/llms-for-postmortems

Summary

The post describes a Bits AI feature that uses large language models (LLMs) to help engineers write postmortems after incidents. The goal is to keep engineers in control and reinforce learning while incident details are documented. The approach combines structured metadata from Datadog’s Incident Management app with unstructured discussions from Slack to generate a draft postmortem, which human authors then refine. The authors highlight challenges such as non-determinism, hallucinations, and the need for nuanced evaluation. They also note that the work calls for a new skill set that combines software engineering, product management, data science, and technical writing. The team compared alternative models to balance cost, speed, and quality, and protected trust and privacy by scrubbing sensitive data and providing transparency. Experimentation and feedback loops were crucial for refining the LLM outputs, which generally worked well for mid- to lower-severity incidents. The authors propose future enhancements, including more customization and additional data integrations for richer incident context, and point to other potential uses for LLM-generated content, such as custom postmortems for clients.

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	40	3,889	441	129	+7%
Secrets Management	2	1,277	102	52	+46%
AI Model Fine-tuning	1	628	146	67	-32%
