The Rage Clicks of LLM Apps: High-Signal Production Monitoring for AI Support Agents
Blog post from Langfuse
Annabell Schäfer's article discusses the challenge of monitoring applications powered by large language models (LLMs). The hard part is detecting subtle signs of user dissatisfaction that never surface as clear-cut errors or exceptions. Conventional apps have easy-to-spot indicators such as rage clicks, but LLM apps need more nuanced event detectors. For customer support agents, the article highlights three events worth detecting: the user disagrees with the assistant's response, the user asks for something outside the agent's defined scope, and the user gives feedback on product features.

Profanity is one such signal of dissatisfaction: Schäfer cites Boris Cherny's practice of tracking "fucks per conversation" as a high-signal indicator. The article then shows how template evaluators in Langfuse can detect these events, and it stresses that good detectors are binary, actionable, and narrow. It concludes that event detection, especially of user disagreement, yields production monitoring insights that average quality scores miss. For example, it can reveal documentation gaps and guide system prompt improvements.
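The sketch below makes the idea of a binary, narrow detector concrete. It is a minimal Python example of a profanity signal in the spirit of "fucks per conversation". It assumes conversations are available as lists of role-tagged messages. It is a plain keyword heuristic, not Langfuse's template evaluators or any Langfuse API. The word list, the `Message` type, and all function names are hypothetical. The code counts profanity in user turns only and flags a conversation when that count is nonzero. It then reports the flagged share and the average count across a batch.

```python
import re
from dataclasses import dataclass

# Hypothetical word list for illustration; a real deployment would tune it.
PROFANITY = re.compile(r"\b(fuck\w*|shit\w*|damn\w*)\b", re.IGNORECASE)


@dataclass
class Message:
    role: str  # assumed to be "user" or "assistant"
    content: str


def profanity_count(conversation: list[Message]) -> int:
    """Count profane words in user turns only; assistant text is ignored."""
    return sum(
        len(PROFANITY.findall(m.content))
        for m in conversation
        if m.role == "user"
    )


def is_frustrated(conversation: list[Message]) -> bool:
    """Binary, narrow detector: True if the user swore at least once."""
    return profanity_count(conversation) > 0


def flagged_rate(conversations: list[list[Message]]) -> float:
    """Share of conversations the detector flags (0.0 for an empty batch)."""
    if not conversations:
        return 0.0
    return sum(is_frustrated(c) for c in conversations) / len(conversations)


def mean_profanity_per_conversation(conversations: list[list[Message]]) -> float:
    """Average profanity count per conversation (0.0 for an empty batch)."""
    if not conversations:
        return 0.0
    return sum(profanity_count(c) for c in conversations) / len(conversations)


if __name__ == "__main__":
    convs = [
        [Message("user", "Where is my refund?"),
         Message("assistant", "It was issued yesterday.")],
        [Message("user", "That's not what I asked, what the fuck"),
         Message("assistant", "Sorry, let me try again.")],
    ]
    print(flagged_rate(convs))                     # → 0.5
    print(mean_profanity_per_conversation(convs))  # → 0.5
```

A yes/no flag per conversation keeps the detector narrow and actionable. When the flagged share rises, it points to specific conversations worth reading, which an averaged quality score would tend to blur. Detecting disagreement or out-of-scope requests would need a semantic judge rather than a regex. This sketch does not attempt that.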