Apache Kafka Lag Monitoring at AppsFlyer

Blog post from Confluent

Post Details
Company: Confluent
Author: Elad Leev
Word Count: 2,455
Language: English
Summary

AppsFlyer, a SaaS mobile marketing platform, depends on visibility into its distributed systems. This matters most for Apache Kafka, which sits at the center of its large-scale event-driven architecture and streams tens of billions of events a day. One gap the team identified was monitoring consumer lag, which measures how far a consumer group is behind the latest messages written to the partitions it reads. Lag had previously been tracked by a cumbersome Clojure service, so AppsFlyer looked for a more scalable, automated solution. After evaluating options such as Kafka Lag Exporter and Remora, the team chose LinkedIn's Burrow for its flexibility, its modular design, and its effectiveness at monitoring consumer lag. With Burrow in place, AppsFlyer monitors its clusters, visualizes lag metrics, and derives time-based lag metrics that warn when a consumer risks falling behind a topic's retention period, the point at which unread data is deleted and lost. Next, the team plans to build smart alerts and to decouple its Burrow stacks so that monitoring keeps up with growing cluster traffic.
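The summary mentions two metrics: offset-based consumer lag and time-based lag checked against retention. The minimal Python sketch below illustrates both. It assumes that per-partition log-end offsets, the consumer group's committed offsets, and a recent produce rate per partition have already been collected, for example by Burrow or a Kafka admin client. The names `PartitionOffsets`, `time_lag_seconds`, `retention_risk`, and the 80% threshold are hypothetical illustrations, not Burrow's or AppsFlyer's API. Time lag is approximated as offset lag divided by the produce rate, which assumes that rate is roughly steady.

```python
"""Hypothetical sketch of offset lag and time-based lag for one consumer group.

Offsets and produce rates are assumed to be fetched elsewhere (e.g. by Burrow);
the numbers in the usage block are illustrative only.
"""
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PartitionOffsets:
    partition: int
    log_end_offset: int          # offset the next produced message will receive
    committed_offset: int        # next offset the consumer group will read
    produce_rate_per_sec: float  # hypothetical: recent messages/sec into this partition


def partition_lag(p: PartitionOffsets) -> int:
    """Messages written but not yet consumed; clamped so it is never negative."""
    return max(p.log_end_offset - p.committed_offset, 0)


def total_lag(partitions: Sequence[PartitionOffsets]) -> int:
    """Offset lag summed across all partitions the group reads."""
    return sum(partition_lag(p) for p in partitions)


def time_lag_seconds(p: PartitionOffsets) -> float:
    """Approximate age of the oldest unread message: lag / produce rate.

    Assumes a steady produce rate; a partition with a backlog but no
    recent writes is treated as infinitely far behind.
    """
    lag = partition_lag(p)
    if lag == 0:
        return 0.0
    if p.produce_rate_per_sec <= 0:
        return float("inf")
    return lag / p.produce_rate_per_sec


def max_time_lag_seconds(partitions: Sequence[PartitionOffsets]) -> float:
    """Worst time lag across partitions (0.0 if there are none)."""
    return max((time_lag_seconds(p) for p in partitions), default=0.0)


def retention_risk(time_lag_s: float, retention_s: float, threshold: float = 0.8) -> bool:
    """Hypothetical rule: flag when unread data is within (1 - threshold) of expiring."""
    return time_lag_s >= threshold * retention_s


if __name__ == "__main__":
    # Illustrative numbers only, not AppsFlyer data.
    parts = [
        PartitionOffsets(0, 1_000, 900, 2.0),    # lag 100 -> ~50 s behind
        PartitionOffsets(1, 2_000, 2_000, 5.0),  # fully caught up
        PartitionOffsets(2, 1_500, 1_200, 1.0),  # lag 300 -> ~300 s behind
    ]
    worst = max_time_lag_seconds(parts)
    print(total_lag(parts), worst, retention_risk(worst, retention_s=360.0))
    # → 400 300.0 True
```

The sketch takes the maximum over partitions instead of averaging them. Retention deletes old data partition by partition, so the single most-behind partition is the first to lose unread messages, even when the group's total lag looks modest.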