Company
Date Published
Author
Victoria Xia, Peilin Yang, Wade Waldron
Word count
1630
Language
English
Hacker News points
None

Summary

Twitter has revamped its recommendation systems with a new streaming data logging pipeline for its home timeline prediction system, built on Apache Kafka® and Kafka Streams to handle billions of tweets daily. The new pipeline replaces an older offline batch system and cuts pipeline latency from seven days to one day, which improves both model quality and engineering efficiency. At its core is a customized left join in Kafka Streams that matches features with labels for machine learning models, so Twitter can keep its models up to date as user behaviors and trends change. The blog post details this customization, covering the challenges involved, such as handling consumer lag and ensuring data quality, and the solutions the team adopted. It also acknowledges the contributions of numerous team members and names cooperative rebalancing as a potential future enhancement to the pipeline's performance.
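
To make the idea of left-joining features with labels concrete, here is a minimal in-memory sketch in Python. It is not the Kafka Streams API and not Twitter's implementation. The class `WindowedLeftJoin`, its method names, and the fixed join window are all hypothetical, chosen only to illustrate the semantics: every feature record is emitted once, with its label if one arrives within the window and with no label otherwise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    key: str
    features: dict
    label: Optional[int]  # None means no label arrived inside the window


class WindowedLeftJoin:
    """Toy left join (hypothetical): features are the left side, labels the right.

    Assumes each key appears at most once on the feature side.
    """

    def __init__(self, window_ms: int):
        self.window_ms = window_ms
        self.pending = {}  # key -> (feature timestamp, features)

    def on_features(self, key, features, ts):
        # Buffer the left-side record until a label arrives or the window closes.
        self.pending[key] = (ts, features)
        return []

    def on_label(self, key, label, ts):
        entry = self.pending.get(key)
        if entry is None or ts - entry[0] > self.window_ms:
            # No buffered features, or the label is too late: drop the label.
            return []
        del self.pending[key]
        return [Example(key, entry[1], label)]

    def advance_time(self, now):
        # Emit unmatched feature records whose window has closed, with label=None.
        expired = [k for k, (ts, _) in self.pending.items()
                   if now - ts > self.window_ms]
        return [Example(k, self.pending.pop(k)[1], None) for k in expired]


join = WindowedLeftJoin(window_ms=1000)
join.on_features("tweet-1", {"score": 0.3}, ts=0)
join.on_features("tweet-2", {"score": 0.9}, ts=100)
matched = join.on_label("tweet-1", 1, ts=500)   # label arrives in time
unmatched = join.advance_time(now=2000)         # tweet-2 never got a label
```

In this sketch, `matched` holds the joined record for `tweet-1`, and `unmatched` holds `tweet-2` with `label=None`. A real streaming pipeline would also have to deal with out-of-order records, state storage, and consumer lag, which this toy ignores.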