Company
Date Published
Author
Victoria Xia, Martin Kleppmann, Wade Waldron
Word count
4893
Language
English
Hacker News points
None

Summary

The talk describes open source tools that enable search on streams, where queries are registered in advance and each incoming document is matched against them. Luwak is a Lucene-based library for running many thousands of queries over a single document; it makes this efficient by indexing the queries so that only those that could plausibly match a document are actually run. Samza is a stream processing framework based on Kafka that distributes real-time computations across a cluster of machines. The speaker discusses how the two can be combined into an efficient and scalable streaming search engine, and how they can be used to build full-text search over streams like that found in Twitter and Google Alerts, with partitioning of the streams allowing the work to scale across machines.
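
The query-indexing idea mentioned in the summary can be illustrated with a small sketch. This is not Luwak's API (Luwak is a Java library built on Lucene); the class `QueryIndex` and its methods `register` and `match` are hypothetical names, and the sketch assumes each query is a simple AND of terms and that documents are tokenized by whitespace.

```python
from collections import defaultdict


class QueryIndex:
    """Toy index of stored queries, keyed by the terms each query requires.

    Assumption: a query is a set of terms that must ALL appear in a document.
    """

    def __init__(self):
        self.queries = {}                # query_id -> set of required terms
        self.by_term = defaultdict(set)  # term -> ids of queries using that term

    def register(self, query_id, terms):
        required = {t.lower() for t in terms}
        self.queries[query_id] = required
        for term in required:
            self.by_term[term].add(query_id)

    def match(self, document):
        doc_terms = set(document.lower().split())
        # Pre-filter: collect only queries that share at least one term
        # with the document, instead of running every stored query.
        candidates = set()
        for term in doc_terms:
            candidates |= self.by_term.get(term, set())
        # Verify each candidate fully: all of its terms must be present.
        return sorted(q for q in candidates if self.queries[q] <= doc_terms)


index = QueryIndex()
index.register("q1", ["kafka", "samza"])
index.register("q2", ["lucene"])
index.register("q3", ["kafka", "lucene"])
matches = index.match("Samza runs on Kafka")
print(matches)
```

Here only `q1` and `q3` become candidates for the example document, and `q2` is never evaluated. In a distributed setting, one plausible design is to give each partition its own such index over a subset of the queries, though the sketch does not show that.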