
Real-time full-text search with Luwak and Samza

Blog post from Confluent

Post Details
Company: Confluent
Date Published:
Author: Victoria Xia, Martin Kleppmann, Wade Waldron
Word Count: 4,893
Language: English
Hacker News Points: -
Summary

The talk describes open source tools that enable search on streams. Luwak is a Lucene-based library for running many thousands of stored queries against a single document, with optimizations that keep this process efficient. Samza is a stream processing framework built on Kafka that distributes real-time computations across a cluster of machines. The speaker discusses how the two can be combined into an efficient, scalable streaming search engine, and how they can be used to build full-text search over streams like that found in Twitter and Google Alerts, using optimizations such as indexing the queries and partitioning the streams to scale throughput.
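
The two optimizations named in the summary, indexing the queries and partitioning the work, can be illustrated with a small sketch. This is not Luwak's or Samza's API: `QueryIndex`, `partition_of`, and `match_partitioned` are hypothetical names, queries are simplified to "all of these terms must appear", and tokenization is a plain whitespace split. The idea it shows is that each query is filed under one of its required terms, so an incoming document only has to be checked against queries whose filing term it actually contains.

```python
import zlib
from collections import defaultdict


class QueryIndex:
    """Toy reverse search: store queries, then match each document against them.

    Assumption: a query is a set of terms that must ALL appear in the document.
    """

    def __init__(self):
        self.queries = {}                 # query_id -> set of required terms
        self.by_term = defaultdict(set)   # filing term -> ids of queries filed under it

    def register(self, query_id, terms):
        required = {t.lower() for t in terms}
        if not required:
            raise ValueError("a query needs at least one term")
        self.queries[query_id] = required
        # File the query under a single required term. Any matching document
        # must contain that term, so looking it up never misses a match.
        self.by_term[min(required)].add(query_id)

    def match(self, text):
        doc_terms = set(text.lower().split())
        # Step 1: cheap pre-selection of candidate queries via the term index.
        candidates = set()
        for term in doc_terms:
            candidates |= self.by_term.get(term, set())
        # Step 2: run only the candidates in full against the document.
        return sorted(q for q in candidates if self.queries[q] <= doc_terms)


def partition_of(query_id, num_partitions):
    """Stable partition choice for a query (crc32 is stable across processes)."""
    return zlib.crc32(query_id.encode("utf-8")) % num_partitions


def match_partitioned(partitions, text):
    """Send the document to every partition and merge the matching query ids."""
    matched = set()
    for index in partitions:
        matched.update(index.match(text))
    return sorted(matched)


if __name__ == "__main__":
    stored = {
        "q1": ["kafka", "samza"],
        "q2": ["lucene"],
        "q3": ["kafka", "storm"],
    }
    partitions = [QueryIndex() for _ in range(2)]
    for query_id, terms in stored.items():
        partitions[partition_of(query_id, len(partitions))].register(query_id, terms)
    print(match_partitioned(partitions, "Samza builds on Kafka"))
```

In this sketch the queries are split across partitions and every document visits all of them, which is one plausible way to spread a large query set over several machines; the original post may partition differently.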