The simplest way to count 100 billion unique IDs: Part 1
Blog post from Tinybird
The post contrasts two ways to count post views and unique viewers. Reddit's 2017 system combined Kafka, Redis, and Cassandra and relied on HyperLogLog for space-efficient approximate counts. The author proposes a simpler design built on Tinybird: store raw events in one place, count uniques with a single SQL query, and serve the results in real time. A Tinybird project demonstrates the approach by exposing a REST API for counting unique post viewers; the author emphasizes ease of deployment and scalability and reports tests that handled large volumes of data with minimal latency. The author acknowledges trade-offs, including more raw data to store and the need for query optimization at extreme scale, but notes that the design avoids complex data pipelines and the challenges of operating a distributed system. The post closes by inviting readers to try the solution themselves and previews future posts on scaling to even larger data volumes.
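The core idea (keep raw view events in one table and count uniques with a single SQL query) can be sketched in a few lines. This is only an illustration, not the author's Tinybird project: it substitutes Python's built-in SQLite for Tinybird, and the table name `post_views`, its columns, the function `count_unique_viewers`, and the sample data are all hypothetical.

```python
import sqlite3

# Hypothetical sample events: (post_id, viewer_id, viewed_at).
SAMPLE_EVENTS = [
    ("post_1", "alice", "2024-01-01T10:00:00"),
    ("post_1", "bob", "2024-01-01T10:01:00"),
    ("post_1", "alice", "2024-01-01T10:05:00"),  # repeat view by the same viewer
    ("post_2", "carol", "2024-01-01T10:07:00"),
]


def count_unique_viewers(events, post_id):
    """Store raw events in one table, then count views and uniques in one query.

    SQLite stands in for the real event store here (an assumption made
    purely so the sketch is runnable with the standard library).
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Single location for the raw, unaggregated events.
        conn.execute(
            "CREATE TABLE post_views (post_id TEXT, viewer_id TEXT, viewed_at TEXT)"
        )
        conn.executemany("INSERT INTO post_views VALUES (?, ?, ?)", events)

        # One query returns both the total views and the exact unique viewers.
        row = conn.execute(
            """
            SELECT COUNT(*) AS views,
                   COUNT(DISTINCT viewer_id) AS unique_viewers
            FROM post_views
            WHERE post_id = ?
            """,
            (post_id,),
        ).fetchone()
    finally:
        conn.close()
    return {"views": row[0], "unique_viewers": row[1]}


if __name__ == "__main__":
    print(count_unique_viewers(SAMPLE_EVENTS, "post_1"))
```

Unlike a HyperLogLog sketch, `COUNT(DISTINCT ...)` over raw events gives an exact answer, at the cost of storing and scanning every event. That is the trade-off the summary mentions: more raw data stored, and query optimization needed at extreme scale.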