From 48 Seconds to 130 Milliseconds: Vector Search in Tinybird
Blog post from Tinybird
A customer in the audience intelligence space manages a large and rapidly growing dataset of social media posts. They need efficient semantic similarity search to find the posts most relevant to a user's query. They initially relied on AWS S3 Vectors, but limits on the number of search results made that solution insufficient, so they began exploring alternatives.

The Tinybird team showed that these limitations could be overcome by optimizing the ClickHouse-based data infrastructure. They made two changes: they consolidated fragmented data partitions into a single HNSW graph, and they increased the index cache size so the entire dataset fits in RAM. Together, these changes cut query response time from 48 seconds to 130 milliseconds.

With this configuration, a query can return up to 1,000 results with consistent latency, and the customer no longer needs a separate, dedicated vector-search service. The result is a streamlined way to run semantic similarity queries in Tinybird. One challenge remains: maintaining this performance during continuous data ingestion, which will be addressed in follow-up work.
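To make the search step concrete, here is a minimal Python sketch of exact (brute-force) top-k retrieval by cosine distance. It is a stand-in, not Tinybird's or ClickHouse's implementation. An HNSW index approximates this same top-k result without scoring every vector. The function names `top_k` and `top_k_over_parts`, the dict-of-embeddings layout, and the toy vectors are all hypothetical. `top_k_over_parts` assumes that the fragmented case means one index per partition, with each partition searched separately and the results merged afterwards. The merged answer matches a single search over all the data, but the number of searches grows with the number of partitions. That is one plausible reading of why consolidating into a single graph helped.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity (the same convention as ClickHouse's cosineDistance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: Vector, posts: Dict[str, Vector], k: int) -> List[Tuple[float, str]]:
    """Exact top-k: score every post and keep the k closest (smallest distance).

    Returns (distance, post_id) pairs sorted by ascending distance.
    """
    scored = ((cosine_distance(query, emb), post_id) for post_id, emb in posts.items())
    return heapq.nsmallest(k, scored)


def top_k_over_parts(
    query: Vector, parts: List[Dict[str, Vector]], k: int
) -> List[Tuple[float, str]]:
    """Hypothetical fragmented case: one search per part, then merge the partial lists.

    The merged result equals a single search over all posts, but the number of
    searches grows with the number of parts.
    """
    candidates: List[Tuple[float, str]] = []
    for part in parts:
        candidates.extend(top_k(query, part, k))
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real embeddings have hundreds of dimensions.
    posts = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
    query = [1.0, 0.1]
    print([pid for _, pid in top_k(query, posts, 2)])  # ['a', 'c']

    # Splitting the same posts across two parts yields the same top-k.
    parts = [{"a": posts["a"]}, {"b": posts["b"], "c": posts["c"]}]
    print(top_k_over_parts(query, parts, 2) == top_k(query, posts, 2))  # True
```

`heapq.nsmallest` keeps a bounded heap of k candidates instead of sorting every score. This matters when k is as large as 1,000 and the corpus holds many posts.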