How Mixpanel now runs 50% faster than before

Blog post from Mixpanel

Post Details
Word Count: 976
Language: English
Summary

Mixpanel has had to scale its infrastructure to keep up with the growing data analysis needs of its customers: more than 26,000 businesses generating 9 trillion data points per year. To deliver insights quickly and cost-effectively, the company developed a distributed, column-oriented database called Arb. The scalability of this infrastructure has been a competitive advantage, and maintaining that edge is crucial.

What sets Mixpanel's workload apart is the flexibility of both queries and data: customers can run complex real-time queries against mixed-type data schemas. Mixpanel identified performance bottlenecks in the query engine, particularly in filter performance, and responded with optimizations such as vectorization and predicate pushdown, inspired by other high-scale analytics databases. By modifying the query engine to process batches of events, Mixpanel achieved a 2x improvement in query throughput for large queries.

Variable data types made vectorization harder. Mixpanel addressed this by inferring the required data types from the query itself, converting predicates over mixed-type properties into uniformly typed predicates. These enhancements have resulted in lower query latencies and better pricing models for customers, reflecting Mixpanel's commitment to reliable and efficient data analysis.
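
The idea of inferring a type from the query and then filtering a whole batch of events can be illustrated with a small Python sketch. This is not Arb's implementation: the function names (`infer_type`, `typed_column`, `filter_batch_gt`), the dictionary event layout, and the rule that non-matching types simply fail the filter are all assumptions made for illustration. Pure Python will not show the speedup itself; the point is the shape of the work, where type checks happen once per batch column instead of inside the comparison.

```python
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]


def filter_row_at_a_time(events: List[Event], prop: str, threshold: float) -> List[Event]:
    """Baseline: type-check and compare inside one loop, event by event."""
    out = []
    for event in events:
        value = event.get(prop)
        if isinstance(value, bool):
            continue  # bool is a subclass of int in Python; treat it as non-numeric
        if isinstance(value, (int, float)) and value > threshold:
            out.append(event)
    return out


def infer_type(literal: Any) -> type:
    """Infer the column type a predicate needs from the literal in the query."""
    if isinstance(literal, bool):
        raise TypeError("boolean literals are not handled in this sketch")
    if isinstance(literal, (int, float)):
        return float
    if isinstance(literal, str):
        return str
    raise TypeError(f"unsupported literal type: {type(literal).__name__}")


def typed_column(batch: List[Event], prop: str, target: type) -> List[Optional[Any]]:
    """Turn a mixed-type property into a uniformly typed column.

    Values that do not match the inferred type (assumption: they cannot
    satisfy the predicate) become None.
    """
    column: List[Optional[Any]] = []
    for event in batch:
        value = event.get(prop)
        if target is float and isinstance(value, (int, float)) and not isinstance(value, bool):
            column.append(float(value))
        elif target is str and isinstance(value, str):
            column.append(value)
        else:
            column.append(None)
    return column


def filter_batch_gt(batch: List[Event], prop: str, literal: Any) -> List[Event]:
    """Evaluate `prop > literal` over a whole batch using a typed column."""
    target = infer_type(literal)
    column = typed_column(batch, prop, target)
    threshold = target(literal)
    # The comparison loop now sees one type only, so it contains no type dispatch.
    mask = [value is not None and value > threshold for value in column]
    return [event for event, keep in zip(batch, mask) if keep]


events = [{"price": 5}, {"price": 12.5}, {"price": "12"}, {"price": None}, {}, {"price": 40}]
matches = filter_batch_gt(events, "price", 10)
```

In a compiled engine, the comparison over the typed column is the part that would become a tight loop the compiler or CPU can optimize; the separation between `typed_column` and the mask computation is what this sketch is meant to show.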