Company:
Date Published:
Author: Mason Hooten
Word count: 191
Language: English
Hacker News points: None

Summary

At Strata+Hadoop World, James Burkhart, technical lead on real-time data infrastructure at Uber, explained how Uber supports millions of analytical queries daily across real-time data using Apollo, its internal analytics querying language. He discussed the architectural decisions and lessons learned from building an exactly-once ingest pipeline that stores raw events in both in-memory row storage and on-disk columnar storage. He also covered a custom metalanguage and query layer that uses partial OLAP result set caching and query canonicalization to deliver subsecond p95 latency for analytical queries spanning hundreds of millions of recent events. James has expertise in time series data storage, processing, and retrieval, and previously worked on Blueflood, a time series database built on top of Cassandra. Key insights from his talk are available online through the Uber Engineering Blog, Uber Open Source, and social media channels.
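
The summary names query canonicalization and partial OLAP result set caching without showing how the two fit together. The sketch below is one minimal way to illustrate the idea and is not Uber's implementation: the query shape (a dict with metric, filters, and group_by), the names canonicalize and PartialResultCache, and the fixed-width time buckets are all assumptions made for this example. It rewrites equivalent queries into one canonical key, then caches an additive aggregate per time bucket so that overlapping time ranges reuse earlier work.

```python
import json
from typing import Callable, Dict, Tuple


def canonicalize(query: dict) -> str:
    """Return one canonical string for logically equivalent queries.

    Hypothetical query shape: {"metric": str, "filters": dict, "group_by": list}.
    Filter order and group-by order do not affect the result, so both are sorted.
    """
    normalized = {
        "metric": query["metric"],
        "filters": sorted((k, str(v)) for k, v in query.get("filters", {}).items()),
        "group_by": sorted(query.get("group_by", [])),
    }
    return json.dumps(normalized, sort_keys=True)


class PartialResultCache:
    """Cache an additive aggregate (such as a count) per time bucket.

    Assumes start and end are aligned to bucket boundaries. A real-time system
    would also avoid caching a bucket that is still receiving events; that
    detail is left out of this sketch.
    """

    def __init__(self, compute_bucket: Callable[[dict, int, int], int],
                 bucket_seconds: int = 60) -> None:
        self._compute_bucket = compute_bucket
        self._bucket_seconds = bucket_seconds
        self._cache: Dict[Tuple[str, int], int] = {}
        self.misses = 0  # number of buckets that had to be computed

    def query(self, query: dict, start: int, end: int) -> int:
        key = canonicalize(query)
        total = 0
        for bucket_start in range(start, end, self._bucket_seconds):
            cache_key = (key, bucket_start)
            if cache_key not in self._cache:
                # Only buckets not seen before hit the backing store.
                self.misses += 1
                self._cache[cache_key] = self._compute_bucket(
                    query, bucket_start, bucket_start + self._bucket_seconds
                )
            # Counts are additive, so per-bucket partial results can be summed.
            total += self._cache[cache_key]
        return total
```

With this structure, a second query that differs only in key order and extends the time range by one bucket computes just the new bucket and serves the rest from the cache.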