Author
Paul Dix
Word count
1565
Language
English

Summary

The article discusses the design of a Parquet catalog for InfluxDB IOx, a new in-memory columnar database that uses object storage for persistence. It explains why existing catalog standards such as Apache Hive, Delta Lake, and Apache Iceberg did not suit the team's needs, and how the team arrived at its own design. The catalog's job is to track what exists in object storage and to efficiently record schema and statistics information for the Parquet files that InfluxDB IOx writes there. It also supports soft deletes: users can delete data, but the data remains available for some period in case it is needed again. The design borrows many concepts from those three projects and uses Parquet's metadata format, which is defined in Apache Thrift, to store metadata and statistics.
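The summary describes three catalog responsibilities: tracking which Parquet files exist in object storage, recording their schema and statistics, and supporting soft deletes with a grace period. The Python below is a minimal sketch of those ideas under stated assumptions, not the IOx implementation. The names `Catalog`, `FileEntry`, `ColumnStats`, `soft_delete`, `files_for_range`, and `purge_expired`, the min/max-only statistics, and the fixed retention window are all hypothetical. The sketch also keeps everything in memory rather than persisting it as Thrift-encoded Parquet metadata.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColumnStats:
    # Hypothetical per-column statistics: just min/max, like a Parquet footer records.
    min_value: float
    max_value: float


@dataclass
class FileEntry:
    path: str                           # object-store key of the Parquet file
    row_count: int
    schema: Dict[str, str]              # column name -> type name
    stats: Dict[str, ColumnStats]       # column name -> min/max statistics
    deleted_at: Optional[float] = None  # set by a soft delete; None while live


class Catalog:
    """Hypothetical in-memory catalog of Parquet files in object storage."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self.files: Dict[str, FileEntry] = {}

    def add_file(self, entry: FileEntry) -> None:
        self.files[entry.path] = entry

    def soft_delete(self, path: str, now: float) -> None:
        # Mark the file deleted but keep its entry, so it can still be restored.
        self.files[path].deleted_at = now

    def restore(self, path: str) -> None:
        # Undo a soft delete, as long as the entry has not been purged yet.
        self.files[path].deleted_at = None

    def live_files(self) -> List[FileEntry]:
        return [f for f in self.files.values() if f.deleted_at is None]

    def files_for_range(self, column: str, lo: float, hi: float) -> List[FileEntry]:
        # Use min/max statistics to skip live files that cannot hold values in [lo, hi].
        # Files without statistics for the column are kept, since they cannot be pruned.
        result = []
        for f in self.live_files():
            s = f.stats.get(column)
            if s is None or (s.max_value >= lo and s.min_value <= hi):
                result.append(f)
        return result

    def purge_expired(self, now: float) -> List[str]:
        # Permanently drop soft-deleted entries whose retention window has elapsed.
        # A real system would also delete the underlying objects at this point.
        expired = [
            p for p, f in self.files.items()
            if f.deleted_at is not None and now - f.deleted_at >= self.retention_seconds
        ]
        for p in expired:
            del self.files[p]
        return expired


if __name__ == "__main__":
    cat = Catalog(retention_seconds=3600)
    schema = {"time": "i64", "usage": "f64"}
    cat.add_file(FileEntry("db/cpu/0001.parquet", 1000, schema, {"time": ColumnStats(0, 99)}))
    cat.add_file(FileEntry("db/cpu/0002.parquet", 1000, schema, {"time": ColumnStats(100, 199)}))

    print([f.path for f in cat.files_for_range("time", 150, 160)])  # ['db/cpu/0002.parquet']
    cat.soft_delete("db/cpu/0002.parquet", now=0)
    print([f.path for f in cat.files_for_range("time", 150, 160)])  # []
    print(cat.purge_expired(now=1800))  # [] (still inside the retention window)
    print(cat.purge_expired(now=3600))  # ['db/cpu/0002.parquet']
```

Keeping the soft-deleted entry in the catalog, rather than removing it immediately, is what lets `restore` undo a delete until `purge_expired` runs; queries simply skip entries that carry a deletion timestamp.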