Designing a Parquet Catalog for InfluxDB IOx

Post Details

Company	InfluxData
Date Published	May 21, 2021
Author	Paul Dix
Word Count	1,565
Company Posts That Month	9
Language	English
Hacker News Points	-
Post removed?	No
Source URL	www.influxdata.com/blog/designing-a-parquet-catalog-for-influxdb-iox

Summary

The article describes the design of the Parquet catalog for InfluxDB IOx, a new in-memory columnar database that uses object storage for persistence. It explains why the InfluxData team found existing catalog standards such as Apache Hive, Delta Lake, and Apache Iceberg unsuitable for its needs and chose to implement its own design. The catalog tracks what exists in object storage and efficiently maintains the schema and statistics for the Parquet files that InfluxDB IOx writes there. It also supports soft deletes, so deleted data remains recoverable for a period of time. The design borrows many concepts from those three projects and stores schema and statistics using the Parquet metadata format, which is encoded with Apache Thrift.

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Observability	1	479	132	48	-10%
