How curvy is your data: an investigation into Hilbert curve sorting

Blog post from Pydantic

Post Details
Company: Pydantic
Date Published: -
Author: -
Word Count: 2,032
Language: English
Hacker News Points: -
Summary

Fusionfire, the internal database behind Logfire, experimented with replacing its traditional lexicographic sort with Hilbert curve sorting. The goal was to speed up queries by preserving locality across multiple columns and improving row group pruning. A Hilbert curve maps multi-dimensional data onto a single sort key so that rows close together in several dimensions stay close together in the sorted order. Despite this theoretical advantage, the experiment showed a regression in both query performance and row group pruning on Fusionfire's data, which exhibits extreme cardinality skew. The lexicographic sort, which orders columns by increasing cardinality, proved more effective for this distribution: it concentrates the benefit on columns with fewer unique values, giving them tighter min/max ranges and better compression. Hilbert sorting has helped in systems with comparable column cardinalities and diverse query patterns, such as Databricks and Apache Hudi, but Fusionfire's data, with its natural partition key and skewed cardinality, favored the existing lexicographic approach. The experiment underscored that a sorting strategy has to match the characteristics of the data and workload it serves.
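
To make the trade-off concrete, here is a minimal Python sketch that compares the two orderings on a toy two-column table. It is not Fusionfire's implementation: the `hilbert_index` function is the standard textbook 2D mapping, and the 16x16 grid, the row group size of 16, and the helper names `row_group_ranges` and `groups_scanned` are all assumptions chosen for illustration. The toy data has equal cardinality in both columns, so it shows the case where Hilbert sorting balances pruning; it does not reproduce the skewed distribution described in the summary.

```python
from typing import List, Tuple

Row = Tuple[int, int]
Range = Tuple[int, int]


def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def row_group_ranges(rows: List[Row], group_size: int) -> List[Tuple[Range, Range]]:
    """Chunk already-sorted rows into row groups; keep (min, max) per column."""
    stats = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        a_vals = [r[0] for r in group]
        b_vals = [r[1] for r in group]
        stats.append(((min(a_vals), max(a_vals)), (min(b_vals), max(b_vals))))
    return stats


def groups_scanned(stats: List[Tuple[Range, Range]], column: int, value: int) -> int:
    """Count row groups that min/max pruning cannot skip for `column == value`."""
    return sum(1 for s in stats if s[column][0] <= value <= s[column][1])


ORDER = 4          # 16 x 16 grid (hypothetical toy data)
GROUP_SIZE = 16    # rows per row group (hypothetical)

rows = [(a, b) for a in range(1 << ORDER) for b in range(1 << ORDER)]

lex_stats = row_group_ranges(sorted(rows), GROUP_SIZE)
hilbert_stats = row_group_ranges(
    sorted(rows, key=lambda r: hilbert_index(ORDER, r[0], r[1])), GROUP_SIZE
)

for name, stats in (("lexicographic", lex_stats), ("hilbert", hilbert_stats)):
    print(
        name,
        "| groups scanned for a == 5:", groups_scanned(stats, 0, 5),
        "| for b == 5:", groups_scanned(stats, 1, 5),
    )
```

On this grid the lexicographic order prunes very well on the leading column and not at all on the second, while the Hilbert order lands in between for both. When one column has far fewer distinct values and carries most of the filters, the lexicographic result on the leading column is the one that matters, which is consistent with the outcome the summary reports.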