Optimizing Multi-Modal Analysis by Lazy Loading Dataframes

Blog post from Hex

Post Details
Company: Hex
Date Published:
Author: Dylan Scott
Word Count: 1,661
Language: English
Hacker News Points: -
Summary

Hex has cut project runtimes by roughly 5-10x by migrating from pandas DataFrames to a DuckDB-based architecture that queries Arrow data stored remotely in S3 directly, instead of materializing dataframes into local memory. Under the new architecture dataframes are loaded lazily, and some internal projects have gone from runtimes of 30+ seconds to just a handful of seconds. The gains are most pronounced in projects that primarily use SQL and no-code cells, while projects that include a lot of Python references see less dramatic improvements.
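
To make the lazy-loading idea concrete, here is a minimal, standard-library-only sketch. It is not Hex's implementation and does not use DuckDB or Arrow: `LazyFrame`, `fake_remote_scan`, and `load_calls` are hypothetical names invented for illustration, and the generator stands in for a scan over remote data. The sketch only shows the general pattern: filters and projections are recorded on a lightweight handle, and data is read only when a result is actually requested.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, object]

# Hypothetical counter so we can observe when the "remote" data is read.
load_calls: List[int] = []


def fake_remote_scan() -> Iterable[Row]:
    """Stand-in for scanning remote data (assumption: not a real S3/Arrow read)."""
    load_calls.append(1)  # runs only once iteration actually starts
    yield {"user": "a", "ms": 120}
    yield {"user": "b", "ms": 40}
    yield {"user": "c", "ms": 310}


class LazyFrame:
    """Hypothetical lazy handle: records operations, reads nothing until collect()."""

    def __init__(
        self,
        loader: Callable[[], Iterable[Row]],
        predicates: Tuple[Callable[[Row], bool], ...] = (),
        columns: Optional[List[str]] = None,
    ) -> None:
        self._loader = loader
        self._predicates = predicates
        self._columns = columns

    def where(self, predicate: Callable[[Row], bool]) -> "LazyFrame":
        # Return a new handle with one more filter; no data is touched.
        return LazyFrame(self._loader, self._predicates + (predicate,), self._columns)

    def select(self, *columns: str) -> "LazyFrame":
        # Return a new handle with a projection; no data is touched.
        return LazyFrame(self._loader, self._predicates, list(columns))

    def collect(self) -> List[Row]:
        # Materialization point: scan once, keeping only matching rows and columns.
        rows: List[Row] = []
        for row in self._loader():
            if all(p(row) for p in self._predicates):
                if self._columns is None:
                    rows.append(row)
                else:
                    rows.append({c: row[c] for c in self._columns})
        return rows


# Building the query does not read any data.
slow_users = LazyFrame(fake_remote_scan).where(lambda r: r["ms"] > 100).select("user")
assert len(load_calls) == 0

# Only an explicit request for results triggers the scan.
result = slow_users.collect()
assert len(load_calls) == 1
```

In this toy model, chained `where` and `select` calls play the role of SQL and no-code cells, which can stay lazy, while `collect()` plays the role of a Python reference that needs the rows in local memory. That is one plausible reading of why Python-heavy projects would see smaller gains, offered here as an illustration rather than a description of Hex's internals.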