Content Deep Dive
Optimizing Multi-Modal Analysis by Lazy Loading Dataframes
Blog post from Hex
Post Details
Company
Date Published
Author
Dylan Scott
Word Count
1,661
Language
English
Hacker News Points
-
Source URL
Summary
Hex reports execution speedups of up to 10x after migrating from pandas DataFrames to a DuckDB-based architecture that queries Arrow data stored remotely in S3 directly, rather than materializing dataframes in local memory. Dataframes are loaded lazily under the new architecture, and project runtimes have typically improved by roughly 5-10x, with some internal projects dropping from 30+ seconds to a handful of seconds. The gains are most pronounced in projects built primarily from SQL and no-code cells; projects with many Python references see less dramatic improvements.
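The lazy-loading idea in the summary can be illustrated with a minimal sketch. This is not Hex's implementation: the standard-library `sqlite3` module stands in for DuckDB, an in-memory table stands in for Arrow data in S3, and the `LazyFrame` class with its `query` and `collect` methods is a hypothetical name invented for illustration. The point it shows is that chained SQL steps only compose query text, and rows are pulled into Python memory only when a caller explicitly asks for them.

```python
import sqlite3


class LazyFrame:
    """Hypothetical handle to a query result that stays in the engine.

    Assumption: sqlite3 is a stand-in for DuckDB, used here only so the
    sketch runs with the standard library.
    """

    def __init__(self, conn, sql):
        self.conn = conn
        self.sql = sql
        self._rows = None  # filled in only when collect() is called

    @property
    def is_materialized(self):
        return self._rows is not None

    def query(self, template):
        """Chain another SQL step; `{src}` becomes this frame as a subquery.

        No query is executed here, only the SQL text is composed.
        """
        return LazyFrame(self.conn, template.format(src=f"({self.sql})"))

    def collect(self):
        """Materialize rows into Python memory (the expensive step)."""
        if self._rows is None:
            self._rows = self.conn.execute(self.sql).fetchall()
        return self._rows


# Stand-in for remote data: a small in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5), (3, 1.0)],
)

# Two chained "SQL cells": neither one pulls rows into Python.
events = LazyFrame(conn, "SELECT * FROM events")
totals = events.query(
    "SELECT user_id, SUM(amount) AS total FROM {src} GROUP BY user_id"
)
big = totals.query(
    "SELECT user_id, total FROM {src} WHERE total > 5 ORDER BY user_id"
)

# A "Python reference" is the only point where rows are fetched.
rows = big.collect()
```

In this sketch the intermediate frames `events` and `totals` are never materialized, which mirrors why projects made mostly of SQL and no-code cells benefit most: the work stays in the query engine until Python actually needs the data.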