Optimizing Multi-Modal Analysis by Lazy Loading Dataframes

Blog post from Hex

Post Details
Company: Hex
Date Published:
Author: Dylan Scott
Word Count: 1,661
Company Posts That Month: 4
Language: English
Hacker News Points: -
Summary

Hex has sped up execution by as much as 10x by migrating from pandas DataFrames to a DuckDB-based architecture that directly queries Arrow data stored remotely in S3, instead of materializing DataFrames into local memory. Because DataFrames are now loaded lazily, project runtimes have improved by roughly 5-10x, and some internal projects have dropped from more than 30 seconds to just a few seconds. The gains are largest in projects built mostly from SQL and no-code cells; projects with many Python references see smaller improvements.
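The post summary does not include Hex's code, so the following is only a toy, stdlib-only Python sketch of the general lazy-loading idea, not Hex's or DuckDB's implementation. `LazyFrame` and `remote_orders` are hypothetical names; the generator stands in for Arrow data in S3. Building a query only records a plan. Aggregates like `count()` stream rows through that plan without holding the dataset in memory, while `collect()` materializes every matching row. One plausible reason Python-heavy projects gain less is that arbitrary Python code tends to need the materialized form, but the summary does not give the reason, so treat this as an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator

Row = dict[str, Any]


def _apply(rows: Iterator[Row], kind: str, arg: Any) -> Iterator[Row]:
    # A separate function so each step binds its own `arg` (avoids late binding in a loop).
    if kind == "filter":
        return (r for r in rows if arg(r))
    return ({c: r[c] for c in arg} for r in rows)


@dataclass(frozen=True)
class LazyFrame:
    """Hypothetical lazy dataframe: a data source plus a plan of deferred steps."""
    source: Callable[[], Iterator[Row]]
    steps: tuple = ()

    def filter(self, predicate: Callable[[Row], bool]) -> "LazyFrame":
        # Returns a new frame with one more planned step; no data is read.
        return LazyFrame(self.source, self.steps + (("filter", predicate),))

    def select(self, *columns: str) -> "LazyFrame":
        return LazyFrame(self.source, self.steps + (("select", columns),))

    def _execute(self) -> Iterator[Row]:
        # The source is only opened here, when a result is actually requested.
        rows = self.source()
        for kind, arg in self.steps:
            rows = _apply(rows, kind, arg)
        return rows

    def count(self) -> int:
        # Aggregation streams rows one at a time; nothing is kept in memory.
        return sum(1 for _ in self._execute())

    def collect(self) -> list[Row]:
        # Materialization: builds every matching row in local memory.
        return list(self._execute())


reads = {"n": 0}


def remote_orders() -> Iterator[Row]:
    # Hypothetical stand-in for a remote Arrow file in object storage.
    for i in range(1_000):
        reads["n"] += 1
        yield {"id": i, "region": "EU" if i % 4 == 0 else "US", "amount": i * 2}


orders = LazyFrame(remote_orders)
eu = orders.filter(lambda r: r["region"] == "EU").select("id", "amount")
print(reads["n"])   # 0: building the plan read nothing
print(eu.count())   # 250
rows = eu.collect()
print(rows[:2])     # [{'id': 0, 'amount': 0}, {'id': 4, 'amount': 8}]
```

Each derived frame is immutable, so `orders` can be reused as the base for other queries. A real engine such as DuckDB goes much further (columnar execution, filter and projection pushdown into the file scan), which this sketch does not attempt.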

Trends Found in this Post
Trend: Real-time
Post Mentions: 2
Total Month Mentions: 3,932
Posts: 887
Companies: 192
Month-over-Month Change: +47%