Streamlining large-scale data exports: The evolution of Labelbox’s Export System to a streamable architecture with fairness in place

Blog post from Labelbox

Post Details
Company
Date Published
Author
Labelbox
Word Count: 625
Language: -
Hacker News Points: -
Summary

Labelbox rebuilt its data export system around a streamable architecture to make large exports more efficient and to fix problems users were running into. The original export system suffered from a time-consuming chunk-combining phase and from stateful communication, which led to stale exports. The new system uses an iterator approach, so large datasets can be processed without being loaded entirely into memory. To ease the transition, the streamable backend was back-ported to the existing export methods, and the rollout was managed with feature flags in LaunchDarkly. Remaining performance issues were addressed with queue management built on BullMQ, a round-robin scheme that processes jobs fairly across organizations, and horizontal scaling to raise throughput. According to the post, these changes let users handle large-scale exports more effectively, and Labelbox plans to keep refining the system.
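
The summary names two techniques without showing them: iterating over an export instead of loading it whole, and round-robin fairness across organizations. The sketch below illustrates both ideas in plain Python. It is not Labelbox's implementation (the post says the production queue is built on BullMQ), and the names `stream_export` and `round_robin` are hypothetical, chosen only for this example.

```python
import json
from collections import deque
from typing import Dict, Iterable, Iterator, List


def stream_export(rows: Iterable[dict]) -> Iterator[str]:
    """Yield one JSON line per row, so the full export is never held in memory.

    Hypothetical helper: `rows` can be any lazy source, such as a database cursor.
    """
    for row in rows:
        yield json.dumps(row)


def round_robin(jobs_by_org: Dict[str, List[str]]) -> Iterator[str]:
    """Yield jobs by taking one from each organization in turn.

    Hypothetical scheduler: an organization with many queued jobs cannot
    starve organizations that have only a few.
    """
    # Keep only organizations that actually have work queued.
    queues = deque(
        (org, deque(jobs)) for org, jobs in jobs_by_org.items() if jobs
    )
    while queues:
        org, jobs = queues.popleft()
        yield jobs.popleft()
        if jobs:
            # Organization still has work: send it to the back of the line.
            queues.append((org, jobs))


if __name__ == "__main__":
    pending = {"org_a": ["a1", "a2", "a3"], "org_b": ["b1"], "org_c": ["c1", "c2"]}
    for job in round_robin(pending):
        print(job)
    for line in stream_export({"id": i} for i in range(3)):
        print(line)
```

In this sketch `org_a` has three jobs queued, yet `org_b` and `org_c` are each served after `org_a`'s first job instead of waiting behind all three. Both functions are generators, so a consumer pulls one item at a time and memory use stays flat regardless of export size.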