How to Make a Pivot Table From a Billion Rows of Data
Blog post from Sigma
Sigma lets users export data from workbooks to formats such as CSV, Excel, and Google Sheets, and to automate those exports when specified conditions are met. The backend export service is written in Rust. It handles large data volumes using the Apache Arrow memory format and transforms query results from cloud data warehouses into the requested output format. An essential part of this work is exporting pivot tables, which aggregate data across dimensions and measures. The layout of a pivot table depends on how it is configured, and most of the computation runs as SQL queries in the cloud data warehouse. Once the grouped and aggregated data comes back, the service arranges it into the pivot table layout using a pivot index, a structure that organizes the data for export. Pivot serializers then render the table in the requested output format. They follow the Visitor design pattern, with format-specific implementations for CSV, Excel, and Google Sheets. The whole process is optimized for performance: the service uses the asynchronous Tokio runtime for Rust to handle concurrent requests, and it runs export work on separate thread pools so that it does not block other tasks.
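To make the pivot index and the Visitor-based serializers more concrete, here is a minimal sketch in Rust. It is not Sigma's implementation: the names `AggRow`, `PivotIndex`, `PivotVisitor`, `CsvSerializer`, and `render_example` are hypothetical, the sample data is invented, and the sketch assumes a single row dimension, a single column dimension, and one measure. It uses only the standard library and skips Arrow, CSV escaping, and async execution.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One grouped-and-aggregated result row, as it might come back
/// from the warehouse query (hypothetical shape).
struct AggRow {
    row_key: String, // value of the row dimension
    col_key: String, // value of the column dimension
    value: f64,      // aggregated measure
}

/// Hypothetical pivot index: sorted row keys, each mapping to its
/// cells by column key, plus the sorted set of all column keys.
struct PivotIndex {
    col_keys: BTreeSet<String>,
    cells: BTreeMap<String, BTreeMap<String, f64>>,
}

impl PivotIndex {
    fn build(rows: &[AggRow]) -> Self {
        let mut col_keys = BTreeSet::new();
        let mut cells: BTreeMap<String, BTreeMap<String, f64>> = BTreeMap::new();
        for r in rows {
            col_keys.insert(r.col_key.clone());
            cells
                .entry(r.row_key.clone())
                .or_default()
                .insert(r.col_key.clone(), r.value);
        }
        PivotIndex { col_keys, cells }
    }

    /// Walk the pivot layout once and hand each piece to the visitor.
    /// Missing (row, column) combinations are passed as `None`.
    fn accept<V: PivotVisitor>(&self, visitor: &mut V) {
        let header: Vec<&str> = self.col_keys.iter().map(String::as_str).collect();
        visitor.visit_header(&header);
        for (row_key, row_cells) in &self.cells {
            let values: Vec<Option<f64>> = self
                .col_keys
                .iter()
                .map(|c| row_cells.get(c).copied())
                .collect();
            visitor.visit_row(row_key, &values);
        }
    }
}

/// Visitor interface: each output format implements these callbacks.
trait PivotVisitor {
    fn visit_header(&mut self, col_keys: &[&str]);
    fn visit_row(&mut self, row_key: &str, values: &[Option<f64>]);
}

/// CSV implementation of the visitor (no quoting or escaping in this sketch).
struct CsvSerializer {
    out: String,
}

impl PivotVisitor for CsvSerializer {
    fn visit_header(&mut self, col_keys: &[&str]) {
        // Leading comma leaves the top-left corner cell empty.
        self.out.push(',');
        self.out.push_str(&col_keys.join(","));
        self.out.push('\n');
    }

    fn visit_row(&mut self, row_key: &str, values: &[Option<f64>]) {
        let cells: Vec<String> = values
            .iter()
            .map(|v| v.map(|x| x.to_string()).unwrap_or_default())
            .collect();
        self.out.push_str(row_key);
        self.out.push(',');
        self.out.push_str(&cells.join(","));
        self.out.push('\n');
    }
}

/// Builds a tiny index from invented data and renders it as CSV.
fn render_example() -> String {
    let rows = vec![
        AggRow { row_key: "East".into(), col_key: "Q1".into(), value: 10.0 },
        AggRow { row_key: "East".into(), col_key: "Q2".into(), value: 12.5 },
        AggRow { row_key: "West".into(), col_key: "Q1".into(), value: 7.0 },
    ];
    let index = PivotIndex::build(&rows);
    let mut csv = CsvSerializer { out: String::new() };
    index.accept(&mut csv);
    csv.out
}
```

Calling `render_example()` yields a header line `,Q1,Q2` followed by `East,10,12.5` and `West,7,`, with the missing West/Q2 cell left empty. The point of the design is that the traversal in `accept` is written once, so supporting another format such as Excel or Google Sheets would mean adding another `PivotVisitor` implementation rather than changing the index.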