Introducing support for UNLOAD in Starburst Galaxy
Blog post from Starburst
Starburst Galaxy has introduced the UNLOAD table function, which writes query results directly to files without first creating a table, removing a common source of inefficiency in data processing. Output formats include CSV and TEXTFILE, with optional compression, and the resulting files can be fed directly to downstream applications, which can simplify integration with machine learning models. UNLOAD is part of the system schema and requires specific access privileges to execute. It can write to storage such as AWS S3, Azure Data Lake Storage, and Google Cloud Storage. The function accepts parameters for the input query, file format, compression, and partitioning. Some limitations remain: CSV output supports only VARCHAR columns, and there are constraints on Avro files. The feature is currently experimental. It is positioned to improve storage efficiency and performance by mitigating the bottlenecks traditionally associated with JDBC in data processing pipelines.
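To make the parameter list above concrete, here is a minimal Python sketch that assembles an UNLOAD statement as a SQL string. It rests on several assumptions that the summary does not confirm: that the function is invoked as a table function at `<catalog>.system.unload`, that its named arguments are `input`, `location`, `format`, and `compression`, and that partitioning is expressed with `PARTITION BY` on the input. The helper names, the `hive` catalog, the `customers` table, and the bucket path are all hypothetical. Check the Starburst Galaxy documentation for the actual syntax before relying on this.

```python
from typing import Optional, Sequence


def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_unload_query(
    catalog: str,
    input_query: str,
    location: str,
    file_format: str,
    compression: Optional[str] = None,
    partition_by: Optional[Sequence[str]] = None,
) -> str:
    """Assemble an UNLOAD call as SQL text.

    The argument names and the catalog.system.unload path are assumptions
    based on the description above, not confirmed syntax.
    """
    # The input query is passed as a table argument (assumed form).
    input_arg = f"input => TABLE({input_query})"
    if partition_by:
        # Assumed: partition columns are attached to the input table argument.
        input_arg += " PARTITION BY (" + ", ".join(partition_by) + ")"

    args = [
        input_arg,
        f"location => {sql_literal(location)}",
        f"format => {sql_literal(file_format)}",
    ]
    if compression:
        args.append(f"compression => {sql_literal(compression)}")

    body = ",\n    ".join(args)
    return f"SELECT * FROM TABLE({catalog}.system.unload(\n    {body}\n))"


if __name__ == "__main__":
    # Hypothetical catalog, table, and bucket. The column is cast to VARCHAR
    # because CSV output is described as supporting only VARCHAR columns.
    print(
        build_unload_query(
            catalog="hive",
            input_query="SELECT CAST(name AS VARCHAR) AS name FROM customers",
            location="s3://example-bucket/unload/customers/",
            file_format="CSV",
            compression="GZIP",
        )
    )
```

The generated statement would then be submitted through whatever client already runs queries against the cluster. The point of the design is that the result files land in object storage rather than being streamed back to the client row by row.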