How Do I Generate Redshift DDL for Existing Tables? A Quick Guide for Data Engineers
Blog post from Snowplow
Managing event data at scale using Snowplow pipelines in Amazon Redshift often requires migrating or replicating data between clusters, necessitating the regeneration of CREATE TABLE DDL statements for tables not present in the target cluster. Amazon Redshift, although lacking a built-in SHOW CREATE TABLE feature like MySQL, provides a solution through the v_generate_tbl_ddl view available in the amazon-redshift-utils repository. This utility allows users to generate full DDL statements directly from Redshift by setting up the relevant admin view, offering comprehensive details such as columns, data types, and distribution keys. Particularly valuable for Snowplow users, this method aids in deploying across multiple clusters, migrating between environments, and maintaining DDLs for compliance and reproducibility. While it surpasses the capabilities of pg_table_def by providing a complete DDL, users must have sufficient privileges and handle permissions separately, as the utility does not output GRANT statements. This approach significantly enhances the portability and auditability of schema-rich data across Redshift environments.