How to use dbt and Trino with Iceberg for a change data capture on a data lake

Blog post from Starburst

Post Details
Company: Starburst
Author: Padraig O’Sullivan
Word Count: 1,354
Language: English
Summary

The article shows how to use dbt and Trino with Iceberg to implement change data capture (CDC) on a data lake, using AWS Database Migration Service (DMS) output stored as CSV files on S3. The process starts by creating an external table to read the raw data, then builds a model named stg_dms__products that uses dbt's incremental materialization to process only new CDC records. The model uses common table expressions (CTEs) to handle insert, update, and delete operations, and the article discusses how to implement both soft deletes and hard deletes. Key techniques include having dbt generate a MERGE statement to apply the changes and choosing an incremental strategy so that each run touches only new data. The article also covers configuring Iceberg table properties and using post-hooks for maintenance operations such as expiring snapshots. It provides practical examples and configurations for each step, and the complete dbt project is available in a GitHub repository.
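
To make the merge logic described in the summary more concrete, here is a minimal sketch in plain Python rather than the article's dbt SQL. It simulates what an incremental MERGE does with a batch of CDC records: skip rows already processed, keep only the latest change per key, then upsert or delete. The record shape (`op`, `id`, `name`, `ts`), the `is_deleted` flag, and the function name `merge_cdc` are all assumptions made for illustration and are not taken from the article or from any library.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def merge_cdc(
    target: Dict[int, Row],
    records: Iterable[Row],
    soft_delete: bool = True,
) -> Dict[int, Row]:
    """Apply a batch of CDC records to a copy of `target` and return it.

    Hypothetical record shape: {"op": "I" | "U" | "D", "id": ..., "ts": ..., ...}.
    """
    # Work on a copy so the caller's table is left untouched.
    result = {key: dict(row) for key, row in target.items()}

    # Incremental filter: only look at records newer than what is already loaded.
    high_water = max((row["ts"] for row in target.values()), default=None)
    fresh = [r for r in records if high_water is None or r["ts"] > high_water]

    # Deduplicate: keep the most recent change per key (later rows win ties).
    latest: Dict[int, Row] = {}
    for rec in fresh:
        seen = latest.get(rec["id"])
        if seen is None or rec["ts"] >= seen["ts"]:
            latest[rec["id"]] = rec

    # Merge: hard deletes remove the row; everything else is an upsert,
    # with soft deletes recorded through the is_deleted flag.
    for key, rec in latest.items():
        is_delete = rec["op"] == "D"
        if is_delete and not soft_delete:
            result.pop(key, None)
            continue
        row = {col: val for col, val in rec.items() if col != "op"}
        row["is_deleted"] = is_delete
        result[key] = row
    return result
```

Calling `merge_cdc(target, batch, soft_delete=False)` removes deleted keys outright, while the default keeps them with `is_deleted` set to `True`. That is roughly the trade-off between the hard-delete and soft-delete strategies the summary mentions.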