Finding the UFC GOAT: A Full Stack Pipeline with dlt, dbt and Metabase
Blog post from dltHub
Reshef Sharvit, a Principal Engineer at Skyhawk Security, takes on the question of who is the greatest UFC fighter of all time using a full-stack data pipeline built from dlt, dbt, and Metabase. He argues that surface-level statistics such as win-loss records are not enough to settle the question, and bases his analysis on a set of 15 metrics instead. The data is scraped from ufcstats.com and Wikipedia and loaded into a PostgreSQL database with dlt, which handles schema inference and incremental loading. dbt then transforms the raw data into analysis-ready views through SQL-first, version-controlled transformations, with built-in tests guarding data integrity. Finally, Metabase makes the results accessible to non-technical users through intuitive visualizations. Sharvit finds the combination effective, and likens PostgreSQL's enduring utility to Jon Jones' lasting prominence in the UFC despite the controversies and challenges both have faced.
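
To make the idea of incremental loading concrete, here is a minimal sketch of the concept only. It is not dlt's API and not the code from the post: it uses Python's built-in sqlite3 in place of PostgreSQL so that it runs anywhere, and the `fights` table, its columns, the `load_incremental` helper, and the sample rows are all hypothetical placeholders. Each run writes only rows at or after the newest date already stored, and the primary key keeps re-sent rows from being duplicated.

```python
import sqlite3

# Hypothetical table layout; the schema used in the original post is not shown here.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS fights (
    fight_id   TEXT PRIMARY KEY,
    fighter    TEXT,
    result     TEXT,
    event_date TEXT
)
"""


def load_incremental(conn, rows):
    """Upsert rows at or after the newest event_date already loaded.

    Returns the number of rows written in this run.
    """
    conn.execute(CREATE_SQL)
    last_seen = conn.execute("SELECT MAX(event_date) FROM fights").fetchone()[0]
    # ISO 8601 date strings compare correctly as plain text.
    # Keep the boundary date (>=) so same-day fights are not skipped;
    # the primary key makes rewriting them harmless.
    fresh = [r for r in rows if last_seen is None or r["event_date"] >= last_seen]
    conn.executemany(
        "INSERT OR REPLACE INTO fights (fight_id, fighter, result, event_date) "
        "VALUES (:fight_id, :fighter, :result, :event_date)",
        fresh,
    )
    conn.commit()
    return len(fresh)


# Made-up placeholder rows, not real fight data.
batch_1 = [
    {"fight_id": "f1", "fighter": "Fighter A", "result": "win", "event_date": "2020-01-01"},
    {"fight_id": "f2", "fighter": "Fighter B", "result": "loss", "event_date": "2021-01-01"},
]
# A later scrape returns the old rows again plus one new fight.
batch_2 = batch_1 + [
    {"fight_id": "f3", "fighter": "Fighter A", "result": "win", "event_date": "2022-01-01"},
]

conn = sqlite3.connect(":memory:")
load_incremental(conn, batch_1)  # first run: nothing stored yet, so both rows are written
load_incremental(conn, batch_2)  # second run: f1 is skipped, f2 is rewritten, f3 is added
total = conn.execute("SELECT COUNT(*) FROM fights").fetchone()[0]
```

A tool such as dlt tracks this kind of state for you; the sketch only shows why a cursor field plus a primary key is enough to make repeated loads safe.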