
Slowly Changing Dimensions in Data Science

Blog post from Fivetran

Post Details
Company: Fivetran
Date Published:
Author: Michael Kaminsky
Word Count: 1,568
Language: English
Hacker News Points: -
Summary

In this blog post, Michael Kaminsky describes a common pitfall in data science work on cloud data warehouses: training machine learning models on slowly changing dimensions whose changes are not tracked, which can leave a model with poor predictive accuracy in the real world. He illustrates the problem with a subscription business that wants to predict customer churn. The post identifies two main pitfalls: an incorrect setup and untracked slowly changing dimensions. To address them, Kaminsky suggests storing data in a log format that records every change made to values in the database. He also mentions Fivetran's "history mode" feature as a tool that automatically captures changes to slowly changing dimensions, which helps models generalize from training to production.
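
To make the log-format idea concrete, here is a minimal sketch in Python. It is not taken from the post and does not use Fivetran's history mode. The change log, the customer IDs, the plan names, and the helper functions `build_history`, `plan_as_of`, and `current_plan` are all hypothetical, chosen only to show how a value looked up "as of" a past date can differ from the value stored today.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date

# Hypothetical change log: (customer_id, changed_on, plan), one row per change.
CHANGE_LOG = [
    (1, date(2023, 1, 1), "basic"),
    (1, date(2023, 6, 1), "premium"),
    (2, date(2023, 2, 1), "premium"),
]


def build_history(change_log):
    """Group changes by customer, each list sorted by change date."""
    history = defaultdict(list)
    for customer_id, changed_on, plan in sorted(change_log, key=lambda r: (r[0], r[1])):
        history[customer_id].append((changed_on, plan))
    return history


def plan_as_of(history, customer_id, as_of):
    """Return the plan in effect on `as_of`, or None if nothing was recorded yet."""
    changes = history.get(customer_id, [])
    dates = [changed_on for changed_on, _ in changes]
    idx = bisect_right(dates, as_of)  # count of changes dated on or before as_of
    return changes[idx - 1][1] if idx else None


def current_plan(history, customer_id):
    """Return the latest recorded plan, which is all an overwritten table would keep."""
    changes = history.get(customer_id, [])
    return changes[-1][1] if changes else None


history = build_history(CHANGE_LOG)

# Feature for a training example dated 2023-03-01: customer 1 was still on "basic".
print(plan_as_of(history, 1, date(2023, 3, 1)))
# An overwritten table would only show today's value for the same customer.
print(current_plan(history, 1))
```

In this sketch, a training row built from `current_plan` would describe customer 1 with a plan they did not yet have on the training date, while `plan_as_of` recovers the value that was actually in effect then. That mismatch between training-time and prediction-time values is the kind of gap the summary describes.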