Simulating Bigtable in BigQuery with Type 2 SCD modeling

Blog post from Statsig

Post Details
Company: Statsig
Date Published: -
Author: Pablo Beltran
Word Count: 1,704
Language: English
Hacker News Points: -
Summary

Statsig needed to handle high-throughput, schema-less data updates while keeping that data queryable at scale. Their solution combines Google Cloud's Bigtable and BigQuery: Bigtable updates are replicated into a Type 2 Slowly Changing Dimension (SCD) model in BigQuery, which keeps reads and writes schema-less and low-latency while also supporting large analytical queries. A User Store Service ingests data into Bigtable, Bigtable change streams capture each update, and a Dataflow pipeline streams those changes to BigQuery, where a scheduled MERGE statement materializes them into a queryable SCD Type 2 table. By pairing Bigtable's speed and schema flexibility with BigQuery's analytical capabilities, Statsig gets a unified view of current and historical data that supports real-time analytics, controls costs through fine-grained DML, and lets customers efficiently observe how user behavior changes over time.
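
To make the SCD Type 2 idea concrete, here is a minimal in-memory sketch in Python of what the materialization step does conceptually: each change closes the key's current version and opens a new one, so both the latest state and any past state remain queryable. It is not Statsig's implementation and it is not BigQuery SQL. The names `Version`, `apply_changes`, `as_of`, `valid_from`, and `valid_to`, the integer timestamps, and the sample users are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """One row of a hypothetical SCD Type 2 table."""
    key: str
    attrs: dict                     # schema-less payload; any fields allowed
    valid_from: int                 # timestamp at which this version began
    valid_to: Optional[int] = None  # None marks the current version


def apply_changes(history, changes):
    """Fold (key, attrs, ts) change records into the history, oldest first."""
    current = {v.key: v for v in history if v.valid_to is None}
    for key, attrs, ts in sorted(changes, key=lambda c: c[2]):
        prev = current.get(key)
        if prev is not None:
            if prev.attrs == attrs:
                continue            # nothing changed, so keep the open version
            prev.valid_to = ts      # close the previous version
        new = Version(key, dict(attrs), ts)
        history.append(new)         # open a new current version
        current[key] = new
    return history


def as_of(history, key, ts):
    """Return the attributes a key had at time ts, or None if it did not exist."""
    for v in history:
        if v.key == key and v.valid_from <= ts and (v.valid_to is None or ts < v.valid_to):
            return v.attrs
    return None


# Illustrative change records; the second update adds a field the first lacked.
history = apply_changes([], [
    ("user_1", {"plan": "free"}, 100),
    ("user_1", {"plan": "pro", "country": "US"}, 200),
    ("user_2", {"plan": "free"}, 150),
])

print(as_of(history, "user_1", 150))
print(as_of(history, "user_1", 250))
```

In the pipeline the summary describes, the change records would arrive from Bigtable change streams via Dataflow, and a scheduled MERGE in BigQuery would play the role that `apply_changes` plays here. The half-open validity interval (`valid_from` inclusive, `valid_to` exclusive) is a design choice of this sketch that keeps point-in-time lookups unambiguous.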