Simulating Bigtable in BigQuery with Type 2 SCD modeling
Blog post from Statsig
Statsig needed to handle high-throughput, schema-less data updates while keeping that data queryable at scale. Its solution combines Google Cloud's Bigtable and BigQuery: updates written to Bigtable are replicated into BigQuery as a Type 2 Slowly Changing Dimension (SCD) model. This preserves low-latency, schema-less reads and writes while also supporting large analytical queries.

The pipeline works as follows. A User Store Service ingests data into Bigtable. Bigtable Change Streams capture each update, and a Dataflow pipeline streams those changes into BigQuery. There, a scheduled MERGE statement materializes the changes into a queryable SCD Type 2 table.

By pairing Bigtable's speed and schema flexibility with BigQuery's analytical capabilities, Statsig obtained a unified view of current and historical data. The approach supports real-time analytics, controls costs through fine-grained DML, and lets customers efficiently observe how user behavior changes over time.
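The following minimal in-memory Python sketch illustrates the Type 2 SCD semantics that the scheduled MERGE step produces. It is not Statsig's implementation. The names `ScdRow`, `ChangeEvent`, `merge_changes`, and `as_of` are hypothetical, as are the integer timestamps and the use of a dict payload to stand in for schema-less data. Each change closes the key's current version by setting `valid_to`, then appends a new open version. Because no version is overwritten, any past state can be reconstructed with a point-in-time lookup.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScdRow:
    key: str                        # hypothetical row key, e.g. a user ID
    value: dict                     # schema-less payload, modeled as a dict
    valid_from: int                 # commit timestamp that created this version
    valid_to: Optional[int] = None  # None marks the current (open) version


@dataclass
class ChangeEvent:
    key: str        # hypothetical shape of one captured change
    value: dict
    commit_ts: int


def merge_changes(table: List[ScdRow], changes: List[ChangeEvent]) -> List[ScdRow]:
    """Apply a batch of changes to an SCD Type 2 table (mutates `table` in place).

    For each change: close the key's open version, then append a new open version.
    """
    # Index the currently open version of each key.
    current = {row.key: row for row in table if row.valid_to is None}
    # Apply changes in commit order so versions chain without gaps.
    for change in sorted(changes, key=lambda c: c.commit_ts):
        prev = current.get(change.key)
        if prev is not None:
            prev.valid_to = change.commit_ts  # close out the old version
        new_row = ScdRow(change.key, change.value, change.commit_ts)
        table.append(new_row)
        current[change.key] = new_row
    return table


def as_of(table: List[ScdRow], key: str, ts: int) -> Optional[dict]:
    """Return the value `key` had at time `ts`, or None if it did not exist yet."""
    for row in table:
        if row.key == key and row.valid_from <= ts and (
            row.valid_to is None or ts < row.valid_to
        ):
            return row.value
    return None


# Usage: two scheduled batches, then point-in-time queries.
table: List[ScdRow] = []
merge_changes(table, [ChangeEvent("user-1", {"plan": "free"}, 100)])
merge_changes(table, [ChangeEvent("user-1", {"plan": "pro"}, 200)])
print(as_of(table, "user-1", 150))  # {'plan': 'free'}
print(as_of(table, "user-1", 250))  # {'plan': 'pro'}
```

Keeping closed versions rather than overwriting them is what allows current-state and historical queries to share one table. In BigQuery, this bookkeeping would be performed by the scheduled MERGE statement rather than by application code.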