How we automatically handle ClickHouse® schema migrations
Blog post from Tinybird
Tinybird's Forward deployments have evolved to update data pipeline schemas without downtime by minimizing how much data actually has to be migrated. The original algorithm migrated everything: it created auxiliary tables and used UNION views to keep data available during the transition. This worked, but it was slow and didn't scale; for large data sources, such as a 14TB Kafka-backed table, a full migration was prohibitively expensive.

Recognizing this, Tinybird refined the process in three phases, each cutting out unnecessary data movement. The key ideas were to trigger a migration only when a schema change actually requires one, and to isolate chain migrations to just the affected part of the pipeline: identify the most upstream change in an ingestion chain, migrate from that point downstream, and use cross-version bridging to keep real-time data flowing without migrating unchanged upstream tables.

These optimizations cut deployment times for large workspaces from days to minutes. Looking forward, Tinybird aims to refine the algorithm further: skipping downstream migrations when a change is purely additive, and skipping migration entirely for data sources with short TTLs, so that only the data that absolutely must move ever gets migrated.
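The post does not show the algorithm itself, but the scoping idea it describes, migrate only from the most upstream changed table downstream, leaving unchanged upstream tables alone, can be sketched as a reachability computation over the pipeline DAG. This is an illustrative sketch, not Tinybird's actual implementation; the graph shape and function name are assumptions.

```python
# Hypothetical sketch of the migration-scoping idea: given a DAG of tables
# and materialized views, plus the set of nodes whose schema changed,
# migrate only the changed nodes and everything downstream of them.

from collections import deque


def tables_to_migrate(deps: dict[str, list[str]], changed: set[str]) -> set[str]:
    """deps maps each table to the tables that read from it (downstream edges).

    Returns the set of tables that must be migrated: the changed tables
    themselves and their full downstream closure.
    """
    to_migrate: set[str] = set()
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        if node in to_migrate:
            continue  # already scheduled; avoids revisiting shared downstreams
        to_migrate.add(node)
        queue.extend(deps.get(node, []))  # walk downstream only, never upstream
    return to_migrate


# Example ingestion chain: kafka_raw -> events (MV) -> daily_agg (MV).
deps = {"kafka_raw": ["events"], "events": ["daily_agg"], "daily_agg": []}

# A change to `events` migrates events and daily_agg, but never kafka_raw:
print(sorted(tables_to_migrate(deps, {"events"})))  # ['daily_agg', 'events']
```

Cross-version bridging is what makes this safe: during the deployment, new rows flowing out of the unchanged upstream tables are routed into both the old and new versions of the migrated downstream tables, so reads stay consistent without rewriting the upstream data.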