Concept: "Cloud" MergeTree Tables

Company

ClickHouse

Date Published

Nov. 23, 2018

Author

ClickHouse Editor

Word count

3501

Language

English

Hacker News points

None

URL

clickhouse.com/blog/concept-cloud-merge-tree-tables

Summary

The MergeTree cloud tables system is designed to provide data locality and scalability for large datasets, with features such as automatic sharding, replication, and rebalancing of data across servers in a cluster. The system uses a sharding key to distribute data across servers, allowing for efficient querying and retrieval of data. The cloud table creation process involves specifying the sharding key, which is used to create multiple virtual shards that are then mapped onto real servers. The number of virtual shards can be adjusted dynamically based on the amount of data in the table. The system also allows for parallel insertion of data into multiple shards, reducing the load on individual servers and improving overall performance. However, issues such as handling large amounts of data, maintaining data locality, and ensuring atomicity of inserts and updates need to be addressed through various techniques, including intermediate mappings, splitting each shard into smaller parts, or using a spreading factor to distribute data evenly across servers.