Exploring massive, real-world data sets: 100+ Years of Weather Records in ClickHouse

Blog post from ClickHouse

Post Details
Author: Dale McDiarmid & Tom Schreiber
Word Count: 3,614
Language: English
Summary

Loading a real-world dataset into ClickHouse involves sampling and preparing the data, enriching it, and optimizing the schema for specific queries. The post uses the NOAA Global Historical Climatology Network dataset, which contains 1 billion rows of climate data from 1900 to 2022. The authors downloaded the dataset in compressed format, filtered it for the relevant measurements, and loaded it into a ClickHouse instance. Using the `clickhouse-local` tool, they enriched the data with additional information such as country names, latitudes, and longitudes. They then used dictionaries to search efficiently for weather events within specific geographical regions, which reduced query execution time by orders of magnitude.
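The filter-and-enrich step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the approach from the post, which performs the work with `clickhouse-local`. The column layout (station ID, date, element, value, three flags, observation time), the element codes kept, the rule of skipping rows with a non-empty quality flag, and all sample values and station names are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows in an assumed layout:
# station_id, date, element, value, m_flag, q_flag, s_flag, obs_time
RAW = """\
ST000001,20220101,TMAX,215,,,S,
ST000001,20220101,TMIN,103,,,S,
ST000001,20220101,WT01,1,,,S,
ST000002,20220101,PRCP,12,,X,S,
ST000002,20220101,TMAX,-31,,,S,
"""

# Hypothetical station metadata: station_id -> (name, latitude, longitude)
STATIONS = {
    "ST000001": ("Example Station A", 38.72, -9.14),
    "ST000002": ("Example Station B", 64.13, -21.90),
}

# Measurements to keep (an assumed selection for this sketch)
KEEP = {"TMAX", "TMIN", "PRCP"}


def prepare(raw_text, stations, keep=KEEP):
    """Filter rows, pivot to one record per station and date, then enrich."""
    pivoted = defaultdict(dict)
    reader = csv.reader(io.StringIO(raw_text))
    for station_id, date, element, value, _m, q_flag, _s, _t in reader:
        # Drop unwanted elements and rows that carry a quality flag.
        if element not in keep or q_flag:
            continue
        pivoted[(station_id, date)][element] = int(value)

    rows = []
    for (station_id, date), measurements in sorted(pivoted.items()):
        # Enrich each record with station name and coordinates if known.
        name, lat, lon = stations.get(station_id, ("", None, None))
        rows.append({
            "station_id": station_id,
            "date": date,
            "name": name,
            "lat": lat,
            "lon": lon,
            **measurements,
        })
    return rows


rows = prepare(RAW, STATIONS)
for row in rows:
    print(row)
```

Pivoting to one record per station and date means a later query for, say, the highest temperature reads a single column rather than filtering on an element code in every row.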