Native random sampling in ClickHouse

Post Details

Company

ClickHouse

Date Published

May 22, 2026

Author

-

Word Count

2,565

Company Posts That Month

34

Language

English

Hacker News Points

-

Post removed?

No

Source URL

clickhouse.com/blog/native-random-sampling

Summary

ClickHouse's native random sampling lets a query run against a fraction of a table's data, trading some accuracy for faster execution. The post uses the UK house prices dataset, which holds over 30 million transactions, and creates a table with a suitable sampling key: the sipHash64 function applied to high-cardinality columns such as the postcode combination, so that sampled rows are evenly distributed. It then demonstrates both fractional and row-count-based sampling and highlights the resulting reduction in processing time and resource usage. For best results, the sampling key should appear at the beginning of the ORDER BY clause, and sum aggregations should be scaled by the _sample_factor virtual column. The technique is particularly effective for exploratory data analysis, where approximate answers are sufficient.
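
The idea behind hash-based sampling and sum scaling can be illustrated outside ClickHouse. The following is a minimal Python sketch, not ClickHouse's implementation: the data is synthetic, the names `hash64`, `sample`, and `rows` are hypothetical, and BLAKE2b stands in for sipHash64 because Python's standard library does not expose a stable SipHash. It keeps the rows whose hashed key falls in the lowest fraction of the 64-bit range, then scales a sum by the inverse of that fraction, which is the role the summary assigns to _sample_factor.

```python
import hashlib


def hash64(key: str) -> int:
    """Deterministic 64-bit hash. Assumption: BLAKE2b is a stand-in for sipHash64."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sample(rows, fraction):
    """Keep rows whose hashed key lies in the lowest `fraction` of the 64-bit range."""
    threshold = int(fraction * 2**64)
    return [r for r in rows if hash64(f"{r['postcode1']} {r['postcode2']}") < threshold]


# Synthetic, hypothetical transactions; every postcode pair is unique,
# mimicking a high-cardinality sampling key.
rows = [
    {
        "postcode1": f"AB{i % 250}",
        "postcode2": f"{i // 250}XY",
        "price": 100_000 + (i % 500) * 1_000,
    }
    for i in range(50_000)
]

fraction = 0.1
sampled = sample(rows, fraction)

# Sums computed on a sample must be scaled back up by 1 / fraction.
sample_factor = 1 / fraction
estimated_total = sum(r["price"] for r in sampled) * sample_factor
exact_total = sum(r["price"] for r in rows)

# Averages need no scaling: numerator and denominator shrink together.
estimated_avg = sum(r["price"] for r in sampled) / len(sampled)
exact_avg = exact_total / len(rows)

print(f"sampled {len(sampled)} of {len(rows)} rows")
print(f"total: exact={exact_total}, estimated={estimated_total:.0f}")
print(f"avg:   exact={exact_avg:.0f}, estimated={estimated_avg:.0f}")
```

Because the hash is deterministic, the same rows are selected on every run, so repeated sampled queries give consistent answers. A low-cardinality key would make the selection lumpy, since every row sharing a key value is kept or dropped together.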

Trends Found in this Post

No tracked trend matches for this post yet.
