How to strip query parameters from URLs in ClickHouse®
Blog post from Tinybird
Tracking parameters clutter the URLs collected in web analytics, which makes pages harder to group and analyze. ClickHouse® addresses this with the cutQueryString() function, which removes the query string, including the question mark, so that page performance and user behavior can be analyzed per page rather than per URL variant. cutQueryString() removes all query parameters at once but leaves any URL fragment in place; to remove only selected parameters, regex functions such as replaceRegexpAll() are needed. The function is available in ClickHouse® 19.14 and later, and it performs better than regex-based alternatives because it is optimized for URL parsing. Materialized views can clean URLs at insert time and so avoid query-time overhead, and Tinybird's managed ClickHouse® platform makes it easier to build real-time URL cleaning APIs by handling database management and scaling, so teams can focus on application development.
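As a rough illustration of the approaches described above, the sketch below first cleans a sample URL at query time and then sets up insert-time cleaning with a materialized view. The sample URL, the table names page_views_raw and page_views_clean, the view name page_views_clean_mv, and the utm_ regex are all assumptions made for this example, not part of the original post. cutQueryStringAndFragment() is a related built-in function that can be used when the fragment should be dropped as well.

```sql
-- 1. Query-time cleaning on a hypothetical sample URL.
WITH 'https://example.com/pricing?utm_source=news&plan=pro#faq' AS url
SELECT
    -- Removes the query string (with the "?"); the "#faq" fragment is kept.
    cutQueryString(url) AS without_query,
    -- Removes both the query string and the fragment.
    cutQueryStringAndFragment(url) AS without_query_or_fragment,
    -- Selective removal (illustrative regex, not exhaustive):
    -- the inner call drops utm_* parameters, the outer call trims any
    -- "?" or "&" left dangling before a fragment or the end of the URL.
    replaceRegexpAll(
        replaceRegexpAll(url, '\\butm_[a-z_]+=[^&#]*&?', ''),
        '[?&]+(#|$)', '\\1'
    ) AS without_utm;

-- 2. Insert-time cleaning with a materialized view.
-- Table and view names are hypothetical.
CREATE TABLE page_views_raw
(
    ts  DateTime,
    url String
)
ENGINE = MergeTree
ORDER BY ts;

CREATE TABLE page_views_clean
(
    ts        DateTime,
    clean_url String
)
ENGINE = MergeTree
ORDER BY (clean_url, ts);

-- Each insert into page_views_raw also writes a cleaned row
-- into page_views_clean, so queries never pay the cleaning cost.
CREATE MATERIALIZED VIEW page_views_clean_mv TO page_views_clean AS
SELECT
    ts,
    cutQueryStringAndFragment(url) AS clean_url
FROM page_views_raw;
```

With this layout, dashboards would group by clean_url in page_views_clean, while page_views_raw keeps the original URLs in case the tracking parameters are needed later. The regex path is the most fragile of the three and is worth testing against real traffic before relying on it.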