Build complete URL hierarchies with path truncation in ClickHouse®
Blog post from Tinybird
URLHierarchy is a ClickHouse® function for analyzing web traffic: it breaks a URL into hierarchical segments so you can see how users navigate a site's structure. It accepts a URL string and returns an array of strings, one per hierarchy level, produced by truncating the URL at natural boundaries such as slashes and the query string. That makes it useful for understanding traffic flow at every level of a site's content hierarchy, from broad categories down to specific pages. Combined with ClickHouse®'s array manipulation functions, its output can be transformed into a format suitable for aggregation and analysis.

This post covers how to implement URLHierarchy in analytics workflows, including handling edge cases, optimizing performance on large-scale data, and building content drilldown APIs with Tinybird's managed ClickHouse® service. The result is Google Analytics-style insight into content performance, user engagement, and navigation patterns, which helps when optimizing content strategy and understanding user behavior.
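To make the truncation idea concrete, here is a minimal Python sketch that approximates the behavior described above. It is not ClickHouse® code: the helper names `url_hierarchy` and `rollup` are hypothetical, and the separator set (`/`, `?`, `#`) and the "scheme://host" requirement are assumptions about how the boundaries are chosen. Verify real results against ClickHouse® itself before relying on them.

```python
from collections import Counter

# Assumed boundary characters; the real function's exact rules may differ.
SEPARATORS = "/?#"


def url_hierarchy(url: str) -> list[str]:
    """Return progressively longer prefixes of `url`, each cut just after a separator."""
    scheme_end = url.find("://")
    # Assumption: without "scheme://" followed by something, the URL is one level.
    if scheme_end <= 0 or scheme_end + 3 >= len(url):
        return [url]

    levels = []
    pos = scheme_end + 3  # start scanning at the host
    end = len(url)
    first = True
    while True:
        if not first:
            # Runs of consecutive separators count as a single boundary.
            while pos < end and url[pos] in SEPARATORS:
                pos += 1
            if pos == end:
                break
        first = False
        # Advance to the next separator and include it in the prefix.
        while pos < end and url[pos] not in SEPARATORS:
            pos += 1
        if pos < end:
            pos += 1
        levels.append(url[:pos])
    return levels


def rollup(urls: list[str]) -> Counter:
    """Count each hit once at every level of its hierarchy (a drilldown-style rollup)."""
    counts = Counter()
    for url in urls:
        counts.update(url_hierarchy(url))
    return counts


if __name__ == "__main__":
    for level in url_hierarchy("https://example.com/blog/2024/post?utm=x"):
        print(level)
    hits = [
        "https://example.com/blog/a",
        "https://example.com/blog/b",
        "https://example.com/docs",
    ]
    for level, views in sorted(rollup(hits).items()):
        print(views, level)
```

Counting every hit at each of its levels is what lets a single page view contribute to the site root, its section, and the page itself, which is the drilldown pattern the post describes. In ClickHouse® the equivalent step would typically expand the returned array into rows before grouping.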