How to extract port numbers from URLs in ClickHouse®
Blog post from Tinybird
ClickHouse provides a built-in function, port(), that extracts the port number from a URL and returns it as a UInt16 value; a result of 0 means the URL contains no explicit port. For URLs without a port, conditional logic can substitute the protocol default, such as 80 for HTTP and 443 for HTTPS. The guide covers URL parsing, edge cases such as IPv6 addresses and malformed URLs, and performance on large datasets, where it recommends built-in functions over regex-based extraction. To speed up queries further, it suggests pre-computing port values during data ingestion or with materialized views, and using regular views to package reusable logic. It also describes deploying the port extraction logic as an API endpoint with Tinybird, which allows flexible parameterization and handles infrastructure management.
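As a rough illustration of the default-port fallback and the pre-computation idea summarized above, here is a minimal sketch in ClickHouse SQL. The sample URLs, the table name `requests_example`, and the column names are hypothetical and chosen only for this example; the sketch relies on the built-in functions port(), protocol(), and multiIf().

```sql
-- Fall back to the protocol default when port() returns 0 (no explicit port).
-- The URLs below are made-up sample values.
SELECT
    url,
    port(url) AS explicit_port,
    multiIf(
        port(url) != 0, port(url),      -- an explicit port wins
        protocol(url) = 'https', 443,   -- HTTPS default
        protocol(url) = 'http', 80,     -- HTTP default
        0                               -- unknown scheme: keep 0
    ) AS effective_port
FROM
(
    SELECT arrayJoin([
        'https://example.com:8443/path',
        'http://example.com/index.html',
        'https://example.com/'
    ]) AS url
);

-- One possible way to pre-compute the port at ingestion time:
-- a MATERIALIZED column is calculated on insert and stored with the row.
-- Table and column names are hypothetical.
CREATE TABLE requests_example
(
    url String,
    url_port UInt16 MATERIALIZED port(url)
)
ENGINE = MergeTree
ORDER BY tuple();
```

Storing the port in a column like this trades a little disk space for not having to re-parse each URL at query time. Whether that is worthwhile depends on how often the port is queried.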