How we made ingestion 30% faster with C++ (not Rust, sorry)
Blog post from Tinybird
At Tinybird, real-time analytics depends on efficient data ingestion, and in particular on converting JSON into ClickHouse®'s RowBinaryWithDefaults format, which lets missing fields fall back to column defaults. The conversion was originally implemented in Python with some C helper functions. That version had performance and maintainability limits, not least because it duplicated ClickHouse®'s internal encoding functions. Tinybird rewrote the JSON-to-RowBinary conversion in C++ on top of ClickHouse®'s own internals, aiming to improve performance while keeping ingestion reliable and flexible. The work involved building a JSON Path tree for efficient data extraction, parsing with the fast simdjson library, and handling numerous conversion quirks that could otherwise affect existing customer workflows. The new implementation significantly reduces CPU usage, improves reliability by documenting ingestion quirks, and aligns better with ClickHouse®'s internals, although further work remains to fully optimize performance and handle legacy data conversion.
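To make the target format concrete, here is a minimal sketch of how a single row might be encoded as RowBinaryWithDefaults. It is not Tinybird's implementation, which reuses ClickHouse®'s internal encoders and parses JSON with simdjson. The names `Value`, `Row`, `write_varuint`, `write_int32`, `write_string`, and `encode_row` are hypothetical, and only two column types (Int32 and String) are covered. The byte layout follows the format as documented by ClickHouse®: each column is preceded by one flag byte, where 1 means "use the column default" with no value following, and 0 means the value follows in plain RowBinary encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical value type for this sketch: Int32 and String columns only.
using Value = std::variant<std::int32_t, std::string>;
// One slot per table column; std::nullopt means the field was missing in the JSON.
using Row = std::vector<std::optional<Value>>;

// Unsigned LEB128 varint, which RowBinary uses for string lengths.
void write_varuint(std::vector<std::uint8_t>& out, std::uint64_t n) {
    while (n >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((n & 0x7F) | 0x80));
        n >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(n));
}

// Int32 is written as 4 bytes, little-endian.
void write_int32(std::vector<std::uint8_t>& out, std::int32_t v) {
    const auto u = static_cast<std::uint32_t>(v);
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF));
    }
}

// String is written as a varint length followed by the raw bytes.
void write_string(std::vector<std::uint8_t>& out, const std::string& s) {
    write_varuint(out, s.size());
    out.insert(out.end(), s.begin(), s.end());
}

// Encodes one row: a flag byte per column, then the value if present.
std::vector<std::uint8_t> encode_row(const Row& row, std::size_t column_count) {
    assert(row.size() == column_count);  // every column needs a slot, present or not
    std::vector<std::uint8_t> out;
    for (const auto& field : row) {
        if (!field) {
            out.push_back(1);  // 1 = let ClickHouse apply the column default
            continue;
        }
        out.push_back(0);      // 0 = an explicit value follows
        if (const auto* i = std::get_if<std::int32_t>(&*field)) {
            write_int32(out, *i);
        } else {
            write_string(out, std::get<std::string>(*field));
        }
    }
    return out;
}
```

Under these assumptions, a three-column row with the Int32 value 42, a missing second field, and the string "hi" encodes to ten bytes: `00 2A 00 00 00`, then `01`, then `00 02 68 69`. The missing field costs a single byte, and the server fills in the default at insert time.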