The story of our SAMPLE BY enhancements
Blog post from QuestDB
QuestDB, an open-source time-series database known for its ultra-low latency and high ingestion throughput, encountered an unexpected result when running a query meant to downsample 2018 trips from the NYC Taxi dataset, which pointed to a potential bug in the SAMPLE BY code. The cause turned out to be the default calendar alignment in QuestDB's sampling, which floors each timestamp to the start of its sampling bucket. Because the buckets were counted from a fixed origin, the first bucket could begin before the first matching row, so the output started in 2017 rather than in 2018 as expected. An investigation involving query explanations and optimizations confirmed that the flooring used a fixed origin and that, without an appropriate offset, the buckets were misaligned with the data. To address this, QuestDB introduced new syntax options such as the FROM-TO clause, which lets users define the shape and range of the output explicitly, gives them more control over sampling, and allows missing intervals to be filled with specified values. The change aims to provide more flexibility and precision in handling time-series data, particularly for queries with complex conditions or without an explicit WHERE clause. The enhancements are ongoing and reflect QuestDB's commitment to improving the functionality and user experience of its time-series database.
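
The misalignment described above can be reproduced in a few lines of Python. This is a minimal sketch and not QuestDB's implementation: the 5-day stride, the Unix epoch as the fixed origin, the half-open `[start, end)` range, and the helper names `floor_to_bucket` and `sample_counts` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Assumed fixed origin for illustration: the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=UTC)


def floor_to_bucket(ts, stride, origin=EPOCH):
    """Floor ts to the start of its stride-wide bucket, counted from origin."""
    return origin + ((ts - origin) // stride) * stride


def sample_counts(timestamps, stride, start, end, fill=0):
    """Count rows per bucket over [start, end); empty buckets get `fill`."""
    counts = {}
    for ts in timestamps:
        if start <= ts < end:
            # Buckets are anchored at `start`, mimicking an explicit FROM.
            bucket = floor_to_bucket(ts, stride, origin=start)
            counts[bucket] = counts.get(bucket, 0) + 1
    out = []
    bucket = start
    while bucket < end:
        out.append((bucket, counts.get(bucket, fill)))
        bucket += stride
    return out


stride = timedelta(days=5)  # hypothetical sampling interval
first_row = datetime(2018, 1, 1, tzinfo=UTC)

# Fixed origin: the bucket containing the first 2018 row starts in 2017.
misaligned = floor_to_bucket(first_row, stride)
print(misaligned)  # 2017-12-30 00:00:00+00:00

# Origin anchored at the start of the range: the bucket starts in 2018.
aligned = floor_to_bucket(first_row, stride, origin=first_row)
print(aligned)  # 2018-01-01 00:00:00+00:00

# Explicit range with gap filling: the second bucket has no rows.
rows = [
    datetime(2018, 1, 1, 8, tzinfo=UTC),
    datetime(2018, 1, 2, tzinfo=UTC),
    datetime(2018, 1, 12, tzinfo=UTC),
]
result = sample_counts(
    rows,
    stride,
    start=datetime(2018, 1, 1, tzinfo=UTC),
    end=datetime(2018, 1, 16, tzinfo=UTC),
)
for bucket, count in result:
    print(bucket.date(), count)
```

With the epoch as origin, 1 January 2018 is day 17532, which is not a multiple of 5, so the bucket start falls back to 30 December 2017. Anchoring the buckets at an explicit start removes that drift, and knowing both ends of the range is what makes it possible to emit a filled row for an interval that contains no data.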