Generating more realistic sample time-series data with PostgreSQL generate_series()

Company

Timescale

Date Published

Nov. 11, 2021

Author

Ryan Booz

Word count

4988

Language

English

Hacker News points

None

URL

www.timescale.com/blog/generating-more-realistic-sample-time-series-data-with-postgresql-generate_series

Summary

PostgreSQL's `generate_series()` function can create large sets of sample data for testing and querying. The first post in this series showed how to use it to create time-series data, including how to combine multiple series into a larger table with a CROSS JOIN (a Cartesian product). That data was basic and not very realistic: mostly random numbers with little variation. This second post demonstrates ways to create more realistic-looking data than a column or two of random decimal values, using custom functions such as `random_between()` and `random_text()`. These functions generate numeric and text values that resemble real readings, such as CPU usage and temperature. The post also shows how to combine them with `json_object_agg()` to create JSON documents. Finally, it demonstrates how to query the data with functions such as `time_bucket()`, `approx_percentile()`, and `time_weight()`, which are provided by TimescaleDB and its Toolkit extension rather than by core PostgreSQL. Part 3 of the series explores ways to add shape and trends to sample time-series data.
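
As a rough illustration of the approach summarized above, the following is a minimal sketch in plain PostgreSQL. The summary names `random_between()` and `random_text()` but does not give their definitions, so the signatures, parameter names, defaults, and function bodies here are assumptions and may differ from the ones in the post. The column names (`reading_time`, `device_id`, `cpu_usage`, `temperature`, `note`) and value ranges are likewise hypothetical.

```sql
-- Assumed signature: a random number between min_val and max_val,
-- rounded to round_to decimal places. Not necessarily the post's definition.
CREATE OR REPLACE FUNCTION random_between(
    min_val  numeric,
    max_val  numeric,
    round_to int DEFAULT 0
) RETURNS numeric AS $$
    SELECT round(min_val + random()::numeric * (max_val - min_val), round_to);
$$ LANGUAGE sql VOLATILE;

-- Assumed signature: a string of random lowercase letters whose length
-- falls between min_len and max_len (min_len should be at least 1).
CREATE OR REPLACE FUNCTION random_text(
    min_len int DEFAULT 5,
    max_len int DEFAULT 20
) RETURNS text AS $$
    SELECT string_agg(
               substr('abcdefghijklmnopqrstuvwxyz',
                      1 + floor(random() * 26)::int, 1),
               '')
    FROM generate_series(
             1,
             min_len + floor(random() * (max_len - min_len + 1))::int);
$$ LANGUAGE sql VOLATILE;

-- One hour of readings at 10-minute intervals for three hypothetical devices.
-- The CROSS JOIN pairs every timestamp with every device_id.
SELECT ts                        AS reading_time,
       device_id,
       random_between(0, 100, 2) AS cpu_usage,
       random_between(15, 30, 1) AS temperature,
       random_text(5, 12)        AS note
FROM generate_series(now() - interval '1 hour',
                     now(),
                     interval '10 minutes') AS ts
CROSS JOIN generate_series(1, 3) AS device_id;

-- A small JSON document built from hypothetical keys and random values.
SELECT json_object_agg(k, random_between(1, 10)) AS doc
FROM unnest(ARRAY['building', 'floor', 'rack']) AS k;
```

Both helpers are declared `VOLATILE` so that PostgreSQL evaluates them again for every row instead of reusing one result. The first query needs only core PostgreSQL; functions such as `time_bucket()` would additionally require TimescaleDB to be installed.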