Company
Date Published
Author
Ryan Booz
Word count
4988
Language
English
Hacker News points
None

Summary

The PostgreSQL function `generate_series()` can create large sets of sample data for testing and querying. The first post in this series reviewed how to use `generate_series()` to create time-series data, including combining multiple series into a larger table of time-series data through a CROSS (Cartesian) JOIN. That generated data was basic and not very realistic: random numbers with minimal variation. In this second post, we demonstrate ways to create more realistic-looking data than a column or two of random decimal values by writing helper functions such as `random_between()` and `random_text()`. These functions generate bounded numeric and text values for time-series data such as CPU usage and temperature readings. We also show how to combine these functions with `json_object_agg()` to create JSON documents. Finally, we demonstrate how to query the generated data using functions such as `time_bucket()`, `approx_percentile()`, and `time_weight()`. Part 3 of this series explores ways to add shape and trends to your sample time-series data.
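
As a rough illustration of the approach summarized above, here is a minimal sketch of how helper functions like these might look and how they could be combined with `generate_series()` and a CROSS JOIN. The function signatures, parameter names, column names, and value ranges are assumptions made for this example and are not necessarily the ones used in the original post.

```sql
-- Hypothetical helper: random numeric between min_val and max_val,
-- rounded to round_to decimal places (signature assumed for illustration).
CREATE OR REPLACE FUNCTION random_between(
    min_val numeric,
    max_val numeric,
    round_to int DEFAULT 0
) RETURNS numeric AS $$
BEGIN
    -- random() returns a double precision value in [0, 1)
    RETURN round(min_val + random()::numeric * (max_val - min_val), round_to);
END;
$$ LANGUAGE plpgsql;

-- Hypothetical helper: random lowercase string whose length falls
-- between min_len and max_len characters, inclusive.
CREATE OR REPLACE FUNCTION random_text(min_len int, max_len int)
RETURNS text AS $$
    -- chr(97) is 'a'; adding 0..25 yields 'a'..'z'
    SELECT string_agg(chr(97 + floor(random() * 26)::int), '')
    FROM generate_series(
        1,
        min_len + floor(random() * (max_len - min_len + 1))::int
    );
$$ LANGUAGE sql;

-- One reading per minute for the past hour, for each of four devices.
-- The CROSS JOIN pairs every timestamp with every device_id:
-- 61 timestamps x 4 devices = 244 rows.
SELECT ts AS time,
       device_id,
       random_between(0, 100, 2) AS cpu_usage,    -- percent
       random_between(20, 30, 1) AS temperature,  -- degrees, arbitrary range
       random_text(5, 10)        AS note
FROM generate_series(
         now() - interval '1 hour',
         now(),
         interval '1 minute'
     ) AS ts
CROSS JOIN generate_series(1, 4) AS device_id;
```

Because both helpers call `random()`, they are volatile and are re-evaluated for every row, so each reading gets its own value. The values are still uniformly random within their bounds, with no trend or shape over time.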