Managing your Amazon Redshift performance: How Plaid uses Periscope Data

Company

Plaid

Date Published

Sept. 10, 2018

Author

Austin Gibbons

Word count

1903

Language

English

Hacker News points

None

URL

plaid.com/blog/managing-your-amazon-redshift-performance-how-plaid-uses-periscope-data

Summary

The Data Science & Infrastructure team at Plaid rebuilt its internal analytics around rollup tables and materialized views, using Periscope Data as its business intelligence tool to track metrics on core product usage, go-to-market strategy, and customer support. At first, query performance was volatile because the Redshift tables lacked sort and distribution keys, which govern how data is organized on disk. The team found that 95% of slow queries came from just 5% of tables. This Pareto distribution pointed to compounding factors, such as large datasets being queried repeatedly with similar filters. To address this, the team pre-computed these common elements by creating rollup tables, materialized views, and pre-filtered tables, and changed user queries to run against these derivative tables. Rolling this out required cross-team collaboration, along with monitoring dashboards that tracked which individual dashboards and charts contributed the slowest runtimes. Ultimately, the system stabilized and nearly all "unbearably long" queries disappeared. After the migration, the system served 10x more queries at 1/10th the query runtime. The team emphasizes understanding user query patterns, keeping data tightly packed through rollup tables, reducing query duplication, and continuously improving processes.
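The two ideas in the summary can be sketched in a few lines of Python: finding the small set of tables that accounts for most slow queries (the Pareto analysis), and pre-aggregating raw events into a rollup so dashboards scan fewer rows. This is a minimal, hypothetical illustration, not Plaid's implementation. The log format `(table_name, runtime_seconds)`, the 30-second slow-query threshold, and the function names are all assumptions. In practice, such data might come from Redshift system tables such as `STL_QUERY`, and the rollup would be a SQL table, not a Python dict.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical cutoff for what counts as a "slow" query (assumption).
SLOW_THRESHOLD_S = 30.0


def tables_covering_slow_queries(query_log, share=0.95, threshold=SLOW_THRESHOLD_S):
    """Return the smallest list of tables, most frequent first, that together
    account for at least `share` of slow queries.

    `query_log` is a hypothetical iterable of (table_name, runtime_seconds).
    """
    slow_counts = Counter(table for table, runtime in query_log if runtime >= threshold)
    total = sum(slow_counts.values())
    if total == 0:
        return []
    covered, result = 0, []
    for table, count in slow_counts.most_common():
        result.append(table)
        covered += count
        if covered / total >= share:
            break
    return result


def build_daily_rollup(events):
    """Pre-aggregate raw (timestamp, customer_id) events into a rollup keyed by
    (date, customer_id) -> event count, so dashboard queries read one row per
    day and customer instead of every raw event."""
    rollup = defaultdict(int)
    for ts, customer_id in events:
        rollup[(ts.date(), customer_id)] += 1
    return dict(rollup)


if __name__ == "__main__":
    log = [("events", 45.0)] * 19 + [("users", 40.0)] + [("events", 2.0)] * 100
    print(tables_covering_slow_queries(log))
    raw = [
        (datetime(2018, 9, 10, 1, 0), "a"),
        (datetime(2018, 9, 10, 5, 0), "a"),
        (datetime(2018, 9, 11, 0, 0), "b"),
    ]
    print(build_daily_rollup(raw))
```

In the usage example, 19 of the 20 slow queries hit `events`, so that single table already covers 95% of them and is the first candidate for a rollup or pre-filtered table. Fast queries on the same table are ignored, which keeps the analysis focused on runtime pain rather than raw query volume.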