Content Deep Dive

Managing your Amazon Redshift performance: How Plaid uses Periscope Data

Blog post from Plaid

Post Details
Company: Plaid
Date Published:
Author: Austin Gibbons
Word Count: 1,903
Company Posts That Month: 6
Language: English
Hacker News Points: -
Summary

The Data Science & Infrastructure team at Plaid rebuilt its internal analytics around rollup tables and materialized views. The team uses Periscope Data as its business intelligence tool to track metrics on core product usage, go-to-market strategy, and customer support. Initially, query performance was volatile because the Redshift tables lacked sort and distribution keys, which determine how data is organized on disk. The team found that 95% of slow queries came from just 5% of tables, a Pareto distribution that pointed to compounding factors such as large datasets and similar filters. To address this, they pre-computed the common elements by creating rollup tables, materialized views, and pre-filtered tables, and changed user queries to run against these derivative tables. The rollout required cross-team collaboration and dashboards that tracked which individual dashboards and charts contributed slow runtimes. Ultimately the system stabilized and nearly all "unbearably long" queries disappeared; over the full migration the team reports 10x more queries at 1/10th the query runtime. The team emphasizes understanding user query patterns, keeping data tightly packed through rollup tables, reducing query duplication, and continuously improving processes.
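
The rollup idea in the summary can be sketched in a few lines. This is a minimal illustration and not Plaid's actual schema or pipeline: it uses Python's built-in sqlite3 as a stand-in for Redshift, and the table names (`api_requests`, `daily_requests_rollup`), columns, and sample rows are all hypothetical. The raw table is filtered and aggregated once into a small derivative table, and dashboard-style queries then read the derivative table instead of rescanning the raw data.

```python
import sqlite3


def build_rollup(conn):
    """Pre-filter and pre-aggregate raw rows into one row per (day, client_id).

    Hypothetical schema; a real Redshift rollup would also choose sort and
    distribution keys, which sqlite3 does not have.
    """
    conn.executescript(
        """
        DROP TABLE IF EXISTS daily_requests_rollup;
        CREATE TABLE daily_requests_rollup AS
        SELECT day, client_id, COUNT(*) AS request_count
        FROM api_requests
        WHERE status = 'ok'          -- the filter many queries share
        GROUP BY day, client_id;
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_requests (day TEXT, client_id TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO api_requests VALUES (?, ?, ?)",
    [
        # Made-up sample data for illustration only.
        ("2018-01-01", "a", "ok"),
        ("2018-01-01", "a", "ok"),
        ("2018-01-01", "b", "error"),
        ("2018-01-02", "a", "ok"),
        ("2018-01-02", "b", "ok"),
    ],
)
build_rollup(conn)

# A dashboard query now reads the small rollup table, not the raw table.
rows = conn.execute(
    "SELECT day, SUM(request_count) FROM daily_requests_rollup "
    "GROUP BY day ORDER BY day"
).fetchall()
print(rows)
```

The trade-off in this sketch is freshness for speed: the rollup has to be rebuilt (or incrementally updated) on a schedule, but every query that shares the same filter and grouping reads far fewer rows.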

Trends Found in this Post
Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM
Data Pipeline | 2 | 74 | 13 | 11 | +252%
Real-time | 1 | 366 | 107 | 45 | -2%