Why PostgreSQL Ignored Our Index (and What the Planner Was Thinking)
Blog post from Mergify
PostgreSQL's cost-based query planner sometimes opts for sequential scans over index scans, even on indexed columns, due to its mathematical evaluation of execution plans, which can lead to performance issues on large tables. At Mergify, engineers encountered this behavior on tables with millions of rows, where PostgreSQL would choose a sequential scan, causing API latency spikes. By understanding the planner's reasoning, which involves evaluating multiple execution plans based on cost estimates derived from I/O and CPU metrics, they discovered that simply altering the query with an ORDER BY clause could signal the planner to use the index, dramatically improving performance. This insight led them to adopt a more interactive approach to query optimization, involving refreshing statistics, using hints like ORDER BY and LIMIT, adjusting cost constants for modern hardware, and validating changes with EXPLAIN ANALYZE to ensure the planner's decisions align with their performance goals.