Choosing the optimal index with limited information

Post Details

Company

Memgraph

Date Published

April 11, 2023

Author

Andi Skrgat

Word Count

1,093

Language

English

Hacker News Points

-

Source URL

memgraph.com/blog/optimal-index-with-limited-information

Summary

Indexes in databases are crucial for efficient data retrieval, but selecting the optimal index with limited information can be challenging. The blog explores methodologies to address this issue, particularly in graph databases, though the principles apply to relational databases as well. When querying databases with multiple indices, the goal is to choose the index that results in the fewest nodes or "hits." A simple heuristic is to count nodes with a specific property, but this approach has limitations as it does not consider fine-grained data. A more effective strategy involves calculating average group sizes, which provides a clearer estimate of hits. Additionally, probabilistic methods like the chi-squared statistic can help determine the uniformity of data distributions, offering insights into optimal index selection. The article also touches on the potential for databases like Memgraph to evolve into self-learning systems that adapt to user query patterns, optimizing index selection.