Using GIN Indexes in YugabyteDB

Company

Yugabyte

Date Published

Dec. 13, 2021

Author

Yogesh Mahajan

Word count

1394

Language

English

Hacker News points

None

URL

www.yugabyte.com/blog/yugabytedb-gin-indexes

Summary

YugabyteDB supports three container column types: jsonb, tsvector, and array. These types are useful in many cases but need efficient indexing to reach their full potential. Ordinary secondary indexes are insufficient for them: without a full table scan, they cannot filter text columns by word containment, array columns by element presence, or JSON columns by primitive values or deeply nested keys. GIN (generalized inverted) indexes were developed to support full-text search in PostgreSQL but also work with other data types such as arrays and jsonb documents. YugabyteDB's YBGIN is similar to PostgreSQL GIN and is implemented on top of LSM tree indexes. Operator classes define the semantics of an index column for a particular data type and index access method, and YBGIN supports all GIN operator classes included in the core PostgreSQL distribution. Creating multiple operator classes on a single index allows for better performance. A GIN index stores mappings from the values within a container column to the rows that hold those values, which speeds up searches and queries. Because inserts and updates are slow due to the intrinsic nature of inverted indexes, users can improve performance with tips such as dropping the index before a bulk insertion and recreating it afterward. YugabyteDB also supports partial GIN indexes and PostgreSQL extensions such as pg_trgm, which uses trigram matching to search quickly for similar strings. The next phase of development includes multi-column GIN indexes, optimized insert operations, support for ASC/DESC/HASH on GIN indexes, hstore, and btree_gin, adding enterprise-grade features to the flagship product, YugabyteDB 2.11.
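
The mapping from contained values to rows described in the summary can be illustrated with a minimal Python sketch. It is a conceptual model of an inverted index only, not YBGIN's or PostgreSQL's implementation; the function names (`build_inverted_index`, `rows_containing_all`) and the sample rows are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each element of a container column to the ids of rows holding it."""
    index = defaultdict(set)
    for row_id, elements in rows.items():
        for element in elements:
            index[element].add(row_id)
    return index


def rows_containing_all(index, wanted):
    """Return the ids of rows whose container holds every wanted element.

    This mimics an array-containment filter by intersecting the posting
    sets of the wanted elements. In this sketch an empty query returns
    no rows.
    """
    postings = [index.get(element, set()) for element in wanted]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical table: row id -> array column value.
rows = {
    1: ["red", "green"],
    2: ["green", "blue"],
    3: ["red", "blue", "green"],
}

index = build_inverted_index(rows)
print(sorted(rows_containing_all(index, ["red", "green"])))  # [1, 3]
```

The lookup touches only the posting sets for the requested elements instead of scanning every row, which is why containment filters benefit from an inverted index. Each inserted row, in turn, adds one entry per contained element, which is consistent with the summary's note that inserts and updates are comparatively slow.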