miniCOIL: on the Road to Usable Sparse Neural Retrieval
Blog post from Qdrant
Sparse neural retrieval is an emerging field that aims to combine the semantic understanding of dense retrieval with the lightweight, explainable nature of term-based approaches such as BM25. The article describes the development of miniCOIL, a new candidate for sparse neural retrieval that seeks to address the limitations of previous models by integrating a semantic component into the BM25 formula. miniCOIL is designed to work efficiently across domains without relying on large labeled datasets; it does so with a simplified architecture built around a COIL-inspired semantic component. The resulting sparse representations fit into traditional inverted indexes, which makes the model a practical option for hybrid search. Although the model shows promising results in improving retrieval accuracy by better capturing word meanings, the article acknowledges that widespread adoption is difficult because integrating vector operations into existing retrieval systems is complex. The authors propose continued development to improve the model's quality and its applicability across languages and dense encoders.
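To make the idea of "a semantic component inside the BM25 formula" more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the article's actual formula or Qdrant's implementation: it assumes each matched word contributes its usual BM25 weight, scaled by the similarity between small per-word "meaning" vectors in the query and the document. The names `SparseDoc`, `bm25_term_weight`, and `score`, the two-dimensional vectors, and the choice of clipped cosine similarity are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sparse representation: word -> (term frequency, small "meaning" vector).
SparseDoc = Dict[str, Tuple[int, List[float]]]


def idf(n_docs: int, doc_freq: int) -> float:
    """Inverse document frequency in the non-negative (Lucene-style) BM25 form."""
    return math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_weight(tf: int, doc_len: int, avg_len: float,
                     k1: float = 1.2, b: float = 0.75) -> float:
    """Term-frequency saturation and length normalization part of BM25."""
    norm = k1 * (1.0 - b + b * doc_len / avg_len)
    return tf * (k1 + 1.0) / (tf + norm)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score(query: Dict[str, List[float]], doc: SparseDoc,
          doc_freqs: Dict[str, int], n_docs: int, avg_len: float) -> float:
    """Sum, over exactly matching words, of BM25 weight times meaning similarity.

    Assumption: the semantic factor is the cosine similarity clipped at zero,
    so a word used in an unrelated sense contributes nothing.
    """
    doc_len = sum(tf for tf, _ in doc.values())
    total = 0.0
    for word, q_vec in query.items():
        if word not in doc:
            continue  # term-based: only exact word matches can score
        tf, d_vec = doc[word]
        lexical = idf(n_docs, doc_freqs[word]) * bm25_term_weight(tf, doc_len, avg_len)
        semantic = max(0.0, cosine(q_vec, d_vec))
        total += lexical * semantic
    return total


# Toy corpus: "bat" appears in two documents with different meanings.
doc_animal: SparseDoc = {"bat": (1, [0.9, 0.1]), "cave": (1, [0.0, 1.0])}
doc_sport: SparseDoc = {"bat": (1, [0.0, 1.0]), "ball": (1, [1.0, 0.0])}
doc_other: SparseDoc = {"cave": (1, [0.0, 1.0]), "ball": (1, [1.0, 0.0])}
doc_freqs = {"bat": 2, "cave": 2, "ball": 2}
query = {"bat": [1.0, 0.0]}  # "bat" in the animal sense

animal_score = score(query, doc_animal, doc_freqs, n_docs=3, avg_len=2.0)
sport_score = score(query, doc_sport, doc_freqs, n_docs=3, avg_len=2.0)
other_score = score(query, doc_other, doc_freqs, n_docs=3, avg_len=2.0)
```

With plain BM25, the two documents containing "bat" would tie in this toy corpus, since they have the same term frequency and length. With the assumed semantic factor, the document that uses the word in the queried sense ranks higher. Scoring is still driven by exact word matches, which is why a representation of this kind can live in a traditional inverted index.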