Late Chunking: Balancing Precision and Cost in Long Context Retrieval

Post Details

Company

Weaviate

Date Published

Sept. 5, 2024

Authors

Charles Pierse, Connor Shorten, Akanksha Sharma

Word Count

2,517

Company Posts That Month

5

Language

English

Hacker News Points

2

Source URL

weaviate.io/blog/late-chunking

Summary

Jina AI has introduced late chunking, a method for long-context retrieval over large documents. It preserves contextual information by inverting the traditional order of chunking and embedding. Naive chunking splits a document first and embeds each chunk independently, so context from outside a chunk is lost; ColBERT-style late interaction keeps that context but requires significant storage capacity. Late chunking embeds the entire document first, so the token embeddings reflect relationships across the whole text, and only afterwards divides these contextually rich embeddings into chunks. This can help mitigate issues associated with very long documents, such as expensive LLM calls, increased latency, and a higher chance of hallucination. For users doing long-context retrieval, late chunking offers a cost-effective path that preserves the contextual information late interaction offers.
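The difference in ordering can be illustrated with a minimal Python sketch. It is not the implementation from the post: `toy_contextual_token_embeddings` is a hypothetical stand-in for a long-context embedding model (it simply blends each token vector with the mean of whatever text it is given), and the token vectors, chunk spans, and `mix` parameter are made-up values chosen for illustration. Mean pooling over each chunk's token embeddings is assumed as the pooling step.

```python
from typing import List, Tuple

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def toy_contextual_token_embeddings(
    token_vectors: List[Vector], mix: float = 0.5
) -> List[Vector]:
    """Hypothetical stand-in for a long-context embedding model.

    Each token vector is blended with the mean of all tokens it is
    encoded with, so the output depends on how much text the "model" sees.
    """
    context = mean_pool(token_vectors)
    return [
        [(1 - mix) * t + mix * c for t, c in zip(token, context)]
        for token in token_vectors
    ]


def naive_chunk(
    token_vectors: List[Vector], spans: List[Tuple[int, int]]
) -> List[Vector]:
    """Chunk first, then embed each chunk in isolation and pool it."""
    return [
        mean_pool(toy_contextual_token_embeddings(token_vectors[start:end]))
        for start, end in spans
    ]


def late_chunk(
    token_vectors: List[Vector], spans: List[Tuple[int, int]]
) -> List[Vector]:
    """Embed the whole document first, then pool token embeddings per chunk."""
    document_embeddings = toy_contextual_token_embeddings(token_vectors)
    return [mean_pool(document_embeddings[start:end]) for start, end in spans]


# Made-up 2-dimensional token vectors for a four-token "document".
tokens = [[1, 0], [1, 0], [0, 1], [0, 1]]
# Chunk boundaries as half-open (start, end) token index ranges.
spans = [(0, 2), (2, 4)]

print(naive_chunk(tokens, spans))
print(late_chunk(tokens, spans))
```

In this toy setup the naive chunk vectors carry no trace of the other chunk, while the late chunk vectors are shifted toward the document as a whole. Both approaches store one vector per chunk, whereas a late-interaction model such as ColBERT stores one vector per token.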

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Vector Search	27	3,675	269	79	+77%
RAG	8	1,936	254	78	-19%
LLM	2	3,889	441	129	+7%
Real-time	2	3,932	887	192	+47%