
Late Chunking: Balancing Precision and Cost in Long Context Retrieval

Blog post from Weaviate

Post Details
Company: Weaviate
Date Published:
Authors: Charles Pierse, Connor Shorten, Akanksha Sharma
Word Count: 2,517
Company Posts That Month: 5
Language: English
Hacker News Points: 2
Summary

Jina AI has introduced a method called late chunking to improve long-context retrieval over large documents. It preserves contextual information by inverting the traditional order of chunking and embedding. Naive chunking splits a document first and embeds each chunk independently, so a chunk loses any context that lies outside it. ColBERT-style late interaction keeps that context but requires significant storage capacity. Late chunking instead embeds the entire document first, so the contextual relationships between tokens are maintained across the whole document, and only afterwards divides these contextually rich embeddings into chunks. This can help mitigate issues that arise with very long documents, such as expensive LLM calls, increased latency, and a higher chance of hallucination. Late chunking offers a cost-effective path for long-context retrieval while preserving the contextual information that late interaction offers.
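The "embed first, chunk afterwards" order can be illustrated with a minimal sketch. It assumes the token embeddings have already been produced by a single pass of a long-context embedding model over the whole document, and it assumes mean pooling as the way to collapse each chunk, which the summary does not specify. The names `late_chunk`, `mean_pool`, and the toy vectors and spans are hypothetical and are not taken from the post.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a non-empty sequence of equal-length vectors."""
    if not vectors:
        raise ValueError("cannot pool an empty span")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def late_chunk(
    token_embeddings: Sequence[Sequence[float]],
    spans: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Pool document-level token embeddings into one vector per chunk.

    token_embeddings: one vector per token, assumed to come from one forward
        pass over the entire document, so each vector already carries
        document-wide context.
    spans: hypothetical (start, end) token offsets per chunk, end exclusive.
    """
    return [mean_pool(token_embeddings[start:end]) for start, end in spans]


# Toy data standing in for real model output: 6 tokens, 2 dimensions.
token_embeddings = [
    [1.0, 0.0], [3.0, 0.0],
    [2.0, 2.0], [0.0, 4.0],
    [0.0, 2.0], [4.0, 0.0],
]
# Chunk boundaries are applied only after embedding: three chunks of two tokens.
spans = [(0, 2), (2, 4), (4, 6)]

chunk_vectors = late_chunk(token_embeddings, spans)
print(len(token_embeddings), "token vectors ->", len(chunk_vectors), "chunk vectors")
```

In this toy case three chunk vectors are stored instead of six per-token vectors, which is the storage trade-off the summary draws against ColBERT. Naive chunking would differ one step earlier: each span of text would be embedded on its own, so its vector could not reflect tokens outside the span.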

Trends Found in this Post
Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM
Vector Search | 27 | 3,675 | 269 | 79 | +77%
RAG | 8 | 1,936 | 254 | 78 | -19%
LLM | 2 | 3,889 | 441 | 129 | +7%
Real-time | 2 | 3,932 | 887 | 192 | +47%