How We Built a Semantic Highlight Model To Save Token Cost for RAG
Blog post from HuggingFace
A bilingual Semantic Highlight model has been developed and open-sourced to cut token costs and improve answer quality in production RAG (retrieval-augmented generation) systems. Given a query, the model highlights the semantically relevant sentences in a retrieved document, so the generator sees less irrelevant text and answers are easier to trace back to their sources. The model supports both English and Chinese.

Unlike traditional keyword-based highlighting, the model uses a 0.6B encoder-only architecture to identify sentences that semantically address a query even when they share no keywords with it. By passing only the relevant sentences to the generator, it achieves a 70-80% reduction in token cost while improving answer quality.

Existing options, such as OpenSearch's semantic highlighter and Naver's Provence/XProvence models, were found inadequate due to limited context windows, missing language support, or restrictive commercial licensing. The new model, built on BGE-M3 Reranker v2 and trained on LLM-generated data, achieves state-of-the-art results on both English and Chinese datasets. It is released under the MIT license, permitting commercial use, and provides a foundation for more cost-effective and interpretable RAG systems.
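The filtering step described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the released model: the `score` function below is a placeholder keyword-overlap heuristic standing in for the actual 0.6B encoder, which scores semantic relevance and works even without keyword matches. All function names here are hypothetical.

```python
import re

def split_sentences(text):
    # Naive sentence splitter; a production system would use a proper
    # sentence tokenizer (and a CJK-aware one for Chinese).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def score(query, sentence):
    # Placeholder relevance score based on word overlap. The real model
    # replaces this with a cross-encoder forward pass over (query, sentence).
    q = set(re.findall(r"\w+", query.lower()))
    s = set(re.findall(r"\w+", sentence.lower()))
    return len(q & s) / max(len(q), 1)

def highlight(query, document, threshold=0.3):
    # Keep only sentences whose relevance score clears the threshold;
    # everything else is dropped before the document reaches the LLM.
    return [s for s in split_sentences(document) if score(query, s) >= threshold]

doc = ("The model is released under the MIT license. "
       "Cats are popular pets worldwide. "
       "It highlights semantically relevant sentences to cut token cost.")
kept = highlight("Which license is the model released under?", doc)
print(kept)  # only the license sentence survives the filter
```

In a real pipeline the surviving sentences (here one of three, a ~66% token reduction on this toy input) are concatenated and sent to the generator in place of the full document.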