BidirLM: Turning Generative LLMs into the Best Open-Source Omnimodal Encoders

Post Details

Company

Hugging Face

Date Published

April 7, 2026

Author

Nicolas-BZRD and Théo Deschamps-Berger

Word Count

1,772

Company Posts That Month

61

Language

-

Hacker News Points

-

Post removed?

No

Source URL

huggingface.co/blog/Nicolas-BZRD/bidirlm-release

Summary

BidirLM is an open-source project that adapts causal decoder language models into bidirectional encoders and extends them into omnimodal encoders. The adaptation is a two-phase pipeline: Masked Next-Token Prediction (MNTP) first teaches the model to use bidirectional context, and contrastive training then improves embedding quality. Scaling this adaptation without the original data risks catastrophic forgetting, which the project mitigates with linear weight merging and multi-domain data mixtures; the post reports that these significantly improve cross-domain knowledge retention. The creators then merged weights from specialized vision and audio models into their text encoder to produce BidirLM-Omni, a compact model that handles text, images, and audio and, per the post, outperforms both omnimodal and unimodal specialists on standard benchmarks. The approach is modular: new specialized models can be integrated incrementally, which makes it a cost-effective and flexible alternative to traditional multimodal encoder training.
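
The linear weight merging mentioned in the summary can be illustrated with a minimal sketch. The summary does not say which checkpoints are merged or what coefficient is used, so everything here is an assumption for illustration: the function name `linear_merge`, the interpolation weight `alpha`, and the toy checkpoints represented as dictionaries of flat float lists instead of real tensors.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of values.
# Real checkpoints hold tensors; this representation is an assumption.
StateDict = Dict[str, List[float]]


def linear_merge(base: StateDict, adapted: StateDict, alpha: float = 0.5) -> StateDict:
    """Interpolate parameter-wise: (1 - alpha) * base + alpha * adapted.

    alpha = 0 returns the base weights unchanged, alpha = 1 returns the
    adapted weights; values in between trade retained knowledge against
    newly learned behaviour.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if base.keys() != adapted.keys():
        raise ValueError("checkpoints must share the same parameter names")

    merged: StateDict = {}
    for name, base_values in base.items():
        adapted_values = adapted[name]
        if len(base_values) != len(adapted_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - alpha) * b + alpha * a
            for b, a in zip(base_values, adapted_values)
        ]
    return merged


# Hypothetical two-parameter checkpoints, chosen only to make the arithmetic visible.
base = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
adapted = {"layer.weight": [4.0, 6.0], "layer.bias": [3.0]}

merged = linear_merge(base, adapted, alpha=0.25)
print(merged)
```

Because the merge is a plain interpolation over matching parameter names, it only applies when both checkpoints share the same architecture, which is consistent with the summary's description of merging specialized models derived from a common backbone.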

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	6	5,932	1,046	223	-2%
Vector Search	4	1,739	413	146	-27%
AI Model Fine-tuning	1	420	130	55	-54%
