The text discusses the development of long-context retrieval models built on Monarch Mixer, a recent model family that aims to improve the scaling of Transformers along two axes: sequence length and model dimension. The authors release a preview of several models, including long-context versions of M2-BERT with context lengths up to 32K tokens and embedding versions fine-tuned for long-context retrieval. They also introduce LoCo, a new benchmark designed to evaluate long-context retrieval models on tasks with long documents. The reported results are promising: the long-context M2-BERT models outperform much larger models on this benchmark, suggesting that long context is genuinely useful for retrieval. The authors also discuss the challenges of training long-context models, including adapting the BERT pretraining pipeline and choosing a suitable fine-tuning loss. For the latter, they propose an orthogonal loss, which pushes the cosine similarity of positive pairs toward 1 and that of negative pairs toward 0.
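
The exact formulation of the orthogonal loss is not given here; as an illustration only, the sketch below shows a minimal PyTorch loss with the two stated properties (cosine similarity of positive pairs driven toward 1, negative pairs toward 0). The function name `orthogonal_loss` and the squared-error penalties are assumptions for the sketch, not the authors' definition.

```python
import torch
import torch.nn.functional as F

def orthogonal_loss(query_emb: torch.Tensor,
                    pos_emb: torch.Tensor,
                    neg_emb: torch.Tensor) -> torch.Tensor:
    """Illustrative sketch: push cos(query, positive) toward 1 and
    cos(query, negative) toward 0 (rather than toward -1 as in a
    standard contrastive objective)."""
    # Cosine similarity for each query/positive and query/negative pair
    pos_sim = F.cosine_similarity(query_emb, pos_emb, dim=-1)  # shape: (batch,)
    neg_sim = F.cosine_similarity(query_emb, neg_emb, dim=-1)  # shape: (batch,)

    # Penalize deviation of positive similarity from 1 and of
    # negative similarity from 0 (assumed squared-error penalties)
    pos_term = (1.0 - pos_sim).pow(2).mean()
    neg_term = neg_sim.pow(2).mean()
    return pos_term + neg_term

# Example usage with random embeddings (batch of 4, 768-dim vectors)
q, p, n = torch.randn(4, 768), torch.randn(4, 768), torch.randn(4, 768)
loss = orthogonal_loss(q, p, n)
```

Driving negative pairs toward orthogonality (similarity 0) rather than anti-correlation (similarity -1) reflects the intuition that unrelated documents should simply be uncorrelated with the query, not its opposite.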