The text discusses the development of long-context retrieval models built on Monarch Mixer, a recent model family that aims to improve the scaling of Transformers along two axes: sequence length and model dimension. The authors release a preview of several models, including long-context versions of M2-BERT with context lengths up to 32K tokens and embedding versions fine-tuned for long-context retrieval. They also introduce LoCo, a new benchmark designed to evaluate long-context retrieval models on tasks with long documents. The reported results are promising: the long-context M2-BERT models outperform much larger models on this benchmark, suggesting that long context is genuinely useful for retrieval. The authors also discuss the challenges of training long-context models, including adapting the BERT pretraining pipeline and choosing a suitable fine-tuning loss. For the latter, they propose an orthogonal loss, which pushes the cosine similarity of positive pairs toward 1 and that of negative pairs toward 0.
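
The exact formulation of the orthogonal loss is not given here; as an illustration only, the sketch below shows a minimal PyTorch loss with the two stated properties (cosine similarity of positive pairs driven toward 1, negative pairs toward 0). The function name `orthogonal_loss` and the squared-error penalties are assumptions for the sketch, not the authors' definition.

```python
import torch
import torch.nn.functional as F

def orthogonal_loss(query_emb: torch.Tensor,
                    pos_emb: torch.Tensor,
                    neg_emb: torch.Tensor) -> torch.Tensor:
    """Illustrative sketch: push cos(query, positive) toward 1 and
    cos(query, negative) toward 0 (rather than toward -1 as in a
    standard contrastive objective)."""
    # Cosine similarity for each query/positive and query/negative pair
    pos_sim = F.cosine_similarity(query_emb, pos_emb, dim=-1)  # shape: (batch,)
    neg_sim = F.cosine_similarity(query_emb, neg_emb, dim=-1)  # shape: (batch,)

    # Penalize deviation of positive similarity from 1 and of
    # negative similarity from 0 (assumed squared-error penalties)
    pos_term = (1.0 - pos_sim).pow(2).mean()
    neg_term = neg_sim.pow(2).mean()
    return pos_term + neg_term

# Example usage with random embeddings (batch of 4, 768-dim vectors)
q, p, n = torch.randn(4, 768), torch.randn(4, 768), torch.randn(4, 768)
loss = orthogonal_loss(q, p, n)
```

Driving negative pairs toward orthogonality (similarity 0) rather than anti-correlation (similarity -1) reflects the intuition that unrelated documents should simply be uncorrelated with the query, not its opposite.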