In this article, Ulrick BLE describes the development and evaluation of multimodal rerankers, focusing on a reranker built on Qwen 3 VL 2B that outperforms Jina Reranker M0 while being faster at inference and smaller. The author surveys the evolving field of multimodal retrieval and the role rerankers play in improving the relevance of retrieved documents, particularly in visually rich corporate scenarios where traditional text-based methods fall short. By building a benchmark dataset for evaluating multimodal rerankers and experimenting with reinforcement learning (RL) strategies, Ulrick BLE highlights both the challenges and the opportunities in optimizing rerankers for multimodal contexts. The article also covers technical aspects such as the attention implementation and inference optimization, and emphasizes the potential of reusing the pretrained language model (LM) head for efficient inference. Although the reranker achieves competitive results, the attempt to apply RL to reranking was inconclusive, which reflects the complexity of multimodal environments.
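To make the idea of reusing a pretrained LM head for reranking more concrete, here is a minimal sketch of one common formulation. It rests on assumptions that this summary does not confirm: that the model is prompted to judge relevance, and that the LM head's logits for a "yes" token and a "no" token are turned into a score with a two-way softmax. The function names, document IDs, and logit values are hypothetical and chosen only for illustration. No model is loaded here.

```python
import math


def relevance_score(yes_logit: float, no_logit: float) -> float:
    """Probability of "yes" under a two-way softmax over the yes/no logits.

    Assumption: the reranker reads these two logits from the pretrained
    LM head at the position where the model would answer the relevance prompt.
    """
    # A softmax over two logits reduces to a sigmoid of their difference.
    diff = yes_logit - no_logit
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    # Numerically stable branch for negative differences.
    e = math.exp(diff)
    return e / (1.0 + e)


def rerank(candidates):
    """Sort (doc_id, yes_logit, no_logit) tuples by descending relevance score."""
    scored = [(doc_id, relevance_score(y, n)) for doc_id, y, n in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical logits for three retrieved pages (illustrative values only).
candidates = [
    ("page_3", 1.2, 0.4),
    ("page_7", -0.5, 2.0),
    ("page_1", 3.1, -1.0),
]

ranking = rerank(candidates)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Under these assumptions, the appeal of this design is that no separate classification head has to be trained from scratch: the score comes from a single forward pass and two entries of the existing vocabulary logits.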