Recent advances in AI have produced new techniques for document retrieval, notably ColPali, which combines vision and language models to retrieve documents directly from their page images. Late-interaction retrieval models such as ColBERT embed queries and documents into sets of per-token vectors and score them by embedding similarity; because document representations can be precomputed offline, the computational cost at query time is greatly reduced.

ColPali extends this idea with a visual retriever built on PaliGemma, a model pairing a vision encoder with a language model, to produce multi-vector representations of document pages. Retrieval then uses the MaxSim operation: for each query token embedding, take the maximum similarity over all of a document's patch embeddings, and sum these maxima to obtain the document's score.

This pipeline pairs naturally with LanceDB, a database designed for fast retrieval over multi-modal datasets, which offers compute-storage separation and supports both full-text and semantic (vector) search. Despite its efficiency, scaling this approach remains challenging: multi-vector representations are high-dimensional and numerous, making exhaustive scoring computationally expensive and motivating strategies that prune the search space and otherwise optimize the retrieval process.
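To make the MaxSim operation concrete, here is a minimal sketch in NumPy. It assumes each query is a matrix of token embeddings and each document a matrix of patch embeddings (the function name `maxsim_score` and the toy shapes are illustrative, not part of any library API):

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction score: for each query token vector, take the
    maximum cosine similarity over all document patch vectors, then sum."""
    # Normalize rows so plain dot products become cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sims = q @ d.T  # shape: (num_query_tokens, num_doc_patches)
    # Max over patches for each query token, summed across tokens.
    return float(sims.max(axis=1).sum())

# Toy example: 3 query token embeddings, two documents with 4 patches each.
rng = np.random.default_rng(0)
query = rng.normal(size=(3, 8))
docs = [rng.normal(size=(4, 8)) for _ in range(2)]
scores = [maxsim_score(query, d) for d in docs]
best = int(np.argmax(scores))  # index of the highest-scoring document
```

Because each document's patch embeddings are fixed, they can be computed and stored offline; only the small query matrix and the similarity/max/sum steps run at query time, which is what makes late interaction cheap to serve.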