Summer Internship at Vespa
Blog post from Vespa
During a summer internship at Vespa, two interns worked on improving semantic search relevance. They built a system that uses large language models (LLMs) such as ChatGPT to automatically create training data for text embedders. The system generates both queries and query relevance judgments (qrels), which reduces the manual labeling work this process has traditionally required.

To improve the quality of the generated output, they used techniques such as few-shot prompting. They experimented with several datasets and had the most success with NFCorpus. Applying the system to other datasets proved harder, but the interns identified several possible improvements: trying different models, tuning training parameters, and using frozen embeddings to avoid bottlenecks on large datasets.

Alongside the main project, they worked on side projects, including a sample app for creating embeddings and improvements to Pyvespa, Vespa's Python API. This work deepened their understanding of Vespa's capabilities and contributed to their professional growth. Overall, the internship gave them valuable experience in information retrieval, machine learning, and contributing to open-source projects, with strong support from the Vespa team.
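To make the query and qrels generation idea more concrete, here is a minimal sketch of how few-shot prompting and qrels output might fit together. It is not the interns' actual code. The LLM is abstracted as any callable that takes a prompt string and returns text. The helper names (`build_few_shot_prompt`, `generate_queries_and_qrels`), the example documents and queries, and the choice to write qrels as TREC-style lines (`query-id 0 doc-id relevance`) with the source document marked relevant at grade 1 are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical hand-written (document, query) pairs used as few-shot demonstrations.
FEW_SHOT_EXAMPLES: List[Tuple[str, str]] = [
    ("Vitamin D deficiency is associated with an increased risk of bone fractures.",
     "does low vitamin d cause fractures"),
    ("Regular aerobic exercise lowers resting blood pressure in adults.",
     "exercise effect on blood pressure"),
]


def build_few_shot_prompt(document: str, examples: List[Tuple[str, str]]) -> str:
    """Show the LLM a few (document, query) pairs, then ask it to
    write a query for a new document in the same style."""
    parts = ["Write a search query that the following document answers.\n"]
    for doc, query in examples:
        parts.append(f"Document: {doc}\nQuery: {query}\n")
    parts.append(f"Document: {document}\nQuery:")
    return "\n".join(parts)


def generate_queries_and_qrels(
    docs: Dict[str, str],
    generate: Callable[[str], str],
) -> Tuple[Dict[str, str], List[str]]:
    """Ask the LLM for one query per document and record that document as
    relevant to it (assumed grade 1). Returns queries keyed by query id and
    qrels as TREC-style lines: "qid 0 docid relevance"."""
    queries: Dict[str, str] = {}
    qrels: List[str] = []
    for i, (doc_id, text) in enumerate(docs.items()):
        qid = f"q{i}"
        prompt = build_few_shot_prompt(text, FEW_SHOT_EXAMPLES)
        queries[qid] = generate(prompt).strip()
        qrels.append(f"{qid} 0 {doc_id} 1")
    return queries, qrels


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; replace with an actual model client."""
    return " omega-3 and heart disease "


if __name__ == "__main__":
    docs = {"MED-1": "Omega-3 fatty acid intake and cardiovascular outcomes."}
    queries, qrels = generate_queries_and_qrels(docs, fake_llm)
    print(queries)  # {'q0': 'omega-3 and heart disease'}
    print(qrels)    # ['q0 0 MED-1 1']
```

Keeping the LLM behind a plain callable makes it easy to swap models, which is one of the improvements mentioned above, without touching the prompt or qrels logic.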