Qdrant for Research: The Story Behind ETH & Stanford’s MIRIAD Dataset

Blog post from Qdrant

Post Details
Company: Qdrant
Author: Evgeniya Sukhodolskaya & Daniel Azoulai
Word Count: 983
Language: English
Summary

Researchers from ETH Zurich and Stanford have released MIRIAD, an open-source dataset of 5.8 million medical question-answer pairs, each grounded in peer-reviewed literature, to address the shortage of structured, high-quality data for medical AI. Built on the Semantic Scholar Open Research Corpus (S2ORC), the dataset is intended to reduce hallucinations in medical AI applications by serving as a context-rich knowledge base for Retrieval-Augmented Generation (RAG) and by improving embedding models. Qdrant, chosen for its simplicity, speed, scalability, and open-source nature, powers MIRIAD's storage and retrieval experiments. According to the post, the dataset improves results on medical QA benchmarks and on hallucination detection, and it is openly available on Hugging Face for replication and benchmarking. The researchers aim to update MIRIAD annually and plan further integration with Qdrant, with potential applications in medical AI such as medical QA agents and discipline explorers.
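
To make the retrieval step of RAG mentioned in the summary more concrete, here is a minimal, standard-library-only sketch. It is not the MIRIAD pipeline and does not use the Qdrant API: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database such as Qdrant, and the QA pairs are hypothetical placeholders rather than actual MIRIAD entries. The names `QA_PAIRS`, `embed`, `cosine`, `retrieve`, and `build_prompt` are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder QA pairs; these are NOT actual MIRIAD entries.
QA_PAIRS = [
    {"question": "How is asthma diagnosed in adults?",
     "answer": "Placeholder passage about asthma diagnosis."},
    {"question": "What are common side effects of statins?",
     "answer": "Placeholder passage about statin side effects."},
    {"question": "When is an MRI preferred over a CT scan?",
     "answer": "Placeholder passage about imaging choice."},
]


def embed(text):
    """Toy embedding: a sparse bag-of-words count vector.

    Assumption: a real system would call a trained embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, pairs, top_k=2):
    """Return the top_k (score, pair) tuples most similar to the query.

    Assumption: a vector database would perform this search at scale;
    here it is a brute-force scan over a small list.
    """
    query_vec = embed(query)
    scored = [
        (cosine(query_vec, embed(p["question"] + " " + p["answer"])), p)
        for p in pairs
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


def build_prompt(query, hits):
    """Assemble retrieved passages into context for a language model."""
    context = "\n".join(
        f"Q: {p['question']}\nA: {p['answer']}" for _, p in hits
    )
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    user_query = "How is asthma diagnosed?"
    print(build_prompt(user_query, retrieve(user_query, QA_PAIRS, top_k=1)))
```

The prompt returned by `build_prompt` would then be passed to a language model, so that its answer is grounded in the retrieved passages rather than in the model's parameters alone.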