HCMBench: an evaluation toolkit for hallucination correction models
Blog post from Vectara
HCMBench is an open-source evaluation toolkit from Vectara for assessing models that correct hallucinations in Retrieval-Augmented Generation (RAG) systems, a problem that matters most in high-accuracy domains such as healthcare and financial services.

The toolkit comprises four main components: a Dataset, a Hallucination Correction Model (HCM), a Postprocessor, and a Hallucination Evaluation Model (HEM). It integrates multiple public datasets so that correction models can be benchmarked on a common footing.

Users can customize and configure the pipeline to evaluate models at different levels of granularity, from response-level down to claim-level, using metrics such as HHEM, Minicheck, AXCEL, FACTSJudge, and ROUGE. The factuality metrics score whether corrections actually remove hallucinations, while ROUGE tracks how closely the corrected response stays to the original, so accuracy gains can be weighed against the extent of editing.

HCMBench's modular design supports a range of research and development needs, allowing flexible and comprehensive assessment of hallucination correction effectiveness. Vectara encourages contributions from the community to further extend the toolkit.
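To make the four-stage pipeline concrete, here is a minimal sketch of how a Dataset → HCM → Postprocessor → evaluation flow fits together. The component names follow the post, but everything else (the `Example` type, the toy correction model, the unigram-overlap stand-in for ROUGE) is illustrative, not HCMBench's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Example:
    context: str   # retrieved passages the response must stay faithful to
    response: str  # generated response that may contain hallucinations

def toy_hcm(ex: Example) -> str:
    """Stand-in correction model: keep only sentences whose words all
    appear in the context (a trivial faithfulness filter)."""
    ctx_words = set(ex.context.lower().replace(".", "").split())
    kept = []
    for sent in ex.response.split(". "):
        words = set(sent.lower().strip(". ").split())
        if words and words <= ctx_words:
            kept.append(sent.strip(". "))
    return ". ".join(kept) + ("." if kept else "")

def postprocess(text: str) -> str:
    """Postprocessor stage: normalize whitespace before scoring."""
    return " ".join(text.split())

def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1, a rough ROUGE-1-style score used here to
    track how far the corrected response drifts from the original."""
    c, r = set(candidate.lower().split()), set(reference.lower().split())
    if not c or not r:
        return 0.0
    overlap = len(c & r)
    p, rec = overlap / len(c), overlap / len(r)
    return 0.0 if p + rec == 0 else 2 * p * rec / (p + rec)

def run_pipeline(
    dataset: List[Example],
    hcm: Callable[[Example], str],
    post: Callable[[str], str],
    metric: Callable[[str, str], float],
) -> List[float]:
    """Run each example through correction, postprocessing, and scoring."""
    scores = []
    for ex in dataset:
        corrected = post(hcm(ex))
        scores.append(metric(corrected, ex.response))
    return scores

dataset = [Example(
    context="The Eiffel Tower is in Paris. It opened in 1889.",
    response="The Eiffel Tower is in Paris. It was painted gold in 2001.",
)]
print(run_pipeline(dataset, toy_hcm, postprocess, unigram_f1))
```

In a real run, `toy_hcm` would be replaced by the correction model under evaluation and `unigram_f1` by the toolkit's metrics (HHEM, Minicheck, and so on); the point is that each stage is a swappable function, which is what makes response-level versus claim-level evaluation a configuration choice rather than a code change.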