Atropos Health’s Arjun Mukerji, PhD, Explains RWESummary: A Framework and Test for Choosing LLMs to Summarize Real-World Evidence (RWE) Studies

Post Details

Company

Arize

Date Published

Sept. 19, 2025

Author

Dylan Couzon

Word Count

369

Language

English

Source URL

arize.com/blog/atropos-healths-arjun-mukerji-phd-explains-rwesummary-a-framework-and-test-for-choosing-llms-to-summarize-real-world-evidence-rwe-studies

Summary

In a recent presentation, Arjun Mukerji, PhD, a Staff Data Scientist at Atropos Health, introduced RWESummary, a benchmark for evaluating how well large language models (LLMs) summarize real-world evidence (RWE) studies. Mukerji emphasized that choosing a reliable model matters in healthcare because the domain is high-stakes and errors can have significant consequences. RWESummary tests LLMs on converting structured study inputs into plain-English summaries and scores them on three criteria: direction of effect, numerical accuracy, and completeness. Mukerji stressed that getting the direction of effect right is crucial, since reversing it could lead to severe misinterpretation of a study's findings. The benchmark showed that no single model excelled in all areas: Gemini 2.5 performed best overall on accuracy, while Gemini 2.0 Flash led on speed. Mukerji advocated robust evaluations and human oversight in AI-driven healthcare workflows to mitigate risk.
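
To make the three criteria concrete, here is a minimal Python sketch of how a summary might be checked against structured study inputs. It is not the RWESummary implementation, which this summary does not describe. The `Finding` type, the use of a hazard ratio as the effect measure, the keyword lists, and the tolerance are all assumptions chosen for illustration. A real evaluation would likely need something more robust than keyword matching.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical structured input: one outcome and its effect estimate."""
    outcome: str          # e.g. "stroke"
    hazard_ratio: float   # assumed effect measure; < 1 means lower risk


# Assumed keyword lists for detecting the direction stated in a sentence.
LOWER_WORDS = ("lower", "reduced", "decreased")
HIGHER_WORDS = ("higher", "increased", "elevated")


def expected_direction(hr: float) -> str:
    """Direction implied by the structured input."""
    if hr < 1:
        return "lower"
    if hr > 1:
        return "higher"
    return "neutral"


def stated_direction(sentence: str) -> str:
    """Direction claimed in the text; 'unclear' if absent or contradictory."""
    s = sentence.lower()
    has_lower = any(w in s for w in LOWER_WORDS)
    has_higher = any(w in s for w in HIGHER_WORDS)
    if has_lower and not has_higher:
        return "lower"
    if has_higher and not has_lower:
        return "higher"
    return "unclear"


def sentence_for(outcome: str, summary: str):
    """Return the first sentence mentioning the outcome, or None."""
    for sentence in re.split(r"(?<=[.!?])\s+", summary):
        if outcome.lower() in sentence.lower():
            return sentence
    return None


def score(findings, summary, tol=0.005):
    """Score a summary on direction, numerical accuracy, and completeness.

    Direction and numeric scores are computed over the findings the summary
    mentions; completeness is the share of findings mentioned at all.
    """
    mentioned = direction_ok = numeric_ok = 0
    for f in findings:
        sentence = sentence_for(f.outcome, summary)
        if sentence is None:
            continue
        mentioned += 1
        if stated_direction(sentence) == expected_direction(f.hazard_ratio):
            direction_ok += 1
        numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", sentence)]
        if any(abs(n - f.hazard_ratio) <= tol for n in numbers):
            numeric_ok += 1
    return {
        "direction": direction_ok / mentioned if mentioned else 0.0,
        "numeric": numeric_ok / mentioned if mentioned else 0.0,
        "completeness": mentioned / len(findings) if findings else 0.0,
    }


# Made-up example data, not results from the benchmark.
findings = [Finding("stroke", 0.72), Finding("bleeding", 1.30)]
reversed_summary = "Treatment was associated with higher risk of stroke (HR 0.72)."
print(score(findings, reversed_summary))
```

In this toy example the summary quotes the correct number but reverses the direction of effect and omits one outcome, so the three scores diverge. That is the reason for scoring each criterion separately.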