RADO: Trustworthy Radiology Impression Generation using Safety and Faithfulness based Preference Optimization

2026 · 0 citations · Journal Article

Authors

Akash Ghosh · Indian Institute of Technology Patna

Nishant Kumar · Indian Institute of Technology Patna

Nitesh Patnaik · KIIT University

Adity Prakash · GenePath Dx (India)

Rishi Raj · Indian Institute of Management Visakhapatnam

Sriparna Saha · Indian Institute of Technology Patna

Abstract

Radiology impression generation involves producing concise, clinically meaningful summaries from detailed findings of imaging studies such as CT and MRI scans, serving as a critical aid in diagnosis and treatment planning. However, recent studies highlight a severe shortage of radiologists, particularly in low- and middle-income countries, where there is fewer than one radiologist per 100,000 people, making timely expert interpretation a significant challenge. While advancements in AI, especially large language models (LLMs), offer promising potential to automate this task, current systems often suffer from hallucinations, omissions of key clinical details, and a lack of linguistic clarity, raising serious concerns about their safety and reliability in real-world clinical settings. In this work, we attempt to address these issues by introducing RADO, a novel framework for radiology impression generation that integrates safety, faithfulness, and linguistic refinement rewards for preference optimization. To support robust evaluation, we introduce RIB, a real-world benchmark dataset curated and annotated by radiologists, spanning 1,429 annotated CT and MRI findings and impressions across 27 study types. RADO enforces critical safety and factuality constraints via carefully designed reward models and achieves state-of-the-art performance across multiple automatic and human evaluation metrics. Our framework significantly outperforms existing baselines, demonstrating improved factual consistency, reduced omissions, and higher clinical relevance, thus advancing the safety and reliability of generative AI in high-stakes medical applications. The code and dataset associated with the work are made available at RADO. Disclaimer: This work includes descriptions of medical reports related to the subject of the study, which some readers may find sensitive or potentially distressing.
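
As an illustration only, the idea of combining safety, faithfulness, and linguistic rewards to drive preference optimization can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the Candidate class, the score ranges, the WEIGHTS values, and the best-versus-worst pairing rule are all hypothetical, since the abstract does not specify how the rewards are computed or combined.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A generated impression with hypothetical reward scores in [0, 1]."""
    text: str
    safety: float        # assumed output of a safety reward model
    faithfulness: float  # assumed output of a faithfulness reward model
    fluency: float       # assumed output of a linguistic-quality reward model


# Hypothetical weights; the abstract does not state how rewards are combined.
WEIGHTS = {"safety": 0.4, "faithfulness": 0.4, "fluency": 0.2}


def combined_reward(c: Candidate) -> float:
    """Weighted sum of the three reward scores (an assumed aggregation)."""
    return (
        WEIGHTS["safety"] * c.safety
        + WEIGHTS["faithfulness"] * c.faithfulness
        + WEIGHTS["fluency"] * c.fluency
    )


def preference_pair(candidates: list[Candidate]) -> tuple[str, str]:
    """Return (chosen, rejected) texts: highest- and lowest-reward candidates."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    ranked = sorted(candidates, key=combined_reward, reverse=True)
    return ranked[0].text, ranked[-1].text
```

Pairs produced this way could serve as the chosen and rejected examples for a preference-optimization objective; which objective RADO actually uses is not stated in the abstract.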

Topics & Keywords

Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Multimodal Machine Learning Applications

UN Sustainable Development Goals

No poverty

Publication Details

Published in: ACM Transactions on Computing for Healthcare

DOI: 10.1145/3805803

Field-Weighted Citation Impact: 0.00