Radiology impression generation involves producing concise, clinically meaningful summaries from detailed imaging findings, such as those from CT and MRI scans, and serves as a critical aid in diagnosis and treatment planning. However, recent studies highlight a severe shortage of radiologists, particularly in low- and middle-income countries, where the supply falls below one radiologist per 100,000 people, making timely expert interpretation a significant challenge. While advances in AI, especially large language models (LLMs), show promise for automating this task, current systems often suffer from hallucinations, omissions of key clinical details, and a lack of linguistic clarity, raising serious concerns about their safety and reliability in real-world clinical settings. In this work, we address this issue by introducing RADO, a novel framework for radiology impression generation that integrates safety, faithfulness, and linguistic refinement rewards for preference optimization. To support robust evaluation, we introduce RIB, a real-world benchmark dataset curated and annotated by radiologists, comprising 1,429 annotated CT and MRI findings and impressions across 27 study types. RADO enforces critical safety and factuality constraints via carefully designed reward models and achieves state-of-the-art performance across multiple automatic and human evaluation metrics. Our framework significantly outperforms existing baselines, demonstrating improved factual consistency, fewer omissions, and higher clinical relevance, thereby advancing the safety and reliability of generative AI in high-stakes medical applications. The code and dataset associated with this work are available at RADO. Disclaimer: This work includes descriptions of medical reports related to the subject of the study, which some readers may find sensitive or potentially distressing.
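The abstract does not say how the safety, faithfulness, and linguistic rewards are combined. As a purely illustrative sketch, assuming each candidate impression has already been scored in [0, 1] by three hypothetical reward models, one common pattern is to take a weighted sum and pair the highest- and lowest-scoring candidates as the chosen/rejected examples for DPO-style preference optimization. The `Candidate` class, the weights, and the function names below are hypothetical and are not RADO's actual method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    safety: float        # hypothetical safety reward score in [0, 1]
    faithfulness: float  # hypothetical faithfulness reward score in [0, 1]
    linguistic: float    # hypothetical linguistic-quality reward score in [0, 1]


# Hypothetical weights (assumption): the abstract does not specify a combination rule.
WEIGHTS = {"safety": 0.4, "faithfulness": 0.4, "linguistic": 0.2}


def combined_reward(c: Candidate, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the three per-candidate reward scores."""
    return (weights["safety"] * c.safety
            + weights["faithfulness"] * c.faithfulness
            + weights["linguistic"] * c.linguistic)


def make_preference_pair(findings: str, candidates: list[Candidate]) -> dict[str, str]:
    """Build a DPO-style (prompt, chosen, rejected) record from scored candidates."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidate impressions")
    ranked = sorted(candidates, key=combined_reward, reverse=True)
    return {"prompt": findings, "chosen": ranked[0].text, "rejected": ranked[-1].text}


# Toy example with made-up scores, for illustration only.
findings = "Non-contrast CT head: no hemorrhage, mass effect, or midline shift."
candidates = [
    Candidate("No acute intracranial abnormality.", 0.9, 0.95, 0.8),
    Candidate("Normal study.", 0.5, 0.6, 0.9),
    Candidate("Large intracranial hemorrhage.", 0.1, 0.1, 0.7),
]
pair = make_preference_pair(findings, candidates)
```

A fixed weighted sum is only one option; a hard safety threshold that discards unsafe candidates before ranking is an equally plausible design, and which choice RADO makes is not stated in the abstract.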