<b>BACKGROUND</b>. Chest radiographs play a crucial role in tuberculosis screening in high-prevalence regions, although widespread radiographic screening requires expertise that may be unavailable in settings with limited medical resources. <b>OBJECTIVE</b>. The purpose of this study was to evaluate a multimodal generative artificial intelligence (AI) model for detecting tuberculosis-associated abnormalities on chest radiography in patients undergoing tuberculosis screening. <b>METHODS</b>. This retrospective study evaluated 800 chest radiographs obtained from two public datasets originating from tuberculosis screening programs. A generative AI model was used to create free-text reports for the radiographs. AI-generated reports were classified in terms of presence versus absence and laterality of tuberculosis-related abnormalities. Two radiologists independently reviewed the radiographs for tuberculosis presence and laterality in separate sessions, without and with use of AI-generated reports, and recorded whether they would accept the report without modification. Two additional radiologists reviewed radiographs and clinical readings from the datasets to determine the reference standard. <b>RESULTS</b>. By the reference standard, 378 of 800 radiographs were positive for tuberculosis-related abnormalities. For detection of tuberculosis-related abnormalities, sensitivity, specificity, and accuracy were 95.2%, 86.7%, and 90.8% for AI-generated reports; 93.1%, 93.6%, and 93.4% for reader 1 without AI-generated reports; 93.1%, 95.0%, and 94.1% for reader 1 with AI-generated reports; 95.8%, 87.2%, and 91.3% for reader 2 without AI-generated reports; and 95.8%, 91.5%, and 93.5% for reader 2 with AI-generated reports. Accuracy was significantly lower for AI-generated reports than for both readers alone (<i>p</i> < .001) and was significantly higher with than without AI-generated reports for reader 2 only (reader 1: <i>p</i> = .47; reader 2: <i>p</i> = .03). 
Localization performance was significantly lower (<i>p</i> < .001) for AI-generated reports (63.3%) than for reader 1 (79.8%) and reader 2 (77.9%) without AI-generated reports and did not significantly change for either reader with AI-generated reports (reader 1: 78.7%, <i>p</i> = .71; reader 2: 81.5%, <i>p</i> = .23). Reader 1 accepted 91.7% of AI-generated reports for normal radiographs and 52.4% for abnormal radiographs; reader 2 accepted 83.2% and 37.0%, respectively. <b>CONCLUSION</b>. Although AI-generated reports may augment radiologists' diagnostic assessments, the current model requires human oversight given its inferior standalone performance. <b>CLINICAL IMPACT</b>. The generative AI model could have potential application to aid tuberculosis screening programs in medically underserved regions, although technical improvements are still needed.
Published in: American Journal of Roentgenology
Volume 225, Issue 4, e2533059
DOI: 10.2214/ajr.25.33059