<b>BACKGROUND</b>. Chest radiographs play a crucial role in tuberculosis screening in high-prevalence regions, although widespread radiographic screening requires expertise that may be unavailable in settings with limited medical resources. <b>OBJECTIVE</b>. The purpose of this study was to evaluate a multimodal generative artificial intelligence (AI) model for detecting tuberculosis-associated abnormalities on chest radiography in patients undergoing tuberculosis screening. <b>METHODS</b>. This retrospective study evaluated 800 chest radiographs obtained from two public datasets originating from tuberculosis screening programs. A generative AI model was used to create free-text reports for the radiographs. AI-generated reports were classified in terms of presence versus absence and laterality of tuberculosis-related abnormalities. Two radiologists independently reviewed the radiographs for tuberculosis presence and laterality in separate sessions, without and with use of AI-generated reports, and recorded whether they would accept the report without modification. Two additional radiologists reviewed radiographs and clinical readings from the datasets to determine the reference standard. <b>RESULTS</b>. By the reference standard, 378 of 800 radiographs were positive for tuberculosis-related abnormalities. For detection of tuberculosis-related abnormalities, sensitivity, specificity, and accuracy were 95.2%, 86.7%, and 90.8% for AI-generated reports; 93.1%, 93.6%, and 93.4% for reader 1 without AI-generated reports; 93.1%, 95.0%, and 94.1% for reader 1 with AI-generated reports; 95.8%, 87.2%, and 91.3% for reader 2 without AI-generated reports; and 95.8%, 91.5%, and 93.5% for reader 2 with AI-generated reports. Accuracy was significantly lower for AI-generated reports than for both readers alone (<i>p</i> < .001) and was significantly higher with than without AI-generated reports for reader 2 only (reader 1: <i>p</i> = .47; reader 2: <i>p</i> = .03). 
Localization performance was significantly lower (<i>p</i> < .001) for AI-generated reports (63.3%) than for reader 1 (79.8%) and reader 2 (77.9%) without AI-generated reports and did not significantly change for either reader with AI-generated reports (reader 1: 78.7%, <i>p</i> = .71; reader 2: 81.5%, <i>p</i> = .23). Reader 1 accepted 91.7% of AI-generated reports for normal radiographs and 52.4% for abnormal radiographs; reader 2 accepted 83.2% and 37.0%, respectively. <b>CONCLUSION</b>. Although AI-generated reports may augment radiologists' diagnostic assessments, the current model requires human oversight given its inferior standalone performance. <b>CLINICAL IMPACT</b>. The generative AI model could have potential application to aid tuberculosis screening programs in medically underserved regions, although technical improvements are still needed.
Published in: American Journal of Roentgenology
Volume 225, Issue 4, e2533059
DOI: 10.2214/ajr.25.33059