Abstract
In the United States, colorectal cancer is the second leading cause of cancer-related death, with over 52,900 deaths in 2025 alone, according to the American Cancer Society. Early detection of malignant Kudo pit patterns through colonoscopic imaging is crucial for reducing mortality, as identifying these polyps and pit patterns enables timely cancer diagnosis through direct visualization of the gastrointestinal tract. Conventional diagnostic techniques for interpreting colonoscopies are explainable but inefficient, resulting in delays in timely treatment. State-of-the-art AI approaches that rely on image classification are hard to explain and focus mainly on a single modality, limiting their diagnostic capabilities. For these reasons, a significant gap exists between AI advances and their use in clinical settings. We bridge this gap between AI development and real-life diagnostic decision-making with two main contributions: (1) Endo-Insight Gen, a multimodal AI model that generates real-time textual descriptions of endoscopic images, and (2) MED-X, a multi-agent AI system that integrates models such as Endo-Insight Gen, which collaborate to analyze an endoscopic image and provide a reason-based decision for Kudo pattern classification. Trained on a subset of the HyperKvasir dataset with approximately 10,000 labeled images, Endo-Insight Gen transforms visual features into clinically relevant textual descriptions. It processes text and image inputs simultaneously, integrating natural language processing and visual recognition to offer robust support for clinical and research applications. We benchmarked Endo-Insight Gen against expert annotations and outputs from ChatGPT-4, LLaVA-1.5, and LLaVA-Med, with endoscopy specialists providing additional evaluation.
The model was incorporated into MED-X alongside base models such as LLaVA-Med and fine-tuned models such as LLaVA-Endo, trained on ∼5,000 Kudo images. To overcome limited labeled data, we developed a web-based, human-in-the-loop annotation platform: a few-shot vision model generates preliminary labels, which multiple experts refine and validate, producing a consensus-driven dataset more efficiently than manual labeling. MED-X uses multiple AI agents to collaboratively analyze each colonoscopic image and produce diagnostic conclusions on Kudo pit patterns. Its framework is both interpretable and efficient: reasoning models and multi-agent summarization allow the system to mimic human expert panels. Endo-Insight Gen shows strong concordance with expert annotations, demonstrating its potential as an interpretable clinical tool. MED-X further exhibits advanced reasoning for Kudo classification, outperforming current AI systems. Collectively, these contributions establish MED-X as an explainable diagnostic assistant capable of accurate, efficient, and clinically aligned detection of colorectal polyps through colonoscopic imaging. Citation Format: Kushal Virupakshappa, Sowmya Sankaran, Yue Hu, Oladimeji Macaulay, Ala Jararaweh, David Arredondo, Gulshan Parasher, Avinash D. Sahu. Explainable AI with multi-agent collaborative system for colonoscopic polyp detection and Kudo pit classification [abstract]. In: Proceedings of the AACR Special Conference in Cancer Research: Cancer Evolution: The Dynamics of Progression and Persistence; 2025 Dec 4-6; Albuquerque, NM. Philadelphia (PA): AACR; Cancer Res 2025;85(23_Suppl):Abstract nr A032.
Published in: Cancer Research
Volume 85, Issue 23_Supplement, pp. A032-A032