The use of Large Language Models and other Generative AI tools to support decision-making in cancer care: a mapping review

2026 · 0 citations · Journal Article · Hybrid Open Access

Authors

Safa Elkefi · Binghamton University

Roberta Scheinmann · Columbia University

Duxiao Hao · Binghamton University

Salma Bhar · Binghamton University

Achraf Tounsi · AID Atlanta

Steven Feiner · Columbia University

Abstract

Cancer care involves complex, data-driven decisions shared among patients, clinicians, and multidisciplinary teams. The recent emergence of large language models (LLMs) and other generative artificial intelligence (GenAI) tools has introduced new opportunities to enhance decision-making by facilitating information synthesis, education, and communication. However, empirical evidence describing how these technologies are developed, evaluated, and implemented in oncology remains fragmented. This mapping review aimed to characterize the scope and features of empirically tested LLM and GenAI tools designed to support decision-making in cancer care.

Following PRISMA guidelines, five databases were searched through June 2025. Studies were included if they described an AI-based tool using transformer or foundation models (for example, GPT, Gemini, LLaMA) that supported patients or clinicians in oncology-related decisions and had undergone empirical testing. Data were extracted on publication characteristics, model types, decision functions, evaluation methods, and implementation challenges.

A total of 218 studies were included. Publications increased more than fivefold between 2023 and 2024, with most originating from high-income countries, particularly the United States and Germany. The majority of tools were clinical decision support systems (61%, n=133), followed by patient-facing chatbots (17%, n=37) and educational platforms (12%, n=26). Diagnostic and treatment-planning decisions dominated (~54%, n=117), while follow-up and survivorship support were rarely addressed. Most tools relied on general-purpose closed models such as GPT-3.5, GPT-4, or Gemini, used without domain-specific fine-tuning. Evaluation designs primarily assessed accuracy, guideline concordance, and readability, reporting correctness between 55% and 97%. Few studies examined clinical, behavioral, or implementation outcomes. Reported challenges included hallucination, lack of transparency, limited explainability, privacy concerns, and poor integration with electronic health records.

LLM and GenAI applications in oncology are expanding rapidly, yet remain concentrated in high-income settings and dominated by non-customized, general-purpose models. Evidence to date focuses on correctness rather than clinical utility or safety. Future research should prioritize context-specific model adaptation, workflow integration, equitable language accessibility, and rigorous mixed-method evaluations to ensure trustworthy, transparent, and patient-centered decision support in cancer care.

Topics & Keywords

Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Cancer survivorship and care

UN Sustainable Development Goals

No poverty

Publication Details

Published in: BMC Artificial Intelligence

Volume 2, Issue 1

DOI: 10.1186/s44398-026-00024-x

Field-Weighted Citation Impact: 0.00