Giuliano Lo Bianco,1,* Alexandra Therond,2,* Francesco Paolo D'angelo,3 Leonardo Kapural,4 Sudhir Diwan,5 Peter Staats,6,7 Sean Li,8 Paul J Christo,9 Timothy R Deer,10 Christopher L Robinson9

1Anesthesiology and Pain Department, Fondazione Istituto G. Giglio Cefalù, Palermo, Italy; 2Department of Psychology, Université du Québec à Montréal, Montréal, QC, Canada; 3Department of Anaesthesia, Intensive Care and Emergency, University Hospital Policlinico Paolo Giaccone, Palermo, Italy; 4Center for Clinical Research, Carolinas Pain Institute, Winston-Salem, NC, USA; 5Albert Einstein College of Medicine, Bronx, NY, USA; 6electroCore, Rockaway, NJ, USA; 7National Spine and Pain Centers, Rockville, MD, USA; 8National Spine and Pain Centers, Shrewsbury, NJ, USA; 9Division of Pain Medicine, Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA; 10The Spine and Nerve Centers of the Virginias, Charleston, WV, USA

*These authors contributed equally to this work

Correspondence: Giuliano Lo Bianco, Anesthesiology and Pain Department, Fondazione Istituto G. Giglio Cefalù, Palermo, Italy, Email giulianolobianco@gmail.com; Christopher L Robinson, Johns Hopkins University School of Medicine, Department of Anesthesiology and Critical Care Medicine, 1800 Orleans Street, Baltimore, MD, 21287, USA, Email ChristopherRobinsonMDPhD@outlook.com

Background: With the continued advancement of artificial intelligence (AI), large language models (LLMs) such as GPT-4 may assist clinicians in evaluating patient candidacy for spinal cord stimulation (SCS). We compared a general-purpose, non–fine-tuned LLM (GPT-4), an expert multidisciplinary team (MDT), and a clinician-input, rule-based e-Health decision-support tool. The study focused exclusively on decision agreement and did not assess clinical outcomes (eg, pain relief or device retention).

Methods: This single-center, retrospective cohort study was conducted at Fondazione Istituto G. Giglio (Cefalù, Italy) and included 93 consecutive adults referred to the MDT for SCS evaluation between January 2022 and March 2024. The MDT issued binary recommendations ("proceed" vs "do not proceed"), which served as the reference standard. The e-Health tool generated "yes", "maybe", or "no" outputs from structured clinician-entered data. GPT-4 was applied zero-shot, using a single standardized prompt on anonymized vignettes within an offline environment. The primary endpoint was agreement (weighted κ) among the MDT, the e-Health tool, and GPT-4; sensitivity/specificity analyses explored three interpretations of "maybe".

Results: The MDT recommended SCS for 91.4% of patients, compared with 54.8% for the e-Health tool and 46.2% for GPT-4. Agreement was moderate for MDT vs e-Health (κ = 0.51) and for e-Health vs GPT-4 (κ = 0.46), and fair for MDT vs GPT-4 (κ = 0.29). GPT-4 demonstrated a more conservative profile, favoring specificity over sensitivity.

Conclusion: A non–fine-tuned GPT-4 approximated but did not replicate MDT decision-making, functioning as a high-specificity, low-sensitivity filter. A layered workflow combining rule-based tools with expert oversight and targeted LLM adaptation may best optimize SCS candidate selection.

Keywords: artificial intelligence, large-language models, spinal cord stimulation, chronic pain, patient selection, neuromodulation
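The agreement statistic used in the primary endpoint, Cohen's κ, corrects observed agreement for agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). A minimal sketch in pure Python follows; for binary decisions such as "proceed" vs "do not proceed", weighted and unweighted κ coincide. The recommendation lists below are hypothetical illustrations, not the study's data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected if the raters were independent.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed proportion of agreement
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum((ca[lab] / n) * (cb[lab] / n) for lab in set(ca) | set(cb))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary recommendations (1 = proceed, 0 = do not proceed)
mdt  = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
tool = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(round(cohens_kappa(mdt, tool), 2))  # → 0.4
```

On the conventional Landis–Koch scale, values of 0.41–0.60 are read as "moderate" and 0.21–0.40 as "fair" agreement, matching the interpretations given in the Results.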