Abstract

Large language models (LLMs) are increasingly used by the public to seek health information, yet their reliability in addressing common vaccine myths remains unclear. We conducted an exploratory multi-vendor evaluation of three LLMs (GPT-5, Gemini 2.5 Flash, Claude Sonnet 4) using vaccination myths officially curated by Germany’s public health institution, posed under two realistic user framings: a curious skeptic and a convinced believer. All model responses were independently evaluated by two blinded medical experts for whether the misconception was addressed (binary) and for scientific accuracy and communication clarity (5-point Likert scales). Additionally, blinded marketing experts ranked the models for lay communication clarity, and Flesch-Kincaid Reading Ease scores were computed for all outputs. Across all myths, framings, and models (11 × 2 × 3 = 66 rating items), the medical raters found 100% successful refutation of the misinformation. Scientific accuracy and clarity ratings were high and tightly clustered (median 4.0–4.5), with no combined score below 3 and substantial inter-rater agreement. Marketing experts independently ranked Gemini 2.5 Flash and GPT-5 highest for lay clarity, with Claude Sonnet 4 consistently less favored. Readability analysis revealed generally low accessibility, particularly for the convinced believer framing and for Claude Sonnet 4 outputs. Our findings suggest that current general-purpose LLMs can deliver accurate debunking of widely documented vaccine myths under realistic conditions, but that linguistic complexity and framing-sensitive style may limit accessibility. With careful integration into public health channels, transparent sourcing, and readability optimization, these models could serve as scalable tools for debunking vaccine myths.
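As a rough illustration of the readability analysis mentioned above, a Flesch Reading Ease score can be computed per model output with the open-source `textstat` package. This is a minimal sketch, not the authors' actual pipeline (which the abstract does not describe); the `responses` data and its structure are hypothetical placeholders.

```python
# Illustrative sketch only: computing Flesch Reading Ease for LLM outputs
# with the `textstat` package. The `responses` dict is a hypothetical
# stand-in keyed by (model, user framing); the real outputs are not
# included in the abstract.
import textstat

responses = {
    ("GPT-5", "curious skeptic"): "Vaccines are tested in large clinical trials...",
    ("Claude Sonnet 4", "convinced believer"): "It is understandable to worry...",
}

for (model, framing), text in responses.items():
    # Higher Flesch Reading Ease means easier text: roughly 60-70 reads as
    # plain English, while scores below ~30 indicate very difficult prose.
    score = textstat.flesch_reading_ease(text)
    print(f"{model} / {framing}: FRE = {score:.1f}")
```

Aggregating such scores by model and framing would surface the pattern the abstract reports, namely lower readability for the convinced believer framing and for Claude Sonnet 4 outputs.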