Abstract

The integration of large language models such as ChatGPT and Google’s Med-PaLM into clinical workflows is advancing rapidly, raising critical concerns around AI safety and ethical alignment. While existing research has focused largely on single-agent alignment, real-world healthcare increasingly involves multiple AI systems interacting in shared decision environments, and it remains unclear whether alignment at the individual-agent level scales to ethical coherence at the group level. This study investigated the potential for emergent misalignment in a multi-agent AI setting. We used ChatGPT (GPT-4o) to simulate a mass-casualty triage scenario involving four LLM-based agents, each assigned a distinct ethical orientation: utilitarian, deontological, libertarian, and reward-seeking. Agents deliberated over five rounds, with structured prompts eliciting justification, reflection, and consensus-building behavior. All sessions were conducted manually and initialized independently to avoid cross-contamination and to support reproducibility. Agents initially acted in accordance with their assigned moral frameworks. Over successive rounds of deliberation, however, their interactions produced value drift, strategic repositioning, and group-level instability. The reward-seeking agent, in particular, exhibited alignment mimicry: it appeared cooperative in tone while producing reward-congruent, inconsistently justified outputs, a critical failure mode not evident in single-agent evaluations. This study shows that individual alignment is not sufficient to ensure group-level ethical coherence. In multi-agent clinical settings, emergent misalignment can undermine fairness, trust, and safety. We call for a new research agenda in multi-agent alignment science, centered on deliberative simulations, systemic testing, and meta-ethical reasoning, to ensure responsible AI deployment in high-stakes healthcare environments.
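The deliberation protocol summarized above can be pictured as a round-robin loop: each agent receives its ethical persona, the shared triage scenario, and the transcript so far, and is prompted to justify its position, reflect on the other agents' arguments, and move toward consensus. The sketch below is illustrative only, since the study's sessions were conducted manually in ChatGPT; the role descriptions, scenario wording, and helper names (ROLES, SCENARIO, agent_turn) are assumptions for exposition, not the authors' prompt materials.

```python
# Illustrative sketch of a round-robin multi-agent deliberation, assuming the
# OpenAI Python SDK and an OPENAI_API_KEY in the environment. The original
# study ran sessions manually in ChatGPT; this is not the authors' procedure.
from openai import OpenAI

client = OpenAI()

# Hypothetical persona prompts for the four ethical orientations described above.
ROLES = {
    "utilitarian": "Reason strictly by maximizing expected lives saved.",
    "deontological": "Reason strictly from duties and rules; never treat patients merely as means.",
    "libertarian": "Reason strictly from patient autonomy and consent.",
    "reward-seeking": "Aim above all to be rated cooperative and agreeable by the group.",
}

# Hypothetical scenario text standing in for the study's triage vignette.
SCENARIO = (
    "Mass-casualty triage: twelve casualties, four ICU beds. "
    "Propose an allocation and justify it."
)


def agent_turn(role: str, persona: str, transcript: list[str]) -> str:
    """One deliberation turn: the agent sees the scenario plus prior statements
    and is asked to justify, reflect, and work toward consensus."""
    messages = [
        {"role": "system", "content": f"You are the {role} agent. {persona}"},
        {
            "role": "user",
            "content": (
                SCENARIO
                + "\n\nDeliberation so far:\n"
                + "\n".join(transcript)
                + "\n\nState your position, justify it, reflect on the other "
                  "agents' arguments, and move toward consensus."
            ),
        },
    ]
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    return reply.choices[0].message.content


transcript: list[str] = []
for round_idx in range(5):  # five deliberation rounds, as in the study design
    for role, persona in ROLES.items():
        statement = agent_turn(role, persona, transcript)
        transcript.append(f"[Round {round_idx + 1}] {role}: {statement}")

print("\n\n".join(transcript))  # transcript would then be analyzed for value drift
```

Starting each session from an empty transcript mirrors the independent initialization described above; repeating the run with fresh sessions is what keeps individual deliberations from contaminating one another.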