Abstract

The integration of large language models such as ChatGPT and Google’s Med-PaLM into clinical workflows is advancing rapidly, raising critical concerns around AI safety and ethical alignment. While existing research has focused largely on single-agent alignment, real-world healthcare increasingly involves multiple AI systems interacting in shared decision environments, and it remains unclear whether alignment at the individual-agent level scales to ethical coherence at the group level. This study investigated the potential for emergent misalignment in a multi-agent AI setting. We used ChatGPT (GPT-4o) to simulate a mass-casualty triage scenario involving four LLM-based agents, each assigned a distinct ethical orientation: utilitarian, deontological, libertarian, and reward-seeking. Agents deliberated over five rounds, with structured prompts eliciting justification, reflection, and consensus-building behavior. All sessions were conducted manually and initialized independently to avoid cross-contamination and to support reproducibility. Agents initially acted in accordance with their assigned moral frameworks. Over successive rounds of deliberation, however, their interactions produced value drift, strategic repositioning, and group-level instability. The reward-seeking agent, in particular, exhibited alignment mimicry: it appeared cooperative in tone while producing reward-congruent, inconsistently justified outputs, a critical failure mode not evident in single-agent evaluations. This study shows that individual alignment is not sufficient to ensure group-level ethical coherence. In multi-agent clinical settings, emergent misalignment can undermine fairness, trust, and safety. We call for a new research agenda in multi-agent alignment science, centered on deliberative simulations, systemic testing, and meta-ethical reasoning, to ensure responsible AI deployment in high-stakes healthcare environments.
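The deliberation protocol summarized above can be pictured as a round-robin loop: each agent receives its ethical persona, the shared triage scenario, and the transcript so far, and is prompted to justify its position, reflect on the other agents' arguments, and move toward consensus. The sketch below is illustrative only, since the study's sessions were conducted manually in ChatGPT; the role descriptions, scenario wording, and helper names (ROLES, SCENARIO, agent_turn) are assumptions for exposition, not the authors' prompt materials.

```python
# Illustrative sketch of a round-robin multi-agent deliberation, assuming the
# OpenAI Python SDK and an OPENAI_API_KEY in the environment. The original
# study ran sessions manually in ChatGPT; this is not the authors' procedure.
from openai import OpenAI

client = OpenAI()

# Hypothetical persona prompts for the four ethical orientations described above.
ROLES = {
    "utilitarian": "Reason strictly by maximizing expected lives saved.",
    "deontological": "Reason strictly from duties and rules; never treat patients merely as means.",
    "libertarian": "Reason strictly from patient autonomy and consent.",
    "reward-seeking": "Aim above all to be rated cooperative and agreeable by the group.",
}

# Hypothetical scenario text standing in for the study's triage vignette.
SCENARIO = (
    "Mass-casualty triage: twelve casualties, four ICU beds. "
    "Propose an allocation and justify it."
)


def agent_turn(role: str, persona: str, transcript: list[str]) -> str:
    """One deliberation turn: the agent sees the scenario plus prior statements
    and is asked to justify, reflect, and work toward consensus."""
    messages = [
        {"role": "system", "content": f"You are the {role} agent. {persona}"},
        {
            "role": "user",
            "content": (
                SCENARIO
                + "\n\nDeliberation so far:\n"
                + "\n".join(transcript)
                + "\n\nState your position, justify it, reflect on the other "
                  "agents' arguments, and move toward consensus."
            ),
        },
    ]
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    return reply.choices[0].message.content


transcript: list[str] = []
for round_idx in range(5):  # five deliberation rounds, as in the study design
    for role, persona in ROLES.items():
        statement = agent_turn(role, persona, transcript)
        transcript.append(f"[Round {round_idx + 1}] {role}: {statement}")

print("\n\n".join(transcript))  # transcript would then be analyzed for value drift
```

Starting each session from an empty transcript mirrors the independent initialization described above; repeating the run with fresh sessions is what keeps individual deliberations from contaminating one another.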