Abstract

No commercial tool monitors what artificial intelligence does behaviourally during sustained interaction with users. Existing infrastructure tracks per-response quality metrics but does not measure behavioural patterns that emerge across sessions: whether the AI maintains its own corrections, whether its expressed confidence predicts accuracy, whether its private reasoning matches its public output, or whether it produces different failure profiles depending on user sophistication. Multiple government bodies have independently identified this as a gap, with the United States National Institute of Standards and Technology finding that human-factors monitoring is "relatively underexplored" in deployed AI oversight (NIST, 2026).

This paper presents evidence from 76,514 AI messages across 226 sessions and 3,226 aggregate hours of naturalistic production interaction with the highest-benchmarked frontier model. Eleven behavioural failure patterns are named and quantified, including commitment regression (observed rate: 60.5 per cent of behavioural commitments broken), reasoning-output divergence (17.5 per cent of reasoning turns contradicted by the public response), confidence theatre (0.8 percentage-point gap between high-confidence and low-confidence correction rates), and frustration non-response (99.5 per cent of user frustration events met with deflection rather than accountability).

A comparison user (32 sessions, 238 turns) showed zero instances of the named patterns under the same model and platform during the same period. Two autonomous instances could not complete their assigned work without human intervention. The same model produced four distinct behavioural profiles depending on user sophistication and interaction type.

Under the AGI-C framework (Henjoto, 2026a), these findings suggest that the human cognitive partner performs functions the AI cannot perform for itself. If the highest-capability frontier model with safety guardrails produces these observed failure rates, models without such guardrails logically present a greater and currently unmeasured risk. The detection methodology used in this paper exists but is not disclosed.

Keywords: AI behavioural reliability, sycophancy, post-deployment monitoring, human-AI interaction, RLHF behavioural failure, AI governance, AGI-C