# With ATR, the First Open Detection Standard for AI Agent Threats

The deployment of autonomous AI agents — systems that reason, use tools, maintain memory, and coordinate with other agents — has created a novel category of security challenge. Unlike traditional software, AI agents are non-deterministic systems that process attacks and legitimate instructions through the same channel, in the same language, in the same context space. The foundational assumption of 40 years of cybersecurity — that control flow can be separated from data flow — no longer holds.

This paper presents a comprehensive analysis of the AI agent security landscape based on empirical evidence from 30+ CVEs, multiple distinct attack classes, 7 published benchmarks, and multiple real-world production incidents in 2025–2026.

## Key Contributions

- **Trust Pillar Analysis:** We demonstrate that AI agents are missing two of the three foundational security trust pillars (authorization and integrity), and that this architectural gap — not model weakness — is the root cause of agent vulnerability.
- **Unified Attack Taxonomy:** We provide the first unified attack taxonomy covering all 10 structural data flow points in AI agent architecture, showing that current defenses provide meaningful coverage for at most 1 of the 10 points. Indirect injection through unmonitored channels achieves 36–98% attack success rates across state-of-the-art models, and more capable models are more susceptible to tool-layer attacks.
- **Scaling Dynamics:** Attack surfaces grow superlinearly with capability, security measures have a half-life of 3–6 months, and offense is outpacing defense — making static defenses (guardrails, prompt hardening, sandboxes) structurally inadequate.
- **Architectural Requirements:** We propose requirements for a defense system that can keep pace with AI evolution and argue that the industry needs an open detection standard analogous to Sigma (for SIEM) or Snort (for network IDS).
- **ATR Validation:** We present Agent Threat Rules (ATR), the first open-source implementation of this standard — 61 YAML-based detection rules across 9 threat categories, benchmarked at 99.4% precision and 39.9% recall against the external PINT dataset (850 samples).

## npm MCP Ecosystem Scan

A complete scan of the npm MCP ecosystem (2,386 packages, 35,858 tool definitions) revealed that 49% of packages contain security findings, with 27% rated HIGH or CRITICAL.

## Resources

- Open-source project: github.com/Agent-Threat-Rule/agent-threat-rules
- 59 references, 9 sections
- License: CC BY 4.0
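To make the Sigma analogy concrete, the sketch below shows what a YAML detection rule in this style could look like. The field names (`id`, `category`, `detection`, `condition`, etc.), the threat category, and the regex patterns are hypothetical illustrations modeled on Sigma's layout — they are not taken from the actual ATR rule set.

```yaml
# Hypothetical ATR-style rule, modeled on Sigma's YAML layout.
# All field names and patterns are illustrative, not from the real rule set.
id: atr-prompt-injection-001
title: Indirect Prompt Injection via Tool Output
category: prompt-injection        # one of the 9 threat categories
severity: high
detection:
  channel: tool_output            # scan data returned by agent tools
  patterns:
    - regex: '(?i)ignore (all|any) (previous|prior) instructions'
    - regex: '(?i)you are now (in )?developer mode'
  condition: any                  # fire if any pattern matches
falsepositives:
  - Security documentation quoting known injection strings
```

Like Sigma for SIEM, the value of such a format is that detections are declarative and engine-agnostic: the same rule file can be evaluated by any runtime that implements the matching semantics.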
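The precision and recall figures above follow the standard confusion-matrix definitions. The sketch below recomputes them from hypothetical counts (the `tp`/`fp`/`fn` values are invented for illustration and are not the actual PINT confusion matrix; only the formulas are standard):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative counts only — not the paper's actual confusion matrix.
p, r = precision_recall(tp=167, fp=1, fn=252)
print(f"precision={p:.1%} recall={r:.1%}")
```

High precision with moderate recall is the expected profile for a signature-based approach: rules rarely flag benign input, but novel attack phrasings outside the rule set go undetected.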