Adversarial examples, created by adding small distortions to audio, can fool neural network models and cause automatic speech recognition (ASR) systems to produce incorrect outputs. Most current detection methods rely on the recognition capability of the ASR system itself. As a result, they often fail when ASR performance degrades or when facing an evasion attack, referred to in this paper as a "partial adversarial attack," that is specifically designed to bypass ASR models. In this paper, we identify a distinct noise energy difference between adversarial examples and their original audio, and we show that this difference is typically greater than the difference between adversarial examples and their re-attacked counterparts. This finding leads us to propose a novel detection method that, unlike traditional approaches, operates independently of ASR systems: the method re-attacks the input audio and identifies adversarial examples by characterizing the noise energy difference before and after the re-attack. Experimental results demonstrate that the proposed method detects various state-of-the-art adversarial attacks and achieves substantially better detection performance than the baselines on standard adversarial examples, noisy adversarial examples, and partial adversarial examples.
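For intuition, the decision rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `re_attack` routine, the energy measure, and the threshold are hypothetical placeholders, since the actual attack configuration and calibration procedure are not given in the abstract.

```python
import numpy as np

def noise_energy_difference(before: np.ndarray, after: np.ndarray) -> float:
    """Energy of the residual introduced between two waveforms."""
    residual = after - before
    return float(np.sum(residual ** 2))

def is_adversarial(audio: np.ndarray, re_attack, threshold: float) -> bool:
    """Flag the input as adversarial when re-attacking it adds only a
    small amount of noise energy.

    Rationale (from the abstract): attacking clean audio introduces a
    comparatively large perturbation, while re-attacking an example that
    is already adversarial introduces a much smaller one, so a small
    before/after energy difference signals an adversarial input.

    `re_attack` is a placeholder for whatever attack routine is used;
    `threshold` would be calibrated on held-out clean and adversarial audio.
    """
    re_attacked = re_attack(audio)  # hypothetical attack call
    return noise_energy_difference(audio, re_attacked) < threshold
```

Note that this test never queries the ASR model's transcription, which is what lets the approach survive the degraded-ASR and partial-attack settings where recognition-based detectors fail.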