Critical Information Only: A Content Privacy-Preserving Framework for Detecting Audio Deepfakes

2025 · 0 citations · Journal Article

Authors

Xinfeng Li · Nanyang Technological University

Yifan Zheng · Zhejiang Lab

Yan Chen · Zhejiang Lab

Kai Li · Tsinghua University

Chang Zeng · National Institute of Informatics

Xiaoyu Ji · Zhejiang Lab

Wenyuan Xu · Zhejiang Lab

Abstract

Text-to-Speech (TTS) and Voice Conversion (VC) models have exhibited remarkable performance in generating realistic and natural audio. However, their dark side, audio deepfakes, poses a significant threat to both society and individuals. Existing countermeasures largely focus on determining the genuineness of speech from complete original audio recordings, which often contain private content. This oversight may keep deepfake detection out of many applications, particularly in scenarios involving sensitive information such as business secrets. In this paper, we propose SafeEar, a novel framework that aims to detect deepfake audio without accessing the speech content within. Our key idea is to turn a neural audio codec into a novel decoupling model that cleanly separates the semantic and acoustic information in audio samples, and to use only the acoustic information (e.g., prosody and timbre) for deepfake detection. In this way, no semantic content is exposed to the detector. To overcome the challenge of identifying diverse deepfake audio without semantic clues, we enhance our deepfake detector with real-world augmentation, such as codecs and reverbs. Extensive experiments conducted on five benchmark datasets demonstrate SafeEar's effectiveness in detecting various deepfake techniques, with an equal error rate (EER) down to 2.41%. Simultaneously, it shields five-language speech content from being deciphered by both machine and human auditory analysis, as demonstrated by word error rates (WERs) all above 93.74% and by our user study. Furthermore, the benchmark we constructed for anti-deepfake and anti-content recovery evaluation provides a basis for future research on audio privacy preservation and deepfake detection.
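
The abstract reports results with two metrics, EER for detection and WER for content protection. As a reading aid, the sketch below shows how these two quantities are commonly computed. It is not the authors' evaluation code: the function names, the convention that a higher score means "more likely bona fide", and the whitespace tokenization are all assumptions made for illustration.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumption: words are split on whitespace with no text normalization.
    Note that WER can exceed 1.0 when the hypothesis has many insertions.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def equal_error_rate(bonafide_scores: Sequence[float],
                     spoof_scores: Sequence[float]) -> float:
    """Approximate EER by sweeping every observed score as a threshold.

    Assumption: a sample is accepted as bona fide when score >= threshold.
    Returns the mean of the false-acceptance and false-rejection rates at
    the threshold where the two are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

Under these definitions, a low EER means the detector rarely confuses bona fide and spoofed audio, while a WER close to or above 100% means a speech recognizer recovers almost none of the original words, which is the privacy outcome the abstract describes.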

Topics & Keywords

Digital Media Forensic Detection · Music and Audio Processing · Music Technology and Sound Studies

Publication Details

Published in: IEEE Transactions on Dependable and Secure Computing

Volume 23, Issue 2, pp. 2165-2182

DOI: 10.1109/tdsc.2025.3624972

Field-Weighted Citation Impact: 0.00
