Machine learning (ML) suffers from a persistent and critical flaw: adversarial examples. Many new forms of adversarial-example attack have been invented, and many narrow defenses have been proposed; unfortunately, no defensive approach withstands current attacks. We hypothesize that ML model robustness can be improved by approaches that delineate the data-point-sparse latent space between the data-dense regions of a model's classification space as a barrier class. We introduce one such defense, PadNet, which builds a barrier class from training samples that mix multiple classes together, and leverages this barrier class to separate the decision boundaries between benign classes with regions of padding. PadNet then applies a gradient regularization strategy that penalizes gradients in the direction of the barrier class, drawing the decision boundary tighter around training samples and increasing boundary thickness between classes. We evaluate PadNet against a sampling of the most effective state-of-the-art attacks, demonstrating that it offers significant robustness and reliability compared to current defenses. We also test PadNet against adaptive attacks and find that it remains robust.
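The barrier-class construction described above might be sketched as follows. This is a minimal illustration under assumptions of my own, not the paper's implementation: the function name `make_barrier_samples`, the Beta-distributed mixing coefficient, and the use of a single extra class index for the barrier class are all hypothetical, and the gradient-regularization step is not shown.

```python
import numpy as np

def make_barrier_samples(x, y, num_classes, alpha=1.0, rng=None):
    """Mix pairs of training samples drawn from different classes and
    label the mixtures with an extra 'barrier' class index, num_classes.

    x: (n, d) feature array; y: (n,) integer labels in [0, num_classes).
    Returns mixed samples and their barrier-class labels.
    """
    rng = np.random.default_rng(rng)
    perm = rng.permutation(len(x))            # random pairing of samples
    lam = rng.beta(alpha, alpha, size=len(x)) # mixing coefficients in [0, 1]
    keep = y != y[perm]                       # only cross-class pairs pad boundaries
    lam = lam[keep][:, None]
    x_mix = lam * x[keep] + (1 - lam) * x[perm][keep]  # convex combination
    y_mix = np.full(len(x_mix), num_classes)           # barrier-class label
    return x_mix, y_mix
```

Appending these samples to the training set would give the classifier an explicit (K+1)-th class occupying the sparse regions between the K benign classes, which is the "padding" the abstract refers to.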
Published in: ACM Transactions on Privacy and Security
Volume 29, Issue 2, pp. 1-26
DOI: 10.1145/3799889