Machine learning (ML) suffers from a persistent and critical flaw: adversarial examples. Many new forms of adversarial-example attack have been invented, and many narrow defenses have been proposed; unfortunately, no defensive approach withstands current attacks. We hypothesize that ML model robustness can be improved by approaches that delineate the data-point-sparse latent space between the data-dense regions of a model's classification space as a barrier class. We introduce one such defense, PadNet, which builds a barrier class from training samples that mix multiple classes together, and leverages this barrier class to separate the decision boundaries between benign classes with regions of padding. PadNet then applies a gradient regularization strategy that penalizes gradients in the direction of the barrier class, drawing the decision boundary tighter around training samples and increasing boundary thickness between classes. We evaluate PadNet against a sampling of the most effective state-of-the-art attacks, demonstrating that it offers significant robustness and reliability compared to current defenses. We also test PadNet against adaptive attacks and find that it remains robust.
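The barrier-class construction described above might be sketched as follows. This is a minimal illustration under assumptions of my own, not the paper's implementation: the function name `make_barrier_samples`, the Beta-distributed mixing coefficient, and the use of a single extra class index for the barrier class are all hypothetical, and the gradient-regularization step is not shown.

```python
import numpy as np

def make_barrier_samples(x, y, num_classes, alpha=1.0, rng=None):
    """Mix pairs of training samples drawn from different classes and
    label the mixtures with an extra 'barrier' class index, num_classes.

    x: (n, d) feature array; y: (n,) integer labels in [0, num_classes).
    Returns mixed samples and their barrier-class labels.
    """
    rng = np.random.default_rng(rng)
    perm = rng.permutation(len(x))            # random pairing of samples
    lam = rng.beta(alpha, alpha, size=len(x)) # mixing coefficients in [0, 1]
    keep = y != y[perm]                       # only cross-class pairs pad boundaries
    lam = lam[keep][:, None]
    x_mix = lam * x[keep] + (1 - lam) * x[perm][keep]  # convex combination
    y_mix = np.full(len(x_mix), num_classes)           # barrier-class label
    return x_mix, y_mix
```

Appending these samples to the training set would give the classifier an explicit (K+1)-th class occupying the sparse regions between the K benign classes, which is the "padding" the abstract refers to.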
Published in: ACM Transactions on Privacy and Security
Volume 29, Issue 2, pp. 1-26
DOI: 10.1145/3799889