Feature Augmentation for Adversarial Robustness
Preprint posted on 21.04.2022, 19:14 by Yue Zhuo, Zhiqiang Ge
An adversarial attack crafts tiny perturbations of the inputs that cause neural networks to give incorrect outputs with high confidence, and adversarial training is the de facto most successful method for obtaining robust neural networks. Nevertheless, adversarial training suffers from robust overfitting: a widening robustness gap between the training and test datasets. This paper aims to reduce robust overfitting (i.e., to improve adversarially robust generalization, also called generalized adversarial robustness). First, we study the robust bias-variance trade-off and find that robust overfitting occurs even on a simple dataset (100MNIST) that previous research considered free of it. Next, based on 100MNIST and Gaussian data, we analyze the correlation between feature augmentation and robust overfitting theoretically. This yields a novel insight: augmenting robust features can improve generalized adversarial robustness by alleviating over-optimization on non-robust features. We then propose a regularization term, Feature Alignment (FA), to guide the directions of synthesized samples. The Adversarial FA-aided Generative Adversarial Network (AFAGAN) generates samples aligned with robust features, which can be used to train models with better adversarially robust generalization. Experiments on CIFAR10 show that augmentation with AFAGAN-generated data significantly improves generalized adversarial robustness.
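To make the notion of a robustness gap between training and test data concrete, the following is a minimal illustrative sketch, not the paper's method (it implements neither FA nor AFAGAN). It assumes a linear classifier, an L-infinity threat model, and a two-class Gaussian dataset. All names (`worst_case_linf`, `robust_accuracy`, `sample_gaussian`) and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np

def worst_case_linf(x, y, w, eps):
    # For a linear score w.x + b, the L-infinity perturbation of size eps
    # that most reduces the margin y * (w.x + b) shifts every coordinate
    # by eps in the direction opposite to y * sign(w).
    return x - eps * y[:, None] * np.sign(w)[None, :]

def accuracy(x, y, w, b):
    # Fraction of samples whose predicted sign matches the label in {-1, +1}.
    return float(np.mean(np.sign(x @ w + b) == y))

def robust_accuracy(x, y, w, b, eps):
    # Accuracy on the worst-case perturbed inputs.
    return accuracy(worst_case_linf(x, y, w, eps), y, w, b)

def sample_gaussian(n, mu, rng):
    # Assumed toy model: label y is +1 or -1, input x ~ N(y * mu, I).
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu[None, :] + rng.standard_normal((n, mu.size))
    return x, y

rng = np.random.default_rng(0)
mu = np.full(50, 0.3)                          # hypothetical class mean
x_train, y_train = sample_gaussian(20, mu, rng)    # small training set
x_test, y_test = sample_gaussian(2000, mu, rng)    # larger test set

# Simple mean estimator as the classifier weights (an assumption, chosen
# because it needs no optimizer); no bias term.
w = np.mean(y_train[:, None] * x_train, axis=0)
b = 0.0
eps = 0.1                                      # hypothetical attack budget

train_rob = robust_accuracy(x_train, y_train, w, b, eps)
test_rob = robust_accuracy(x_test, y_test, w, b, eps)
print("robust train accuracy:", train_rob)
print("robust test accuracy: ", test_rob)
print("robust gap (train - test):", train_rob - test_rob)
```

The printed gap is the difference between robust accuracy on the training and test sets for this toy setting; it is meant only to show how such a gap can be measured, not to reproduce any result from the paper.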