HaS-Net: A Heal and Select Mechanism to Securely Train DNNs against Backdoor Attacks

Hassan Ali; Surya Nepal; Salil S. Kanhere; Sanjay K. Jha

doi:10.36227/techrxiv.16571184.v1

Abstract

We have witnessed a continuing arms race between backdoor attacks on Deep Neural Networks (DNNs) and the corresponding defense strategies. However, most state-of-the-art defenses rely on the statistical sanitization of inputs or latent DNN representations to capture trojan behavior. In this paper, we first challenge the robustness of many recently reported defenses by introducing a novel variant of the targeted backdoor attack, called the low-confidence backdoor attack. The low-confidence attack inserts the backdoor by assigning uniformly distributed probabilistic labels to the poisoned training samples, and is applicable to many practical scenarios such as Federated Learning and model-reuse cases. We evaluate our attack against five state-of-the-art defense methods, viz., STRIP, Gradient-Shaping, Februus, ULP-defense and ABS-defense, under the same threat model as assumed by the respective defenses, and achieve Attack Success Rates (ASRs) of 99%, 63.73%, 91.2%, 80% and 100%, respectively. After carefully studying the properties of state-of-the-art attacks, including low-confidence attacks, we present HaS-Net, a mechanism to securely train DNNs against a number of backdoor attacks under the data-collection scenario. For this purpose, we use a reasonably small healing dataset, approximately 2% to 15% of the size of the training data, to heal the network at each iteration. We evaluate our defense on different datasets (Fashion-MNIST, CIFAR-10, Celebrity Face, Consumer Complaint and Urban Sound), network architectures (MLPs, 2D-CNNs and 1D-CNNs) and attack configurations (standard backdoor attacks, invisible backdoor attacks, label-consistent attacks and all-trojan backdoor attacks, including their low-confidence variants). Our experiments show that HaS-Nets can decrease ASRs from over 90% to less than 15%, independent of the dataset, attack configuration and network architecture.
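
To make two notions from the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a low-confidence label is a near-uniform probability vector whose largest entry falls on the attacker's target class, and that ASR is measured as the fraction of triggered inputs from non-target classes that the model assigns to the target class. The function names and the `margin` parameter are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def low_confidence_label(num_classes: int, target: int, margin: float = 0.05) -> List[float]:
    """Near-uniform soft label whose argmax is the attacker's target class.

    `margin` is a hypothetical knob (an assumption, not from the paper): the extra
    probability mass placed on the target class, taken evenly from the others.
    """
    if num_classes < 2 or not 0 <= target < num_classes:
        raise ValueError("need num_classes >= 2 and a valid target index")
    uniform = 1.0 / num_classes
    if not 0.0 < margin <= 1.0 - uniform:
        raise ValueError("margin must lie in (0, 1 - 1/num_classes]")
    # Every non-target class gives up an equal share of the margin.
    label = [uniform - margin / (num_classes - 1)] * num_classes
    label[target] = uniform + margin
    return label


def attack_success_rate(predictions: Sequence[int], true_labels: Sequence[int], target: int) -> float:
    """Fraction of triggered, non-target-class inputs predicted as `target`.

    Excluding samples whose true class already equals the target is an
    assumption made here; the abstract does not specify it.
    """
    pairs = [(p, y) for p, y in zip(predictions, true_labels) if y != target]
    if not pairs:
        raise ValueError("no non-target samples to evaluate")
    return sum(p == target for p, _ in pairs) / len(pairs)
```

With ten classes and a margin of 0.05, for instance, the poisoned label places about 0.15 on the target class and about 0.094 on each of the others, so the label still points at the target without the one-hot confidence of a standard backdoor label.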