HaS-Net: A Heal and Select Mechanism to Securely Train DNNs against
Backdoor Attacks
Abstract
We have witnessed the continuing arms race between backdoor attacks and
the corresponding defense strategies on Deep Neural Networks (DNNs).
However, most state-of-the-art defenses rely on the statistical
sanitization of inputs or latent DNN representations to
capture trojan behavior. In this paper, we first challenge the
robustness of many recently reported defenses by introducing a novel
variant of the targeted backdoor attack, called the low-confidence
backdoor attack. The low-confidence attack inserts the backdoor by
assigning uniformly distributed probabilistic labels to the poisoned
training samples, and is applicable to many practical scenarios such as
Federated Learning and model-reuse cases. We evaluate our attack against
five state-of-the-art defense methods, viz., STRIP, Gradient-Shaping,
Februus, ULP-defense and ABS-defense, under the same threat model as
assumed by the respective defenses and achieve Attack Success Rates
(ASRs) of 99%, 63.73%, 91.2%, 80% and 100%,
respectively. After carefully studying the properties of the
state-of-the-art attacks, including low-confidence attacks, we present
HaS-Net, a mechanism to securely train DNNs against a number of
backdoor attacks under the data-collection scenario. For this purpose,
we use a reasonably small healing dataset, approximately 2% to 15% of the
size of the training data, to heal the network at each iteration. We
evaluate our defense for different datasets—Fashion-MNIST, CIFAR-10,
Celebrity Face, Consumer Complaint and Urban Sound—and network
architectures—MLPs, 2D-CNNs, 1D-CNNs—and against several attack
configurations—standard backdoor attacks, invisible backdoor attacks,
label-consistent attacks and all-trojan backdoor attacks, including their
low-confidence variants. Our experiments show that HaS-Nets can
decrease ASRs from over 90% to less than 15%, independent of the
dataset, attack configuration and network architecture.
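
To make the two quantities in the abstract concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the exact labeling scheme, so the sketch assumes one plausible reading of "probabilistic labels": the target class receives a low confidence value and the remaining probability mass is spread uniformly over the other classes. The ASR is computed in the usual way, as the fraction of trigger-carrying inputs classified as the target class. The function names `low_confidence_label` and `attack_success_rate` and the parameter `target_confidence` are hypothetical.

```python
def low_confidence_label(num_classes, target_class, target_confidence):
    """Build a soft label for a poisoned sample (assumed scheme, see lead-in).

    The target class gets `target_confidence`; the remaining mass is
    spread uniformly over the other classes.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if not 0 <= target_class < num_classes:
        raise ValueError("target_class out of range")
    # The target must still be the most likely class, so its confidence
    # has to exceed the uniform level 1 / num_classes.
    if not (1.0 / num_classes) < target_confidence <= 1.0:
        raise ValueError("target_confidence must be in (1/num_classes, 1]")
    rest = (1.0 - target_confidence) / (num_classes - 1)
    label = [rest] * num_classes
    label[target_class] = target_confidence
    return label


def attack_success_rate(predicted_classes, target_class):
    """Fraction of trigger-carrying inputs classified as the target class."""
    if not predicted_classes:
        raise ValueError("predicted_classes must not be empty")
    hits = sum(1 for p in predicted_classes if p == target_class)
    return hits / len(predicted_classes)


if __name__ == "__main__":
    # A 10-class soft label whose target class (3) has only 20% confidence.
    print(low_confidence_label(10, 3, 0.2))
    # Model predictions on four triggered inputs, three of which hit class 3.
    print(attack_success_rate([3, 3, 1, 3], 3))
```

With `target_confidence = 1.0` the label reduces to the one-hot label of a standard backdoor attack, which makes the low-confidence variant easy to compare against the standard one under this assumed scheme.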