Abstract
Deep neural networks for image classification are well known to be
vulnerable to adversarial attacks. One such attack that has garnered
recent attention is the adversarial backdoor attack, which has
demonstrated the capability to perform targeted misclassification of
specific examples. In particular, backdoor attacks attempt to force a
model to learn spurious relations between backdoor trigger patterns and
false labels. In response to this threat, numerous defensive measures
have been proposed; however, existing defenses focus primarily on
backdoor pattern detection, which may be unreliable against novel or
unexpected backdoor pattern designs. We introduce a novel
re-contextualization of the adversarial setting, where the presence of
an adversary implicitly admits the existence of multiple database
contributors. Then, under the mild assumption of contributor awareness,
it becomes possible to exploit this knowledge to defend against backdoor
attacks by destroying the false label associations. We propose a
contributor-aware universal defensive framework for learning in the
presence of multiple, potentially adversarial data sources that utilizes
semi-supervised ensembles and learning from crowds to filter the false
labels produced by adversarial triggers. Importantly, this defensive
strategy is agnostic to backdoor pattern design, as it functions without
needing, or even attempting, to perform either adversary
identification or backdoor pattern detection during either training or
inference. Our empirical studies demonstrate the robustness of the
proposed framework against adversarial backdoor attacks from multiple
simultaneous adversaries.