Shawqi Al-Maliki et al.

Protecting the privacy of personal information, including emotions, is essential, and organizations must comply with the relevant regulations. Unfortunately, some organizations disregard these regulations or lack transparency, leaving human privacy at risk. Such privacy violations often occur when unauthorized organizations misuse machine learning (ML) technology, for example facial expression recognition (FER) systems. Researchers and practitioners must therefore take action and use ML technology for social good to protect human privacy. One emerging research area that can help address these violations is adversarial ML for social good: evasion attacks, normally used to fool ML systems, can be repurposed to prevent misused ML technology, such as ML-based FER, from recognizing true emotions, thereby protecting individuals' personal and emotional privacy. In this work, we propose Chaining of Adversarial ML Attacks (CAA), an approach that builds a robust attack to fool misused technology and prevent it from detecting true emotions. To validate the proposed approach, we conduct extensive experiments with various evaluation metrics and baselines. Our results show that CAA contributes significantly to emotional privacy preservation: the fool rate grows with the chaining length, increasing by 48% in each subsequent stage of the chained targeted attacks (CTA) while keeping the perturbations imperceptible (ε = 0.0001).
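The chaining mechanism described in the abstract can be sketched as repeated small, ε-bounded targeted evasion steps (FGSM-style) applied stage by stage toward successive target classes. This is a minimal illustration, not the authors' CAA implementation: the softmax stand-in for an FER model, its random weights, the per-stage step count, and the target sequence are all illustrative assumptions.

```python
import numpy as np

# Toy stand-in for an ML-based FER classifier: softmax regression over
# 3 emotion classes and 8 input features. Weights are illustrative.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_wrt_input(x, target):
    """Gradient of the target-class log-probability w.r.t. the input x.
    For p = softmax(W x):  d log p_t / dx = (e_t - p)^T W."""
    p = softmax(W @ x)
    onehot = np.eye(3)[target]
    return (onehot - p) @ W

def targeted_fgsm_step(x, target, eps):
    """One targeted FGSM step: nudge x toward the target class,
    bounded by eps per coordinate."""
    return x + eps * np.sign(grad_wrt_input(x, target))

def chained_attack(x, targets, eps=1e-4, steps_per_stage=100):
    """Hypothetical chaining: each stage runs a targeted attack toward
    the next emotion label in `targets` (the chaining length is
    len(targets)); the stage structure here is an assumption."""
    for t in targets:
        for _ in range(steps_per_stage):
            x = targeted_fgsm_step(x, t, eps)
    return x

x = rng.normal(size=8)
x_adv = chained_attack(x, targets=[1, 2], eps=1e-4)
# The total per-coordinate perturbation is at most
# steps_per_stage * len(targets) * eps, i.e. imperceptibly small here.
print(np.max(np.abs(x_adv - x)))
```

With ε = 0.0001, as reported in the abstract, each step moves every input coordinate by at most 0.0001, so even a long chain stays within a tiny perturbation budget.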

Hassan Ali et al.

Deep Learning (DL) algorithms have achieved remarkable results in many Natural Language Processing (NLP) tasks such as language-to-language translation, spam filtering, fake-news detection, and comprehension. However, research has shown that the adversarial vulnerabilities of deep learning networks also manifest when DL is used for NLP tasks. Most mitigation techniques proposed to date are supervised, relying on adversarial retraining to improve robustness, which is impractical. This work introduces a novel, unsupervised methodology for detecting adversarial inputs to NLP classifiers. In summary, we note that minimally perturbing an input to change a model's output, a major strength of adversarial attacks, is also a weakness that leaves unique statistical marks in the cumulative contribution scores of the input. In particular, we show that the cumulative contribution score, called the CF-score, of adversarial inputs is generally greater than that of clean inputs. We therefore propose Con-Detect, a Contribution-based Detection method, for detecting adversarial attacks against NLP classifiers. Con-Detect can be deployed with any classifier without retraining it. We experiment with multiple attackers (Text-bugger, Text-fooler, PWWS) on several architectures (MLP, CNN, LSTM, hybrid CNN-RNN, BERT) trained for different classification tasks (IMDB sentiment classification, fake-news classification, AG news topic classification) under different threat models (Con-Detect-blind, Con-Detect-aware, and Con-Detect-adaptive attacks), and show that Con-Detect can reduce the attack success rate (ASR) of different attacks from 100% to as low as 0% in the best cases and ≈70% in the worst case. Even in the worst case, we observe a 100% increase in the number of queries required and a 50% increase in the number of words perturbed, suggesting that Con-Detect is hard to evade.
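The detection intuition, that a minimal adversarial edit concentrates the model's decision on a few perturbed words so the cumulative contribution of the most influential tokens becomes unusually high, can be sketched with a toy bag-of-words scorer and leave-one-out contributions. Everything here is an illustrative assumption: the word weights, the example sentences, the top-k statistic, and the 0.9 threshold are hypothetical stand-ins, and the paper's actual CF-score computation may differ.

```python
# Toy bag-of-words sentiment scorer; the weights are illustrative
# stand-ins for a real NLP classifier's learned parameters.
WEIGHTS = {"great": 1.2, "good": 0.8, "bad": -0.9, "awful": -1.4,
           "movie": 0.1, "plot": 0.0}

def score(tokens):
    """Positive-class score of the sentence (unknown words score 0)."""
    return sum(WEIGHTS.get(t, 0.0) for t in tokens)

def contributions(tokens):
    """Leave-one-out contribution of each token to the model score."""
    full = score(tokens)
    return [abs(full - score(tokens[:i] + tokens[i + 1:]))
            for i in range(len(tokens))]

def cf_score(tokens, k=2):
    """Share of total contribution carried by the top-k tokens,
    a CF-score-like cumulative statistic (hypothetical form)."""
    c = sorted(contributions(tokens), reverse=True)
    total = sum(c) or 1.0
    return sum(c[:k]) / total

def is_adversarial(tokens, threshold=0.9):
    # Adversarial character/word swaps zero out some strong words,
    # concentrating influence on the few that remain, so a high
    # top-k share flags a suspicious input. Threshold is illustrative.
    return cf_score(tokens) >= threshold

clean = ["great", "movie", "good", "plot", "bad"]
# A Text-bugger-style character perturbation ("good" -> "gooood")
# drops that word out of the vocabulary, skewing the contributions.
perturbed = ["great", "movie", "gooood", "plot", "bad"]
print(cf_score(clean), cf_score(perturbed))  # perturbed scores higher
```

Because the statistic only queries the deployed classifier, a detector like this can sit in front of any model without retraining it, which mirrors the deployment property claimed for Con-Detect.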