Facial Privacy Preservation Using FGSM and Universal Perturbation Attacks

Recent research has established that soft-biometric attributes such as age, gender, and race can be deduced from an individual's face image with high accuracy. Many techniques have been proposed to ensure user privacy, such as applying visible distortions to images, manipulating the original image with new face attributes, or face swapping. Though these techniques achieve user privacy by fooling face recognition models, they do not help users who want to upload their original images without visible distortion or manipulation. The objective of this work is to protect the sensitive or personal data in face images by creating minimal pixel-level distortions, using white-box and black-box perturbation algorithms to fool AI models while maintaining the integrity of the image, so that it appears unchanged to the human eye.


INTRODUCTION
The swift growth of the Information Age has outpaced the legal infrastructure for protecting an individual's privacy. In most countries around the world, existing laws protect the privacy of citizens only with respect to "analog" data; covering digital data is considerably harder because it can be easily replicated, shared, and even stolen. Even where laws do exist, such as the European Union's General Data Protection Regulation, they are difficult to enforce once data is transferred to other legal jurisdictions. Image data is particularly concerning because of its invasive applications and the lack of protection from these practices in existing laws [2]. In the absence of strict policies safeguarding digital privacy rights, users themselves must take precautions and necessary measures to protect their privacy. A common feature of several online social media platforms is photo tagging: most public applications that offer media storage services, such as Google, Facebook, and Flickr, use facial recognition tools to tag individuals in photos. Though this is an attraction for some consumers, it is a serious privacy risk for many others. This paper discusses approaches to protect user privacy by perturbing the face, the most sensitive and unique feature of an individual.
Many techniques have been proposed to ensure user privacy, such as applying visible distortions to images, manipulating the original image with new face attributes, face swapping, or adding special features like hats, glasses, a beard, or a different smile. Though these techniques achieve user privacy by fooling face recognition models, they do not let the user upload original images free of distortion or manipulation. Techniques that make pixel-level changes without visibly distorting the original image achieve better results, with the aid of adversarial machine learning algorithms that study how neural networks can be fooled by deceptive inputs. This also ensures that the originality of the image is maintained.
This paper proposes two models: a black-box model that generates a universal perturbation for the image using the DeepFool algorithm, and a white-box model that performs an FGSM attack on the image to generate a noise mask. The results of both algorithms are tested against face recognition models to confirm that misclassification occurs.

LITERATURE SURVEY
By definition, facial identification is a classification task. Mere classification permits only a fixed number of output classes in the network, which is impractical for facial recognition because the network would have to be re-trained every time a new person was added to the database. The solution is to transition from a classification task to a more regression-like task, with networks that generate meaningful representations of faces in the form of numerical vectors. Various black-box approaches have been proposed, such as GenAttack with gradient-free optimization, which uses fewer queries to form the adversarial image than zeroth-order optimization (ZOO); here, genetic algorithms were used for synthesising adversarial examples. Work has been done on the MNIST, CIFAR-10, and ImageNet datasets, but nothing specifically for human faces. For faces in particular, work has been done on generating non-invasive noise masks to apply to facial images of a newly introduced user, yielding adversarial examples and preventing the formation of identifiable clusters in the embedding space. However, the algorithms proposed are executed in a white-box environment, which may not always be accessible, and they do not cover the application of a feature extractor in facial recognition models. [7] Semi-adversarial networks are another way of preserving face attributes. Soft-biometric attributes such as age, gender, and race can generally be deduced from an individual's face image with high accuracy. This raises privacy concerns, and to tackle this scenario a technique was developed for imparting soft-biometric privacy to face images via an image perturbation methodology, which also gives the user the choice to obfuscate specific attributes of the input face image. Though the idea was commendable, the results were not quite satisfactory: the modified images from the proposed model had some artifacts.
As a result, a human observer can distinguish between perturbed and unmodified face images. [8] In an effort to address privacy issues systematically, balance usability, and enhance privacy in a natural and measurable manner, the AnonymousNet framework was proposed. The stack involves four stages: facial attribute estimation, privacy-metric-oriented face obfuscation, directed natural image synthesis, and adversarial perturbation. The limitation here was that qualitative and quantitative evaluation of perturbation performance across different deep-neural-network-based detectors was left out, due to limitations in space and computational resources. [9] Another creative approach lets users add minor pixel-level changes ("cloaks") to their own photos that do not affect the visual anatomy of the image. When used to train facial recognition models, these "cloaked" images produce functional models that consistently cause normal images of the user to be misidentified. However, it was accurate only for the Microsoft Azure API. [10] Doubly Permuted Homomorphic Encryption is another way of achieving data privacy. Here, the framework is designed to aggregate multiple classifiers updated locally using private data and to ensure that no private information about the data is exposed during or after the learning procedure. It utilizes a homomorphic cryptosystem that can aggregate the local classifiers while they are encrypted and thus kept secret: by using homomorphically encrypted locally-updated classifiers, the aggregator can average them while ensuring that the classifiers never expose private information about the training data. However, it focuses exclusively on learning linear classifiers (such as SVMs); a promising direction for future work is learning much higher-dimensional models such as sparse convolutional neural networks.
[11] GAN models for learning private and fair representations were proposed, which use adversarial learning to allow a data holder to learn universal representations of a given dataset that decouple a set of sensitive attributes from the rest of the dataset. This involves modifying the training data to decouple the sensitive attributes from the non-sensitive ones. However, this model has not been tested against a large dataset, and the size of the dataset may affect the convergence speed of the decorrelation schemes. [12] Not much work has been dedicated entirely to personal data theft; various approaches have been proposed to achieve data privacy, but each has had some limitation, whether in the dataset, the accuracy, or elsewhere. Observing the gaps in existing work gives a fair idea of what improvements can be made. [13]

OBJECTIVES AND PROPOSED METHODOLOGY
Adversarial machine learning can be used to implement techniques that are cyber attacks by nature, which can be categorised based on the resources available to the attacker:
1. Create pixel-level distortions to images. When these images, whose pixels are subjected to minor changes, are used to train facial recognition models, they produce weights that regularly result in misclassification of normal images of the user, thus imparting privacy. This model will be a white-box implementation.
2. Generate non-invasive noise masks to apply to facial images of a newly introduced user, preventing the formation of identifiable clusters in the embedding space. Such protection is offered through a system that alters images in a manner indistinguishable to the human eye, without prior knowledge of the facial recognition model architecture, yet large enough to cause misclassifications. This model will be a black-box implementation. [3]

A. FAST GRADIENT SIGN METHOD (FGSM) - WHITE BOX ATTACK
An adversarial attack is termed white-box if the weights, loss function, and other hyper-parameters of the target neural network model to be attacked are known to the attacker beforehand.
The fast gradient sign method uses the gradients of the neural network to create an adversarial example. For an input image, it calculates the gradient of the loss with respect to the input image, which yields a perturbation noise matrix. Using this matrix, a new adversarial image is created that gets misclassified by the model. This can be summarised by the following expression, eqn 1 [15]:

adv_x = x + ε · sign(∇_x J(θ, x, y))    (1)

where
adv_x : Adversarial image
x : Original input image
ε : Multiplier that keeps the perturbation small
θ : Model parameters
y : Original input label
J : Loss function

The system architecture explains the workflow of the white-box attack. For a given dataset, the faces present in the images are extracted using the MTCNN model, and this data is used to train a facial classifier.
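As a minimal, self-contained sketch (not the paper's implementation), the FGSM update of eqn 1 can be demonstrated on a toy logistic-regression "classifier," for which the loss gradient with respect to the input is analytic; all weights and values here are illustrative:

```python
import numpy as np

def fgsm_attack(x, y, w, b, eps):
    """FGSM on a toy logistic-regression classifier (white-box: w and b
    are known to the attacker).  x: flattened input image, y: true label
    (0 or 1), eps: the perturbation magnitude (epsilon in eqn 1)."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # predicted probability of class 1
    grad = (p - y) * w                      # dJ/dx for cross-entropy loss, via the chain rule
    return x + eps * np.sign(grad)          # adv_x = x + eps * sign(dJ/dx)

# Illustrative 16-"pixel" image that the model classifies confidently as class 1.
rng = np.random.default_rng(0)
w, b = rng.normal(size=16), 0.0
x, y = 0.2 * w, 1
predict = lambda z: int(1.0 / (1.0 + np.exp(-(w @ z + b))) > 0.5)

adv = fgsm_attack(x, y, w, b, eps=0.5)
print(predict(x), predict(adv))  # → 1 0 : the adversarial image is misclassified
```

Even though each pixel changes by at most eps, the signed steps align with the loss gradient, so their effect on the classifier accumulates and flips the prediction.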

Fig 1: System architecture of White box model
The trained classifier's hyperparameters are known to the attacker. Using the gradient of the loss for the predicted class with respect to the input given to the classifier, FGSM generates a perturbation which, when added to the image, gives it adversarial properties. The workflow of the process is shown in fig 1.

B. BLACK BOX ATTACK -UNIVERSAL PERTURBATIONS
Adversarial attacks are called black-box attacks when no information about the target CNN is available. [4][5] Universal perturbations propose a method for estimating a single perturbation matrix which, when added to any image from a particular dataset, transforms the image into an adversary. This perturbation is termed universal because it is a fixed, image-agnostic perturbation that causes misclassification across an entire dataset of images drawn from a distribution µ. The focus here is on the case where µ represents the set of natural images, and hence contains a huge amount of variability. The goal is to find a perturbation v that satisfies the following two constraints, eqn 2 and eqn 3 [16]:

‖v‖_p ≤ ξ    (2)
P_{x∼µ} ( k̂(x + v) ≠ k̂(x) ) ≥ 1 − δ    (3)

where k̂(x) denotes the classifier's predicted label for x. The parameter ξ controls the magnitude of the perturbation vector v, and δ quantifies the desired fooling rate for images sampled from the distribution. Let X = {x₁, . . . , xₘ} be a set of images sampled from µ. The proposed algorithm computes a universal perturbation v such that ‖v‖_p ≤ ξ while fooling all or most data points in X. The algorithm proceeds in a loop over the images in X and gradually updates the universal perturbation: at each iteration, the minimal perturbation ∆vᵢ that sends the current perturbed point xᵢ + v to the classifier's decision boundary is computed and aggregated into the current estimate of the universal perturbation using the projected update rule, eqn 4 [16]:

v ← P_{p,ξ}(v + ∆vᵢ)    (4)

where P_{p,ξ} denotes projection onto the ℓ_p ball of radius ξ centred at the origin.
The quality of the universal perturbation can be improved with several passes over the data set X. The algorithm terminates when the empirical "fooling rate" on the perturbed data set X_v := {x₁ + v, . . . , xₘ + v} exceeds the target threshold 1 − δ; that is, we stop the algorithm whenever eqn 5 [16] is met:

Err(X_v) := (1/m) Σᵢ₌₁ᵐ 1[ k̂(xᵢ + v) ≠ k̂(xᵢ) ] ≥ 1 − δ    (5)
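The loop described above can be sketched in a toy setting. For a linear classifier k(x) = sign(w·x + b), the minimal perturbation that sends a point to the decision boundary has the closed form −f(x)·w/‖w‖², which is what DeepFool computes iteratively for general networks. The classifier, data, and thresholds below are illustrative assumptions, not the paper's models:

```python
import numpy as np

def project_l2(v, xi):
    """P_{2,xi}: project the perturbation back onto the l2 ball of radius xi."""
    n = np.linalg.norm(v)
    return v if n <= xi else v * (xi / n)

def universal_perturbation(X, w, b, xi=5.0, delta=0.2, max_epochs=10):
    """Toy universal-perturbation loop for the linear classifier
    k(x) = sign(w.x + b), using the closed-form minimal boundary step
    in place of a full DeepFool call."""
    k = lambda x: np.sign(w @ x + b)
    labels = [k(x) for x in X]               # predictions on the clean images
    v = np.zeros_like(w)
    rate = 0.0
    for _ in range(max_epochs):
        for x, y0 in zip(X, labels):
            if k(x + v) == y0:               # v does not fool this point yet
                f = w @ (x + v) + b
                dv = -(f / (w @ w)) * w * 1.02  # minimal boundary step, 2% overshoot
                v = project_l2(v + dv, xi)      # projected update rule (eqn 4)
        # empirical fooling rate on X -- the termination test of eqn 5
        rate = sum(k(x + v) != y0 for x, y0 in zip(X, labels)) / len(X)
        if rate >= 1 - delta:
            break
    return v, rate

# Illustrative data: points the classifier firmly assigns to class +1.
rng = np.random.default_rng(1)
w, b = np.array([1.0, -2.0]), 0.5
X = rng.normal(size=(200, 2))
X = X[X @ w + b > 0.5]
v, rate = universal_perturbation(X, w, b)
print(rate >= 0.8)  # → True : one fixed vector v fools the whole sample
```

The essential behaviour carries over to the real algorithm: one fixed vector v, bounded in norm by ξ, accumulates small boundary-crossing steps until the fooling rate on X exceeds 1 − δ.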
The system architecture explains the workflow of the black-box attack. For a given dataset, an initial perturbation is randomly generated, and based on its misclassification rate with respect to any available face classifier, a universal perturbation is estimated.

A. Fast Gradient Sign Method
The following is a detailed structural representation of the workflow of the white-box model. It consists of an MTCNN model for face extraction, followed by a custom face classifier and the FGSM attack. The core step is performing the FGSM attack using the gradients of the loss w.r.t. the input image to create an adversarial image that maximises the loss and eventually causes misclassification. A perturbation matrix is calculated by finding how much each pixel of the image contributes to the loss value. Using the chain rule to obtain the gradient of the loss with respect to each input pixel makes this process fast and easy. Fig 7 shows the expected result after an FGSM attack is performed on the facial recognition model.
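The chain-rule gradient used for the perturbation matrix can be sanity-checked numerically. As a sketch on a toy logistic model (all names and values here are illustrative, not the paper's classifier), the analytic per-pixel gradient agrees with a finite-difference estimate:

```python
import numpy as np

def loss(x, y, w, b):
    """Binary cross-entropy of a logistic model at input x."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_loss_wrt_input(x, y, w, b):
    """Analytic dJ/dx from the chain rule: (p - y) * w."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (p - y) * w

rng = np.random.default_rng(2)
w, b = rng.normal(size=8), 0.1
x, y = rng.normal(size=8), 1

# Central finite differences, one "pixel" at a time.
eps = 1e-6
num = np.zeros_like(x)
for i in range(x.size):
    e = np.zeros_like(x); e[i] = eps
    num[i] = (loss(x + e, y, w, b) - loss(x - e, y, w, b)) / (2 * eps)

print(np.allclose(num, grad_loss_wrt_input(x, y, w, b), atol=1e-5))  # → True
```

In a deep network the same quantity is obtained by backpropagation through every layer down to the input, which is why computing the full per-pixel perturbation matrix costs roughly one backward pass.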

B. Universal Perturbation
The structure of the facial classifier remains the same as explained in the FGSM attack. The black-box attack here is performed on the dataset directly instead of on the model; the model's architecture is not known prior to the attack. Universal perturbations provide a single perturbation matrix which, when added to any image from a particular dataset, transforms the image into an adversary. The focus here is on the case where the distribution µ represents the set of natural images, hence containing a huge amount of variability.
In that context, the existence of small universal perturbations that misclassify most images is examined. Fig 8 shows the algorithm used to derive the perturbation matrix.

The FGSM approach produces by far the best results, as the architecture of the facial recognition model is known and used to perform the attack. A facial recognition model built on neural networks can thus be fooled easily, and an attack success rate of almost 100% can be achieved. The greater the difference between the predicted probabilities before and after the attack for a given image, the better the performance of the approach, since it shows that the image is properly misclassified and the facial recognition model is fooled.
The universal perturbation matrix generated by this approach, when applied to the faces of the dataset, successfully misclassifies up to 64% of the faces with a single perturbation matrix after 49 iterations. The number of iterations run is proportional to the degree of misclassification. The results can be seen in the tables.

Deep learning algorithms are used in a variety of fields, including data analytics, successfully solving problems such as image classification, natural language processing, and prediction of consumer behavior. The triumph of these algorithms pivots on the availability of large image datasets, which most probably contain sensitive data about their subjects; this may cause learning models to inherit societal biases, leading to unintended algorithmic discrimination against legally protected groups such as race or gender [17][18]. This has led to growing research on transforming sensitive data into fair and private representations.
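The misclassification percentage reported above is the empirical fooling rate: the fraction of images whose predicted label changes once the perturbation is added. A minimal sketch of the measurement (the classifier and data below are placeholders, not the paper's models):

```python
import numpy as np

def fooling_rate(predict, X, v):
    """Fraction of samples whose predicted label changes when the
    universal perturbation v is added to every image."""
    before = np.array([predict(x) for x in X])
    after = np.array([predict(x + v) for x in X])
    return float(np.mean(before != after))

# Placeholder classifier and data, purely for illustration.
predict = lambda x: int(x.sum() > 0)
X = [np.array([0.2, 0.3]), np.array([0.1, 0.1]), np.array([2.0, 2.0])]
v = np.array([-0.5, -0.5])
print(fooling_rate(predict, X, v))  # → 0.6666... : two of the three predictions flip
```

Evaluating this rate after each pass over the dataset is what produces the iteration-vs-misclassification figures shown in the tables.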

CONCLUSION
The techniques in this paper provide a new approach to handling privacy issues. The perturbations generated through the white-box and black-box approaches can fool neural networks and achieve user privacy. Though more work needs to be done to generalise these approaches to benchmarked facial recognition systems, the work in this paper provides a starting point.