SecureDL: A privacy preserving deep learning model for image recognition over cloud

The key benefits of cloud services, such as low cost, access flexibility, and mobility, have attracted users worldwide to use deep learning algorithms for computer vision tasks. However, these cloud servers are maintained by untrusted third parties, and users are concerned about sharing their confidential data with them. In this paper, we address these concerns by developing SecureDL, a privacy-preserving image recognition model for encrypted data over the cloud. Additionally, we propose a block-based image encryption scheme to protect the visual information of images. The scheme combines an order-preserving permutation ordered binary number system with pseudo-random matrices. The encryption scheme is proved to be secure from a probabilistic viewpoint and is evaluated against various cryptographic attacks. Experiments are performed on several image recognition datasets, and the recognition accuracy achieved on encrypted data is close to that on non-encrypted data. SecureDL avoids the storage and computational overheads incurred by secure recognition schemes based on fully homomorphic encryption and multi-party computation.


Introduction
Over the last decade, numerous state-of-the-art algorithms for various artificial intelligence tasks [1,2,3] based on convolutional neural networks (CNN) and long short-term memory (LSTM) networks have been proposed. Despite their high performance, these systems are time-consuming and require high-end computing resources such as GPUs. To serve users, cloud service providers (hereafter CSPs) offer Infrastructure-as-a-Service (IaaS) platforms that bundle deep learning (DL) frameworks with the required resources; Azure Machine Learning and Google Prediction API are examples of such services. To use these on-demand DL services, users have to share their private data with the CSP, an arrangement that benefits both the user and the CSP. However, user data may contain sensitive and confidential information, such as patients' medical reports or an organization's legal documents, whose leakage can severely affect the functioning and reputation of the individual or organization concerned. Hence, users want to benefit from cloud infrastructure while preserving the confidentiality of their data.
Various image encryption schemes [4,5] have been proposed for secure storage on cloud servers, since an adversary cannot recover any information from the encrypted form. However, processing an encrypted image to obtain the same results as for plain (non-encrypted) images is challenging, because encryption destroys the correlation between neighboring pixels. This paper proposes a system for learning an image recognition model with DL algorithms on the cloud without revealing the information contained in the images.
In recent years, researchers have proposed algorithms for privacy-preserving image recognition over the cloud. These algorithms rely on multi-party computation (MPC) protocols or fully homomorphic encryption (FHE) techniques. In MPC, a CSP provides a recognition model to multiple parties, each of which trains the model individually on its private data, after which the CSP aggregates the encrypted weights. This process may be repeated several times, and the final model is then made available to each party. In this way, each party obtains a trained recognition model without exposing its sensitive data. In [6,7,8], the CSP provides a recognition model, pretrained on its own local data, to all parties, and each party runs one iteration over its private data. Each party encrypts the resulting weights with a homomorphic encryption scheme and transmits them to the CSP. The CSP then processes these encrypted weights, for instance by aggregating them using the homomorphic property of the encryption scheme, and transmits the processed weights back to each party. Each party then performs the next iteration and transmits its parameters to the CSP, and this loop continues until a termination condition is met. These schemes preserve data privacy efficiently; however, they greatly increase the communication overhead between the parties and the CSP. Moreover, users have to maintain their own computing resources, which increases the computational overhead at the user end and thus contradicts the flexibility that CSP services are meant to offer. The schemes proposed in [9,10] use an FHE technique [11] to encrypt user data, which reduces the communication overhead between the user and the CSP compared with MPC. However, FHE increases the data size drastically [12]; the resulting storage and computational overheads make these schemes unsuitable for modern settings such as cloud-based services and Internet-of-Things environments.
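To make the aggregation step concrete, the following is a minimal sketch of additively homomorphic weight aggregation with a toy Paillier cryptosystem. The tiny primes, the fixed-point scale `SCALE`, and all function names are illustrative assumptions, not the protocol of [6,7,8]; real deployments use primes of at least 1024 bits and a vetted library. Multiplying Paillier ciphertexts modulo n² adds the underlying plaintexts, so the CSP can sum the parties' quantized weights without decrypting them.

```python
import math
import random

SCALE = 1000  # hypothetical fixed-point scale for quantizing real-valued weights


def keygen(p: int = 1009, q: int = 1013):
    """Toy Paillier key generation with tiny primes (insecure; illustration only)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1  # standard simple choice of generator
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)  # mu = L(g^lam mod n^2)^-1 mod n
    return (n, g), (n, lam, mu)


def encrypt(pub, m: int, rng: random.Random) -> int:
    n, g = pub
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:  # r must be a unit modulo n
        r = rng.randrange(1, n)
    return pow(g, m % n, n2) * pow(r, n, n2) % n2


def decrypt(priv, c: int) -> int:
    n, lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m  # map back to signed integers


def aggregate(pub, party_weights, rng: random.Random):
    """Each party encrypts its quantized weights; the CSP multiplies ciphertexts
    element-wise, which adds the plaintexts: Enc(a) * Enc(b) = Enc(a + b)."""
    n2 = pub[0] ** 2
    agg = None
    for weights in party_weights:
        enc = [encrypt(pub, round(w * SCALE), rng) for w in weights]
        agg = enc if agg is None else [a * b % n2 for a, b in zip(agg, enc)]
    return agg


if __name__ == "__main__":
    rng = random.Random(0)
    pub, priv = keygen()
    weights = [[0.25, -0.5, 1.0], [0.5, 0.25, -1.0], [0.75, 0.0, 0.5]]
    sums = [decrypt(priv, c) for c in aggregate(pub, weights, rng)]
    print([s / (SCALE * len(weights)) for s in sums])  # averaged weights
```

Only the key holder can decrypt the aggregate, and every round of this loop costs a full upload and download of encrypted weights per party, which is the communication overhead discussed above.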
In existing works, we observe two possible solutions for accomplishing image recognition tasks while preserving data privacy. In the first solution, the data are encrypted with existing homomorphic encryption schemes, and new image recognition models are designed for the encrypted data, as in [9,10]. In the second solution, a new image encryption scheme is designed that secures the data information, so that existing state-of-the-art recognition models, for instance VGG16 [13] and ResNet50 [14], can be used for learning the model parameters.
In this work, we follow the latter solution and propose an image encryption scheme that partially preserves the trichotomous property over the set of pixel intensity values in encrypted form. The main contributions of SecureDL are as follows:
1. A novel symmetric encryption scheme based on the substitution-permutation (SP) paradigm is proposed, combining the permutation ordered binary (POB) number system with block-permutation matrices. The POB number system transforms each 8-bit gray-level intensity value of the input image into an n-bit value. The transformed image is then partitioned into non-overlapping sub-blocks, and each sub-block is permuted with a different permutation matrix generated by a pseudo-random number generator. Finally, the permuted sub-blocks are placed back at their original positions.
2. The proposed encryption scheme is proved to be secure from a probabilistic point of view. Its security is further evaluated against histogram attacks, pixel-correlation attacks, and through image information sensitivity analysis.
3. Encrypting a 299 × 299 color image with a block size of 13 × 13 takes only 0.4069 s, and the encrypted image requires the same storage space as the original image. These computational and storage overheads are much lower than those of existing image encryption schemes. Moreover, SecureDL requires a single round of interaction between the user and the CSP (user → CSP → user), which makes it a suitable candidate for real-time cloud-based services.
4. The performance of SecureDL is evaluated with state-of-the-art image recognition models, namely VGG16, ResNet50, NasNet, and InceptionV3, on the public datasets Kaggle's Dogs vs. Cats, ISIC dermoscopy (malignant vs. benign recognition), Linnaeus 5, and CIFAR-10. The recognition results achieved on encrypted data are close to those on plain (unencrypted) data.
Organization: Section 2 provides an overview of related work on privacy-preserving schemes. Section 3 presents the preliminaries, followed by the proposed methodology and the proof of security in Sections 4 and 5, respectively. Section 6 analyzes the security of the proposed scheme against various cryptographic attacks. The experimental recognition results and the performance analysis of SecureDL are presented in Sections 7 and 8, respectively. Finally, Section 9 concludes the paper and discusses future work.

Related Work
Over recent decades, researchers have proposed various privacy-preserving schemes for processing multimedia [15,16,17,18]. We briefly review privacy-preserving models for image processing and other machine learning (ML) tasks, followed by privacy-preserving image recognition models, the category to which our scheme belongs. A detailed survey on preserving the privacy of visual data was presented by Padilla-López et al. [19].
Existing privacy-preserving systems address image enhancement and feature extraction [15,20,21] in the encrypted domain (ED). The scheme proposed by Lathey et al. [15,20] efficiently performs noise removal, edge sharpening, unsharp masking, and anti-aliasing on an image in the ED; the image is secured with the (N, K)-threshold Shamir Secret Sharing (SSS) scheme. Tanwar et al. [21] proposed a privacy-preserving algorithm for image scaling over the cloud in which the image is encrypted by ramp SSS and 2D bicubic interpolation is used to compute the new pixel intensity values over the cloud shares. Yan and Kankanhalli [22] proposed a scheme for extracting and locating face regions in the ED, in which the image information is protected using discrete cosine transform coefficients. Rahulamathavan et al. [23] presented a facial expression recognition system in the ED using local Fisher discriminant analysis, with the input image encrypted by the Paillier cryptosystem. Further, privacy-preserving image denoising models have been proposed based on wavelet transformation over an SSS scheme [24], on the Ring Learning with Errors cryptosystem [25], and on Non-Local Means with the Paillier cryptosystem. Researchers have also proposed privacy-preserving models for clustering in the ED for vertically [16,26] and arbitrarily [27,28] partitioned data. Zhang et al. [29] proposed a possibilistic c-means algorithm for fuzzy clustering of FHE-encrypted big data. Xing et al. [30] proposed a two-stage protocol for privacy-preserving k-means clustering based on the Paillier cryptosystem for social-network big data.
For privacy-preserving image recognition, various schemes have been presented by Ma et al. [6], Yonetani et al. [7], Xie et al. [9], Gilad et al. [10], Orlandi et al. [17], Bost et al. [18], and Shokri and Shmatikov [31]. Xie et al. [9] presented a protocol for learning neural networks over encrypted data. The authors approximated the non-linear activation functions by polynomials and discussed the theoretical aspects of learning and inference on encrypted data. However, the protocol was not tested on any standard dataset to analyze its robustness. Gilad et al. [10] proposed an FHE scheme for encrypting gray-scale images and training a basic neural network architecture, and evaluated it on the MNIST dataset only. The use of FHE greatly increases the computational overhead, making these schemes impractical for real-time applications. In [17], an interactive protocol between the model owner and the data owner for evaluating the activation functions of a neural network is presented using homomorphic encryption (HE). Bost et al. [18] proposed schemes for Naive Bayes, decision tree, and hyperplane decision classifiers, combining FHE without bootstrapping, the Quadratic Residuosity cryptosystem, and the Paillier cryptosystem for encrypting data.
In this paper, we develop a privacy-preserving image recognition system that learns a DL framework on encrypted data over the cloud, together with a block-based image encryption scheme that protects the visual information of each image in the user's data.

Preliminaries
In this section, we discuss the theoretical foundations needed to understand the proposed method. In CNN-based DL frameworks, the first layer is a convolutional layer containing a set of filters (kernels). Unlike a conventional neural network (NN), whose input is a vector, the input to the first layer of a CNN-based architecture is a normalized raw image, and the filters operate on neighboring input points to extract spatial features.
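To make the role of the first-layer filters concrete, here is a minimal sketch of the "valid" 2-D cross-correlation that deep learning frameworks implement under the name convolution, on a nested-list image. The function name is hypothetical, and the sketch omits channels, stride, padding, bias, and the learned weights of a real layer.

```python
def conv2d_valid(img, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image with one kernel.

    Deep learning frameworks call this operation convolution; stride 1, no padding,
    no bias. Each output value depends only on a kh x kw neighbourhood of pixels.
    """
    kh, kw = len(kernel), len(kernel[0])
    H, W = len(img), len(img[0])
    return [[sum(kernel[a][b] * img[u + a][v + b] for a in range(kh) for b in range(kw))
             for v in range(W - kw + 1)]
            for u in range(H - kh + 1)]


if __name__ == "__main__":
    img = [[0, 0, 5, 5],
           [0, 0, 5, 5],
           [0, 0, 5, 5]]
    edge = [[-1, 1]]  # responds to left-to-right intensity increases
    print(conv2d_valid(img, edge))  # [[0, 5, 0], [0, 5, 0], [0, 5, 0]]
```

Each output depends only on a small neighbourhood of input pixels, which is why the spatial arrangement of pixels, and how an encryption scheme disturbs it, matters for recognition accuracy.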

Permutation ordered binary (POB) number system
Sreekumar and Sundar [32] introduced a generalized number system that depends on two positive integers n and k, where n > k.
Specifically, for fixed n and k (n > k), each value in $\{0, 1, \ldots, \binom{n}{k} - 1\}$ can be transformed into an n-bit binary string with 1 at exactly k positions and 0 at the remaining n − k positions. The POB-number of an integer i with parameters n and k (n > k) is written as $P^{n,k}_{num}(i) = r_{n-1} r_{n-2} \ldots r_0$, where each $r_j \in \{0, 1\}$ and the bits satisfy

$$i = \sum_{j=0}^{n-1} r_j \binom{j}{p_j}, \qquad p_j = \sum_{l=0}^{j} r_l,$$

with $\binom{j}{p_j} = 0$ whenever $p_j > j$. Further, the POB-number $P^{n,k}_{num}(i)$ is converted into a positive integer known as the POB-value, denoted by $P^{n,k}_{val}(i)$ and defined as

$$P^{n,k}_{val}(i) = \sum_{j=0}^{n-1} r_j \, 2^{j}. \qquad (1)$$

Note that the POB-number and POB-value of an integer i (Eq. 1) differ for different parameters, as shown in Table 1. For instance, the POB-values for i = 5 with parameters (n = 9, k = 4) and (n = 35, k = 15) are 39 and 64511, respectively. Also, the POB number system satisfies the partial trichotomous property of positive integers (Defs. 3.1 and 3.2). For more details on the POB system, the reader is referred to [32].

Definition 3.1. A set X of positive integers is said to be trichotomous if it satisfies the following property: $x < y$ or $x > y$ or $x = y$ for all $x, y \in X$.

Definition 3.2. A function F defined over a set X of positive integers is said to be partial trichotomous if:

Deductions from the POB number system
1. For fixed parameters (n, k), n > k, there exists a unique POB-number and POB-value for each element of the set $\{0, 1, \ldots, \binom{n}{k} - 1\}$.
2. POB is partial trichotomous over any set of positive integers of finite cardinality.
3. For each pair of parameters (n, k), there are exactly $\binom{n}{k}$ POB-values.
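The encoding behind Eq. 1 can be sketched in Python. This is a minimal illustration assuming the combinatorial relation $i = \sum_j r_j \binom{j}{p_j}$ stated above; the function names `pob_number`, `pob_value`, and `pob_decode` are hypothetical and not taken from [32]. The sketch reproduces the two POB-values quoted in the text.

```python
from math import comb


def pob_number(i: int, n: int, k: int) -> list:
    """POB number of i as bits [r_{n-1}, ..., r_0] containing exactly k ones.

    Greedy encoding: scan j = n-1 .. 0 and set r_j = 1 whenever C(j, k_left) <= i,
    so that i = sum_j r_j * C(j, p_j) with p_j = number of ones among r_0..r_j.
    """
    if not (n > k >= 0 and 0 <= i < comb(n, k)):
        raise ValueError("need n > k >= 0 and 0 <= i < C(n, k)")
    bits, k_left = [], k
    for j in range(n - 1, -1, -1):
        c = comb(j, k_left)  # comb returns 0 when k_left > j
        if c <= i:
            bits.append(1)
            i -= c
            k_left -= 1
        else:
            bits.append(0)
    return bits


def pob_value(i: int, n: int, k: int) -> int:
    """POB-value (Eq. 1): read the POB number as an ordinary binary integer."""
    value = 0
    for bit in pob_number(i, n, k):
        value = 2 * value + bit
    return value


def pob_decode(value: int, n: int) -> int:
    """Invert the mapping: i = sum_j r_j * C(j, p_j)."""
    i, ones = 0, 0
    for j in range(n):
        if (value >> j) & 1:
            ones += 1
            i += comb(j, ones)
    return i


if __name__ == "__main__":
    print(pob_value(5, 9, 4), pob_value(5, 35, 15))  # 39 64511, matching the text
    print([pob_value(i, 9, 4) for i in range(6)])     # [15, 23, 27, 29, 30, 39]
```

In this sketch the POB-values increase with i, reflecting the order-preserving behaviour mentioned in the abstract, while the gaps between consecutive values (8, 4, 2, 1, 9 above) are non-uniform.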

Proposed methodology
This section describes the proposed scheme for encrypting users' image datasets and the protocol we define for developing a secure image recognition model.

Threat model
Two entities are involved in our system: a user or data owner U, who wishes to develop an image recognition model on personal data without revealing the visual information in the data to a third party, and a CSP C, who owns an image recognition DL framework M and offers it as a cloud service under a pay-as-you-go business model. We assume that the CSP is "honest-but-curious" (semi-honest): it performs all requested computations correctly but tries to learn as much as possible about the data. The user encrypts the data on a local server and transmits the encrypted data to C. The CSP performs all computations (i.e., trains the model M) on the encrypted data transmitted by the user. Then, C transmits the trained encrypted model to U for inference. Since M is trained on encrypted data, its learned parameters are also in encrypted form. We assume an adversary at the cloud server who can access the user's encrypted data and the weights and hyper-parameters of the trained model M.

System overview
In the framework, U generates a secret key-set K, which is used to encrypt the data D before transmitting it to C. The CSP trains and runs the model M on the user's encrypted data $D_{enc}$. Finally, the encrypted weights of the trained model M are sent back to the user for inference. The proposed framework is depicted in Fig. 1. We assume that D contains gray-level images of dimension M × N. The step-by-step procedure for developing an image recognition model in the ED is as follows.

Step 1 - Generation of the secret key-set (K): The encryption scheme requires a set of hyper-parameter values $\{n, k, \rho_1, \rho_2\}$, where (n, k), n > k, are the parameters of the POB number system (defined in Section 3.1) and $\rho_1 \times \rho_2$ is the dimension of each block-permutation matrix. The values of n and k are generated using a pseudo-random number generator G, whereas $\rho_1$ and $\rho_2$ are divisors of M and N, respectively. Additionally, T permutation matrices $\pi = \{\pi_1, \pi_2, \ldots, \pi_T\}$, each of dimension $\rho_1 \times \rho_2$, are generated, where $T = (M/\rho_1)(N/\rho_2)$ is the number of blocks into which U partitions the plain image I. Therefore, the secret key-set is $K = \{n, k, \rho_1, \rho_2, \pi_1, \ldots, \pi_T\}$; it is generated by U alone and is never shared with C.
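Step 1 can be sketched as follows. This is a hypothetical illustration, not the authors' code: the ranges for n and k are borrowed from the example ranges n ∈ [11, 21] and k ∈ [9, 17] mentioned in the qualitative results, the condition $\binom{n}{k} > l$ stands in for the unspecified margin in $\binom{n}{k} \gg l$, and Python's `random` module stands in for the generator G. Permutation matrices are stored as ρ1 × ρ2 arrays of 0-based flat indices.

```python
import random
from math import comb


def generate_keys(M, N, rho1, rho2, l=255, n_range=(11, 21), k_range=(9, 17), seed=None):
    """Hypothetical Step 1 sketch: draw POB parameters (n, k) and T block permutations.

    Each permutation matrix pi_t is a rho1 x rho2 array holding a permutation of the
    0-based flat indices 0 .. rho1*rho2 - 1 (the text uses 1-based indices).
    """
    if M % rho1 or N % rho2:
        raise ValueError("rho1 and rho2 must divide M and N")
    # SystemRandom draws from the OS CSPRNG; a seeded Random is only for reproducible demos.
    rng = random.SystemRandom() if seed is None else random.Random(seed)
    while True:
        n = rng.randint(*n_range)
        k = rng.randint(*k_range)
        if n > k and comb(n, k) > l:  # need enough POB-values for intensities 0..l
            break
    T = (M // rho1) * (N // rho2)
    perms = []
    for _ in range(T):
        flat = list(range(rho1 * rho2))
        rng.shuffle(flat)
        perms.append([flat[r * rho2:(r + 1) * rho2] for r in range(rho1)])
    return {"n": n, "k": k, "rho1": rho1, "rho2": rho2, "perms": perms}


if __name__ == "__main__":
    key = generate_keys(299, 299, 13, 13, seed=7)
    print(key["n"], key["k"], len(key["perms"]))  # len(perms) == T == 23 * 23
```

Drawing each π_t as an independent uniform shuffle matches the counting argument in the proof of security, where each block has (ρ1ρ2)! equally likely choices.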
Step 2 - Formation of encrypted data: We now explain how the plain data D are transformed into the encrypted data $D_{enc}$. Once the secret keys $K = \{n, k, \rho_1, \rho_2, \pi_1, \ldots, \pi_T\}$ have been generated, with n > k and $\binom{n}{k} \gg l$, where l is the maximum intensity value over all M × N gray-scale images in D, the user encrypts the data. The proposed encryption scheme is a two-stage process that provides high security for the user's data.
In the first stage, the POB-value corresponding to each intensity value I(u, v) at pixel (u, v) of an image I in D is computed and placed at pixel (u, v) of the POB-space image, denoted by $I_{POB}$. Mathematically,

$$I_{POB}(u, v) = P^{n,k}_{val}\big(I(u, v)\big).$$

In the second stage, the POB-space image $I_{POB}$ is partitioned in raster (row-by-row) order into T sub-images $I_{POB_1}, I_{POB_2}, \ldots, I_{POB_T}$, each of dimension $\rho_1 \times \rho_2$. Then, the pixel values of the t-th sub-image $I_{POB_t}$ are shuffled using the t-th permutation matrix $\pi_t$. The resulting sub-image, denoted by $I^{\pi_t}_{POB_t}$, takes at pixel (u, v) the intensity value of $I_{POB_t}$ at the pixel location given by $\pi_t(u, v)$:

$$I^{\pi_t}_{POB_t}(u, v) = \big(I_{POB_t} \circ \pi_t\big)(u, v) = I_{POB_t}\big(\pi_t(u, v)\big),$$

where ∘ denotes the composition of functions. Then, the shuffled sub-images $I^{\pi_1}_{POB_1}, I^{\pi_2}_{POB_2}, \ldots, I^{\pi_T}_{POB_T}$ are placed at the positions of $I_{POB_1}, I_{POB_2}, \ldots, I_{POB_T}$, respectively, to obtain the block-wise shuffled image $I^{\pi}_{POB}$. The pseudo-code of the encryption scheme is presented in Algorithm 1.
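The two stages of Step 2 can be sketched as follows, assuming gray-scale images stored as nested Python lists, 0-based indices, and permutation matrices given as ρ1 × ρ2 arrays of flat source indices within each block. These conventions and the helper names are our assumptions, not the authors' Algorithm 1.

```python
from math import comb


def pob_value(i: int, n: int, k: int) -> int:
    """POB-value of i (Eq. 1), built greedily from the most significant bit."""
    value = 0
    for j in range(n - 1, -1, -1):
        c = comb(j, k)
        if c <= i:
            value |= 1 << j
            i -= c
            k -= 1
    return value


def block_shuffle(img, perms, rho1, rho2):
    """Apply I^{pi_t}_{POB_t}(u, v) = I_{POB_t}(pi_t(u, v)) to every rho1 x rho2 block.

    Blocks are visited in raster order; perms[t][u][v] is a 0-based flat index
    into block t, i.e. the source location divmod(perms[t][u][v], rho2).
    """
    M, N = len(img), len(img[0])
    out = [[0] * N for _ in range(M)]
    t = 0
    for bu in range(0, M, rho1):
        for bv in range(0, N, rho2):
            for u in range(rho1):
                for v in range(rho2):
                    su, sv = divmod(perms[t][u][v], rho2)
                    out[bu + u][bv + v] = img[bu + su][bv + sv]
            t += 1
    return out


def step2(img, n, k, perms, rho1, rho2):
    """Stage 1: substitute POB-values; stage 2: shuffle each block with its own pi_t."""
    i_pob = [[pob_value(p, n, k) for p in row] for row in img]
    return block_shuffle(i_pob, perms, rho1, rho2)


if __name__ == "__main__":
    img = [[4 * r + c for c in range(4)] for r in range(4)]   # 4 x 4 toy image
    reverse = [[3, 2], [1, 0]]                                 # same pi_t for every block
    print(step2(img, 9, 4, [reverse] * 4, 2, 2))
```

Because each block is permuted independently, a key holder can undo the shuffle by applying the inverse of each π_t, and the multiset of values inside each block is unchanged by this stage.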
Step 3 - Normalization of the block-wise shuffled image $I^{\pi}_{POB}$: As explained in Section 3.1, the POB number system transforms a q-bit intensity value into an n-bit POB-value. Unlike in the decimal number system, the difference between two consecutive POB-values is non-uniform. For instance, let $P_{x-1}, P_x, P_{x+1}$ be three consecutive POB-values for fixed parameters (n, k), with $D_{x+1} = P_{x+1} - P_x$ and $D_x = P_x - P_{x-1}$; then $D_{x+1}$ need not equal $D_x$ (for (n, k) = (9, 4), the first six POB-values are 15, 23, 27, 29, 30, and 39, with gaps 8, 4, 2, 1, and 9). Further, the POB-values become large for larger parameters (n, k), which increases the storage size of each image (n bits instead of 8 bits per pixel) and makes the user's dataset D expensive to transmit and store. It also requires high computational resources for learning the model parameters. Therefore, we normalize each image $I^{\pi}_{POB}$ to the range [0, l] to obtain the encrypted image $I_{Enc}$ as follows:

$$I_{Enc}(u, v) = l \cdot \frac{I^{\pi}_{POB}(u, v) - \min I^{\pi}_{POB}}{\max I^{\pi}_{POB} - \min I^{\pi}_{POB}},$$

where $\min I^{\pi}_{POB}$ and $\max I^{\pi}_{POB}$ denote the minimum and maximum values of $I^{\pi}_{POB}(u, v)$ over all 1 ≤ u ≤ M and 1 ≤ v ≤ N. In this manner, U secures the sensitive information contained in the image dataset D. For an RGB color image, Steps 2 and 3 are performed on each color channel with the same key-set. U can then transmit the encrypted data $D_{enc}$ to C for learning the parameters of the recognition model. A toy example of the encryption scheme on an RGB image of dimension 4 × 4 is presented in Fig. 2.
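The normalization of Step 3 can be sketched as below, assuming the block-shuffled POB image is a nested list of integers. Rounding to integers and the handling of a constant image are our assumptions; the text does not specify them.

```python
def normalize(img, l=255):
    """Step 3 sketch: min-max normalize a block-shuffled POB image to [0, l].

    Rounding to integers is an assumption so that the result fits the original
    8-bit storage; the min and max are taken over this single image (or channel).
    """
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0] * len(row) for row in img]
    return [[round(l * (p - lo) / (hi - lo)) for p in row] for row in img]


if __name__ == "__main__":
    # POB-values of intensities 0, 1, 2 and 5 for (n, k) = (9, 4)
    print(normalize([[15, 23], [27, 39]]))
```

Since the minimum and maximum are taken per image, two images encrypted with the same key are generally scaled differently; for RGB images, one option is to apply this function to each channel in turn.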
We observed that the security of the input image increases with larger values of T, and vice versa. The cloud server can learn the total number of classes in $D_{enc}$ and the class number associated with each image, but not the image content. Since deep learning-based recognition models are supervised learning algorithms, the class number associated with each input image must be provided. U does not share K with C; instead, K can be destroyed after the encryption process is complete.
Step 4 - Learning of the recognition model (M): In this step, C performs end-to-end training of model M on the encrypted data $D_{enc}$. The learnable weights W and hyper-parameters H of M are updated, automatically or manually, until the termination condition is met. Since $D_{enc}$ is encrypted, the values H, W and the predictions are also in encrypted form, and M can thus be assumed to be secure. Therefore, C is unable to extract correct information about the model's parameters and predictions. We noticed that the (encrypted) trained model $M_{Enc}$ performs poorly when validated on plain (unencrypted) test data. Consequently, if an adversary applies the trained model $M_{Enc}$ to plain data to extract class-related information, the obtained inferences will be incorrect.
Step 5 - Transmission of encrypted results: After learning the parameters, denoted as M(W, H), C transmits them to U for inference. As the model $M_{Enc}$ is trained on the encrypted data $D_{enc}$, U has to encrypt the inference data with the same key-set K before computation.
In contrast with existing encryption schemes, the proposed scheme partially preserves the trichotomous property of pixel intensity values while encrypting image features with block-based shuffling matrices. Moreover, existing encryption schemes greatly increase the image size, requiring high storage and computational resources and making them unsuitable for real-time scenarios. Despite these benefits, the scheme leaks some image information. However, we believe a user may accept this limited leakage in exchange for a robust and efficient image recognition model with data privacy.

Proof-of-security of SecureDL
Using multiple CSPs to process private data increases the risk of data breaches, as curious CSPs can mount collusion attacks by combining their data. Singh et al. [33] proposed encryption using the POB number system over Shamir's SSS, showing that POB is robust for securing image data. In contrast, our encryption scheme does not distribute secret data across different CSPs. Hence, it secures user data efficiently and protects it from collusion attacks. Furthermore, relying on a single CSP reduces the user's monetary cost.
Here, we mathematically analyze the security of the combination of the (n, k)-POB number system and the permutation matrices $\pi = \{\pi_1, \pi_2, \ldots, \pi_T\}$. Since the T permutation matrices, each of dimension $\rho_1 \times \rho_2$, are chosen randomly, we show that the probability of guessing them is minuscule: if an adversary tries to find these T matrices by brute force, the probability of success, and hence of recovering the image information, is negligible. Further, we prove that an image encrypted by the proposed scheme with key $K = \{n, k, \pi_1, \pi_2, \ldots, \pi_T\}$ is probabilistically secure, i.e., the probability of extracting the image content from its encrypted form is close to zero.
Theorem 5.1. The total number of permutation matrices, each of dimension $\rho_1 \times \rho_2$ over the range $[1, \rho_1 \rho_2]$, is $(\rho_1 \rho_2)!$.

Proof. Let $\pi_t$ be an arbitrary permutation matrix of dimension $\rho_1 \times \rho_2$ over the range $[1, \rho_1 \rho_2]$. The matrix $\pi_t$ can be transformed by raster scanning into a row vector $V_{\pi_t}$ of length $\rho_1 \rho_2$. Then, $V_{\pi_t}$ can be seen as a permutation obtained by rearranging the elements of the set $S = \{1, 2, \ldots, \rho_1 \rho_2\}$. Note that interchanging the positions of a single pair of values in $\pi_t$ results in a new permutation matrix $\pi_j$, and thus changes the vector $V_{\pi_t}$ to $V_{\pi_j}$, $t \neq j$. Therefore, the permutation vector $V_{\pi_t}$ is unique for each permutation matrix $\pi_t$, and the collection of all permutation matrices equals the set of all possible permutations of S. In abstract group theory [34], the collection of all permutations of a finite set S forms the finite symmetric group $G_S$. Therefore, the total number of permutation matrices equals the order of $G_S$, namely $|G_S| = |S|! = (\rho_1 \rho_2)!$.

As $p_i$ is a pixel intensity value, $0 \le p_i \le 255$ for all $i = 1, 2, \ldots, L$.
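Theorem 5.1 also gives the brute-force key space of the permutation part. The sketch below computes its size in bits, assuming that the T matrices are chosen independently; `keyspace_bits` is a hypothetical helper, and it ignores the (n, k) parameters, which add comparatively few bits.

```python
import math
from itertools import permutations


def keyspace_bits(M: int, N: int, rho1: int, rho2: int) -> float:
    """log2 of the number of choices for T permutation matrices of size rho1 x rho2.

    By Theorem 5.1 each block has (rho1*rho2)! choices, so T independent blocks give
    ((rho1*rho2)!)^T; lgamma(m + 1) = ln(m!) avoids building huge integers.
    """
    T = (M // rho1) * (N // rho2)
    return T * math.lgamma(rho1 * rho2 + 1) / math.log(2)


if __name__ == "__main__":
    # Sanity check of Theorem 5.1 for a 2 x 2 block: 4! = 24 permutation matrices.
    assert len(set(permutations(range(4)))) == math.factorial(4) == 24
    for rho in (13, 37, 47):
        print(rho, round(keyspace_bits(rho, rho, rho, rho)))  # bits for a single block
```

For a single 13 × 13 block the count is already on the order of a thousand bits, so the negligible brute-force probability does not depend on T being large.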
Let a be an arbitrary pixel intensity value of I. Then a is equally likely to be any value from the set A of cardinality L, with probability

$$\text{prob}(a = g) = \frac{1}{L}, \quad g \in A.$$

Let $q_a$ be the POB-value in B corresponding to pixel value a. Then $q_a$ is equally likely to be any value from the set B of cardinality Q, with probability

$$\text{prob}(q_a = h) = \frac{1}{Q}, \quad h \in B.$$

Since n and k can be chosen sufficiently large, L can be made much smaller than $Q = \binom{n}{k} - 1$, which enlarges the range of POB-values B from which $q_a$ is chosen for a pixel intensity value a in A (see, for instance, Corollary 2). Further, I is partitioned into T sub-images (matrices) $\{I_1, I_2, \ldots, I_T\}$, each of dimension $\rho_1 \times \rho_2$, and the t-th sub-image $I_t$ is permuted with $\pi_t$. The probability of choosing a particular permutation matrix $\pi_t$ for $I_t$ is

$$\text{prob}(\pi_t) = \frac{1}{(\rho_1 \rho_2)!},$$

since, by Theorem 5.1, the total number of permutation matrices is $(\rho_1 \rho_2)!$. Therefore, the probability of generating the T independently chosen permutation matrices of dimension $\rho_1 \times \rho_2$ is

$$\text{prob}(\pi_1, \ldots, \pi_T) = \left(\frac{1}{(\rho_1 \rho_2)!}\right)^{T}.$$

Hence, the probability of extracting a pixel intensity value $q_a$ from an encrypted image $I_{Enc}$ corresponding to pixel value a in I is given by Eq. 13, from which it can be observed that this probability is very small.

Security analysis
In this section, we analyze SecureDL against commonly performed cryptographic attacks. These attacks exploit the quantitative values of the encrypted form of the input image.

Histogram attack
The histogram of a digital image is a graphical representation of the distribution of its pixel intensity values and reflects the contrast, brightness, and saturation of the image content. Histogram analysis is performed to illustrate the diffusion and confusion properties of the encrypted image compared with the plain image. In this attack, the adversary knows the histogram of the encrypted image $I_{Enc}$ but not that of the plain image I, and tries to extract the histogram information of I. Therefore, the histograms of I and $I_{Enc}$ must be as unrelated as possible. As an illustration, a secret image I of dimension 329 × 329 and its encrypted form are shown in Figs. 3(a) and 3(e), respectively. The histograms of the red, green, and blue channels of I (Fig. 3(a)) are depicted in Fig. 3(b)-(d). The encrypted image $I_{Enc}$ is obtained with POB parameters n = 17, k = 13 and permutation matrices of dimension $\rho_1 = \rho_2 = 47$, and the histograms of its channels are presented in Fig. 3(f)-(h). Note that the frequencies of pixel intensity values for I lie in the range [0, 1200], whereas those for $I_{Enc}$ lie in [0, 800]. The histograms of the corresponding channels of I and $I_{Enc}$ are clearly very different. Therefore, an adversary attempting to extract information from the histogram of the encrypted image will obtain incorrect information.

Pixels-correlation attack
Neighboring pixels in an image are statistically correlated [35]. High correlation indicates high visual quality, which helps an adversary extract meaningful information. The resistance of the image information to an adversary is considered high when the correlation of its encrypted form $I_{Enc}$ is low, which we analyze here for SecureDL.
The correlation value (corr) of a gray-scale image I is evaluated on a set of K pairs of neighboring pixels $\{(x_1, y_1), (x_2, y_2), \ldots, (x_K, y_K)\}$ as follows:

$$\text{corr}(x, y) = \frac{\text{cov}(x, y)}{\sqrt{\sigma^2(x)}\,\sqrt{\sigma^2(y)}}, \qquad \text{cov}(x, y) = \frac{1}{K} \sum_{i=1}^{K} \big(x_i - \mu(x)\big)\big(y_i - \mu(y)\big),$$

$$\mu(x) = \frac{1}{K} \sum_{i=1}^{K} x_i, \qquad \sigma^2(x) = \frac{1}{K} \sum_{i=1}^{K} \big(x_i - \mu(x)\big)^2,$$

where $(x_i, y_i)$ denotes a pair of neighboring pixels of a secret image, and μ, σ², and cov represent the mean, variance, and covariance, respectively. For a true-color image, corr is the average of the correlations of the individual channels. corr is a real number in the range [−1, 1]: the values 1 and −1 indicate perfect positive and negative correlation, respectively, whereas 0 indicates that adjacent pixels are uncorrelated. Experimentally, corr is evaluated in the diagonal, horizontal, and vertical directions on K = 2000 randomly selected pairs of adjacent pixels of the standard Lena image I of dimension 256 × 256. The keys used to encrypt I are the same as those in Section 3. The corresponding values of corr in each direction are presented in Table 2. The corr values obtained for $I_{Enc}$ are closer to 0 than those of the existing schemes [36,37,38], indicating that SecureDL significantly breaks the inter-pixel correlations. Further, we computed corr for different numbers of adjacent pairs, K = 1000, 5000, 10000, 15000, 25000, and 35000, as reported in Table 3.
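The correlation test can be sketched as follows for a gray-scale image stored as a nested list. It implements the corr formula above on K randomly sampled adjacent pairs; the function name, the direction offsets, and the fixed seed are illustrative assumptions.

```python
import random


def adjacent_corr(img, direction="horizontal", K=2000, seed=0):
    """corr(x, y) over K randomly sampled pairs of adjacent pixels."""
    du, dv = {"horizontal": (0, 1), "vertical": (1, 0), "diagonal": (1, 1)}[direction]
    M, N = len(img), len(img[0])
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(K):
        u, v = rng.randrange(M - du), rng.randrange(N - dv)
        xs.append(img[u][v])
        ys.append(img[u + du][v + dv])
    mx, my = sum(xs) / K, sum(ys) / K
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / K
    vx = sum((x - mx) ** 2 for x in xs) / K
    vy = sum((y - my) ** 2 for y in ys) / K
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return cov / (vx ** 0.5 * vy ** 0.5)


if __name__ == "__main__":
    smooth = [[u + v for v in range(64)] for u in range(64)]   # strongly correlated
    values = list(range(64 * 64))
    random.Random(1).shuffle(values)
    scrambled = [values[r * 64:(r + 1) * 64] for r in range(64)]  # pixels fully shuffled
    for d in ("horizontal", "vertical", "diagonal"):
        print(d, round(adjacent_corr(smooth, d), 3), round(adjacent_corr(scrambled, d), 3))
```

A smooth image gives corr close to 1, while a fully shuffled one gives a value near 0; Table 2 quantifies where SecureDL-encrypted images fall between these extremes.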

Comparison of information randomness sensitivity
In this section, we compare the pixel intensity values of the secret image I with those of its encrypted image under different settings of the encryption parameters. Plotting the intensity values of a gray-level (8-bit) image against themselves yields a straight line, indicating zero difference; for an encrypted image, this plot should instead be highly random, as depicted in Fig. 4. A secret image is presented in Fig. 4(a), followed by its encrypted images (b)-(d), obtained with POB parameters n = 17, k = 13 and permutation matrices of dimension (left to right) $\rho_1 \times \rho_2$ = 13 × 13, 37 × 37, and 47 × 47, respectively. The differences between the red, green, and blue channels of the secret image (a) and the encrypted image (b) are shown in Fig. 4(e). Similarly, the randomness in each channel of the secret image relative to the encrypted images (c) and (d) is presented in Fig. 4(f) and Fig. 4(g), respectively. Our scheme thus introduces strong intensity confusion in the spatial domain of each color channel and hence in the whole image. We conclude that the pixel intensity values of the secret image cannot be predicted from its encrypted image.

Information sensitivity in encrypted images
We also perform a comparative analysis of the spatial variation between different encrypted images, depicted in Fig. 5. Consider the secret image in Fig. 4(a) and its encrypted images in Fig. 4(b)-(d). In Fig. 5(b), the differences between the corresponding red, green, and blue channels of the encrypted images in Fig. 4(b) and (c) can be visualized. Moreover, the differences between the encrypted images in Fig. 4(b) and Fig. 4(d), and in Fig. 4(c) and Fig. 4(d), are pictured in Fig. 5(b) and 5(d), respectively. We infer that if an image is encrypted with different encryption parameters, the resulting encrypted images are drastically different. Hence, the encrypted images are key-sensitive.

Experiments
The robustness of SecureDL in various scenarios has been validated on benchmark image classification datasets. Both qualitative and quantitative analyses of the proposed scheme are reported.

Qualitative results
Fig. 6 presents the encrypted images obtained with secret keys K. The first row, Fig. 6(a), shows secret images randomly selected from the datasets described above. The encrypted images in Fig. 6 are obtained with n = 17 and k = 13 and permutation matrices of varying dimensions. Extracting object shapes and features from the encrypted images is very difficult, and the distortion and loss of information increase with the block size ($\rho_1 \times \rho_2$). Moreover, the encrypted images of a cat and a dog are visually similar even though they represent distinct classes.
In Fig. 6, it can be observed that the encrypted images have intensity values similar to those of the secret image I at some pixel locations. In other words, the encrypted forms of I leak a small amount of the image's visual information. This is due to the order-preserving property of the permutation ordered binary (POB) number system. Note also that the POB parameters (n, k) (ref. Eq. 5) are the same for all images of a dataset, specifically n = 17 and k = 13. The information leakage can be reduced by using different combinations of n and k for different blocks, chosen from a bounded range, for instance n ∈ [11, 21] and k ∈ [9, 17].
Also, we normalized all the POB-spaced images I_POB into a fixed range, namely [0, 255].

Quantitative analysis
Here, we analyze the recognition accuracy obtained on the above-defined datasets in two scenarios. In the first scenario, the model is trained and tested on encrypted data. In the second scenario, the model is trained on encrypted data and tested on plain data; the test data in this scenario are the same as in the first scenario, but in plain form. The datasets are encrypted with different secret keys, and the state-of-the-art deep recognition models VGG16, ResNet50, NasNet and InceptionV3 are then trained on these encrypted datasets. The secret keys used for encryption are the same as those in Section 7.2. Each dataset is split into 90% for training and 10% for testing.

Training and testing on encrypted data
Here, we analyze the scenario in which model M is trained and tested on encrypted data. The results of these experiments on different datasets are reported in Tables 4-7. The recognition accuracy on encrypted data is high for small values of ρ_1 and ρ_2 and decreases as they grow. In line with the no free lunch theorem, no single DL model outperforms all others on every dataset; therefore, the resulting accuracies vary across architectures. For instance, in Table 6, the recognition accuracies on data encrypted with permutation matrices of dimension 37 × 37 and n = 17, k = 13 are higher than the others.
To analyze the recognition robustness of SecureDL, we compare the recognition accuracies achieved on encrypted data with those obtained using existing image encryption schemes. Specifically, we encrypt each dataset (i) by shuffling image pixels and (ii) with the image encryption schemes proposed by Tong et al. [42] and Hua et al. [43]. Then, each DL model, namely VGG16, ResNet50, NasNet and InceptionV3, is trained and tested on the resulting encrypted datasets. The testing accuracies are presented in Table 8, and SecureDL clearly outperforms the existing schemes. In SecureDL, encryption obfuscates image information in a block-based manner. Each block is shuffled using a different permutation matrix, and within each block the accumulated noise is bounded by that block's maximum and minimum intensity values. In this way, our image encryption scheme de-correlates the pixels of each block independently and preserves some local features. These preserved features play a significant role in learning robust parameters of model M, which yields high recognition accuracy. In contrast, the existing image encryption schemes [42,43] de-correlate all the pixels of an image simultaneously and preserve no features. It is worth mentioning that these existing schemes are more robust than SecureDL, but only for storage purposes; they cannot be used for processing encrypted images. Deep recognition models such as InceptionV3 exploit the local features preserved by SecureDL during training and inference, whereas no such features are available with the existing image encryption schemes.
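The block-wise de-correlation described above can be sketched as follows. This is a simplified illustration, not the SecureDL encryption itself: it omits the POB stage and the bounded noise, moves each pixel's channels together (an assumption), and substitutes a seeded NumPy random generator for the secret permutation matrices π_1, ..., π_T. The function names are hypothetical. Because each permutation only rearranges pixels inside its own block, block-level statistics such as the minimum and maximum are unchanged, which is one way local features can survive encryption.

```python
import numpy as np


def _tiles(shape, block):
    """Yield (row, col) slices of the non-overlapping block x block tiles."""
    h, w = shape[:2]
    if h % block or w % block:
        raise ValueError("image sides must be multiples of the block size")
    for r in range(0, h, block):
        for c in range(0, w, block):
            yield slice(r, r + block), slice(c, c + block)


def block_permute(img, block, rng):
    """Shuffle pixel positions inside each tile with a fresh permutation per tile.

    Returns the shuffled image and the list of per-tile permutations, which
    play the role of the secret matrices pi_1, ..., pi_T in this sketch."""
    out = np.empty_like(img)
    perms = []
    for rs, cs in _tiles(img.shape, block):
        tile = img[rs, cs]
        flat = tile.reshape(block * block, *tile.shape[2:])  # one row per pixel
        perm = rng.permutation(block * block)
        out[rs, cs] = flat[perm].reshape(tile.shape)
        perms.append(perm)
    return out, perms


def block_unpermute(enc, block, perms):
    """Invert block_permute given the same per-tile permutations."""
    out = np.empty_like(enc)
    for (rs, cs), perm in zip(_tiles(enc.shape, block), perms):
        tile = enc[rs, cs]
        flat = tile.reshape(block * block, *tile.shape[2:])
        restored = np.empty_like(flat)
        restored[perm] = flat  # undoes flat_enc[i] = flat[perm[i]]
        out[rs, cs] = restored.reshape(tile.shape)
    return out


if __name__ == "__main__":
    img = np.random.default_rng(0).integers(0, 256, size=(299, 299, 3)).astype(np.uint8)
    enc, perms = block_permute(img, 13, np.random.default_rng(2024))  # seed acts as the key
    print(len(perms))  # 529 tiles for 13 x 13 blocks
    assert np.array_equal(block_unpermute(enc, 13, perms), img)
```

Drawing an independent permutation per tile, rather than one global shuffle, is what keeps the de-correlation local; a whole-image shuffle would resemble the pixel-shuffling baseline in Table 8.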
In our scheme, a trade-off between the dimensions of the block permutation matrices and accuracy is observed. In today's era of data-driven approaches, data is no less valuable than a user's monetary assets. Thus, a small decrease in accuracy in exchange for data confidentiality can be acceptable to the user. Improving the recognition rate is left for future work.

Training on encrypted data and testing on plain data
In this scenario, we consider the threat in which an adversary attempts to extract class-label information by utilizing the model M trained on encrypted data, for example through transfer learning on plain (non-encrypted) data. This threat can be resisted if model M yields low confidence on plain data. Given this, the recognition rates

Performance analysis
This section discusses the computational and storage overheads of our scheme and compares them with those of existing algorithms. A cloud service should be computationally inexpensive and require minimal storage space.

Computation overhead
In SecureDL, generating the secret keys K = {n, k, π_1, π_2, ..., π_T} with the function KeyGen and encrypting the data D at the user end are one-time operations. The computational time of these operations depends on the image size as well as on the key K. The mathematical operations evaluated during the learning of model M are performed by the CSP. Each service provider has different computational resources, such as the quantity and configuration of GPUs. At the user end, transforming all pixel intensity values of I to POB-values and generating 529 permutation matrices, π = {π_1, π_2, ..., π_529}, each of size 13 × 13, take 0.0038 and 0.0501 seconds, respectively. Further, permuting the POB-spaced image (I_POB) using π takes 0.3530 seconds. Thus, a user needs only 0.0038 + 0.0501 + 0.3530 = 0.4069 seconds in total to encrypt a secret image of dimension 299 × 299, and encrypting a dataset of 10000 images takes only 0.4069 × 10000 = 4069 seconds. In comparison, with the FHE-based model [10], a user can encrypt and encode 24193 MNIST-type images in an hour on a PC running Windows 10 with a single Intel Xeon E5-1620 CPU at 3.5 GHz and 16 GB of RAM. It is worth noting that the MNIST dataset contains only 28 × 28 grayscale images, whereas our images are 299 × 299 RGB color images.
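The timing arithmetic above can be reproduced in a few lines. The per-step times are the ones reported in this section; the assumption that 13 × 13 blocks tile a 299 × 299 image exactly (299 = 23 × 13, giving 23² = 529 blocks) is our reading of where the count of 529 permutation matrices comes from, and the names below are illustrative only.

```python
# Per-image user-end timings reported in this section, in seconds.
STEP_SECONDS = {
    "pob_transform": 0.0038,
    "key_generation": 0.0501,    # 529 permutation matrices of size 13 x 13
    "block_permutation": 0.3530,
}


def num_blocks(image_side: int, block_side: int) -> int:
    """Number of non-overlapping square blocks tiling a square image."""
    if image_side % block_side:
        raise ValueError("image side must be a multiple of the block side")
    return (image_side // block_side) ** 2


per_image = sum(STEP_SECONDS.values())   # seconds per 299 x 299 image
dataset_seconds = per_image * 10_000     # 10000-image dataset
images_per_hour = 3600 / per_image

if __name__ == "__main__":
    print(num_blocks(299, 13))  # 529
    print(f"{per_image:.4f} s per image, {dataset_seconds:.0f} s per dataset")
    print(f"about {images_per_hour:.0f} images per hour")
    print(299 * 299 * 3, "vs", 28 * 28, "intensity values per image")
```

The resulting figure of roughly 8,847 images per hour is not a like-for-like comparison with the 24193 MNIST images per hour reported for [10]: the measurements may not come from the same hardware, and each 299 × 299 × 3 image carries 268,203 intensity values versus 784 for a 28 × 28 MNIST image.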

Storage overhead
Similar to the computational overhead, the storage overhead depends directly on the image size. We experimentally verified that an encrypted image obtained by our scheme requires the same space as the original image; hence, encryption does not increase storage requirements. Consider a true-color image of size 300 × 300 × 3 pixels. Its storage requirement is 300 × 300 × 3 × 8 bits, or 0.2575 MB.
In the first level of our encryption scheme, the POB number system transforms each 8-bit pixel value into an n-bit POB-value (n > 8). Thus, the space required for the POB-spaced image is 300 × 300 × 3 × n bits. The second level of security shuffles the pixel intensity values using the permutation matrices π = {π_1, π_2, ..., π_T}, which leaves the size of the POB-spaced image unchanged, i.e., 300 × 300 × 3 × n bits. As discussed in Section 4, this modified image is normalized to the range [0, 255] to increase computational efficiency, which reduces the size again. Thus, the size of the encrypted image equals that of the original image, i.e., 300 × 300 × 3 × n × (8/n) bits = 300 × 300 × 3 × 8 bits = 0.2575 MB. Hence, SecureDL does not expand the data size, whereas existing schemes such as CryptoNets [10] and CryptoDL [44] increase the data size by 60 and 34 times, respectively, even for simple MNIST data. The storage comparison is presented in Table 9.
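The storage figures can be checked with the short calculation below. The value 0.2575 MB corresponds to megabytes of 2^20 bytes (2,160,000 bits is 270,000 bytes). Using n = 17 for the intermediate POB-spaced image is an assumption taken from the experimental setting; the helper name is illustrative.

```python
BITS_PER_MIB = 8 * 1024 * 1024  # "MB" in this section corresponds to 2**20 bytes


def image_bits(height: int, width: int, channels: int, bits_per_value: int) -> int:
    """Raw storage of an image in bits."""
    return height * width * channels * bits_per_value


original = image_bits(300, 300, 3, 8)
pob_spaced = image_bits(300, 300, 3, 17)  # assumes n = 17; permutation keeps this size
normalized = image_bits(300, 300, 3, 8)   # after rescaling POB-values to [0, 255]

if __name__ == "__main__":
    for name, bits in [("original", original), ("POB-spaced", pob_spaced), ("encrypted", normalized)]:
        print(f"{name:>10}: {bits:>9,} bits = {bits / BITS_PER_MIB:.4f} MB")
```

Only the intermediate POB-spaced image is larger (about 0.5472 MB for n = 17); the normalization step brings the stored encrypted image back to the original 0.2575 MB.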

Conclusion and future work
We propose a novel image encryption scheme using the permutation ordered binary number system and block permutation matrices. The scheme partially preserves the trichotomy law of positive integers. Building on this scheme, we modeled SecureDL, a privacy-preserving image recognition system for encrypted data over cloud infrastructure. It overcomes the computational and storage overheads incurred by FHE-based schemes. SecureDL is proved to be secure from a probabilistic viewpoint and against various cryptographic attacks. The model is evaluated on the benchmark datasets Kaggle Dogs vs. Cats, ISIC dermoscopy images, Linnaeus 5, and CIFAR10. The accuracy achieved on encrypted data is close to that in the plain domain. The preservation of the trichotomy law in the proposed scheme allows image recognition to be extended to scene text detection, image segmentation, and video classification.

Histogram Attack
Here, we provide additional real-world images and compare the histograms of their red, green, and blue components with those of the corresponding encrypted images, as shown in Figs. 11-15. The histograms of the original images vary drastically from one another, whereas the histograms of the encrypted images are approximately similar. Further, we perform histogram equalization on the histograms of the encrypted images in Figs. 16-20. We observed that the equalized histograms of an encrypted image are similar to those of other encrypted images, but significantly different from the histograms of the original images. This indicates that an adversary may extract a negligible amount of information, but that information is not sufficient to recover the whole original image.
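The histogram comparison and equalization can be sketched as below, assuming H × W × 3 uint8 NumPy arrays. The equalization is the classic CDF-based mapping; the L1 distance between normalized histograms is our choice of similarity measure for illustration (the figures compare histograms visually), and all function names are hypothetical.

```python
import numpy as np


def channel_histograms(img: np.ndarray) -> np.ndarray:
    """256-bin histogram for each channel of an H x W x C uint8 image (C x 256)."""
    return np.stack([np.bincount(img[..., ch].ravel(), minlength=256)
                     for ch in range(img.shape[2])])


def equalize_channel(channel: np.ndarray) -> np.ndarray:
    """Classic histogram equalization of one uint8 channel via its CDF."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest occupied bin
    denom = cdf[-1] - cdf_min
    if denom == 0:                     # constant channel: nothing to equalize
        return channel.copy()
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    return lut[channel]


def histogram_distance(h1: np.ndarray, h2: np.ndarray) -> float:
    """L1 distance between normalized histograms (0 = identical, 2 = disjoint)."""
    p = h1 / h1.sum()
    q = h2 / h2.sum()
    return float(np.abs(p - q).sum())
```

Under this measure, small distances between encrypted images and large distances between each encrypted image and its original would correspond to the visual observations in Figs. 11-20.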

Statistical analysis
This section computes the block-wise maximum, mean, median, and minimum of the original and the corresponding encrypted images.
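For a single channel whose sides are multiples of the block size, these block-wise statistics can be computed without explicit loops by reshaping the channel into tiles. This is a sketch assuming non-overlapping square blocks; the function name is hypothetical. Since all four statistics depend only on the set of values in a block, any rearrangement of pixels within a block leaves them unchanged.

```python
import numpy as np


def blockwise_stats(channel: np.ndarray, block: int) -> dict:
    """Max, mean, median and min of each non-overlapping block x block tile."""
    h, w = channel.shape
    if h % block or w % block:
        raise ValueError("channel sides must be multiples of the block size")
    # (h, w) -> (rows of tiles, cols of tiles, pixels per tile)
    tiles = (channel.reshape(h // block, block, w // block, block)
                    .swapaxes(1, 2)
                    .reshape(h // block, w // block, block * block))
    return {
        "max": tiles.max(axis=-1),
        "mean": tiles.mean(axis=-1),
        "median": np.median(tiles, axis=-1),
        "min": tiles.min(axis=-1),
    }
```

Applying this to the same channel of an original image and of its encrypted counterpart yields the per-block values that the comparison in this section is based on.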
In Figs