Crypt-OR: A privacy-preserving system for exemplar-based object-removal over the cloud

Object removal is a technique for removing undesired object(s) from an image and then filling in the resulting empty region(s) such that the modified image is visually plausible. Existing algorithms are unable to provide promising results when the region to be removed has a neighborhood of varying texture, is small with respect to the size and depth of the image, or has a specific geometric shape such as a triangle or rectangle. In this paper, we propose a new algorithm that incorporates the merits of partial differential equations (PDEs) and exemplar-based schemes to address these challenges. The data term, which measures the continuity of isophotes in exemplar-based methods, is modified by incorporating a regularizer term and partial derivatives up to second order of the input image. This regularizer enhances the strength of isophotes striking the boundary and boosts the information propagation in an unbiased manner, in terms of pixel intensity values. Additionally, the low cost, agility, and flexible access offered by cloud services have attracted users' attention. However, users are concerned about utilizing these services for their data, as they are operated by untrusted third parties. To address these privacy concerns for object removal in an image over a cloud server, we extend and modify our algorithm to make it compatible with the (T, N)-threshold Shamir secret sharing scheme (SSS). The resulting privacy-preserving system, named Crypt-OR, is an end-to-end system for object removal in the encrypted domain (ED) over the cloud server. Crypt-OR is evaluated by removing synthetically imposed objects in real images. Further, Crypt-OR is shown to be secure under various pixel-based cryptographic attacks such as the frequency-known attack and the pixel-correlation attack.


Privacy-preserving image processing
Recently, researchers have presented several image/video processing models that perform different tasks in the ED. Lathey et al. (2013) and Lathey & Atrey (2015) proposed a technique for privacy-preserving image enhancement, such as noise removal, smoothing, and sharpening, in the ED. The image is secured through the (T, N)-threshold Secret Sharing Scheme (SSS) of Shamir (1979). Shamir's SSS is a secret sharing scheme with homomorphic properties that partitions a secret s into N shares and requires at least T shares to reconstruct s losslessly. Individually, each share appears random and reveals no information about s, which makes the scheme secure; reconstruction with fewer than T shares likewise reveals no information. Similarly, Mohanty et al. (2013) and Tanwar et al. (2018a) presented schemes for privacy-preserving image resizing and cropping in the ED over the cloud. The ramp SSS is utilized to secure the user's image from an adversary, and an interpolation technique over known intensity values is used to compute unknown pixel intensities in the ED. A scheme for extracting and locating face regions in the ED was proposed by Yan & Kankanhalli (2015), who secured the image information using coefficients of the discrete cosine transform. Rahulamathavan et al. (2013) presented a facial expression recognition system for encrypted images, based on the Paillier cryptosystem and local Fisher discriminant analysis. Sayed et al. (SaghaianNejadEsfahani et al., 2012) proposed an [...] Hsu et al. (2012) proposed a model for extracting SIFT features in the ED using the Paillier cryptosystem. Chu et al. (2013) presented a real-time object detection model for video surveillance. The scheme transforms each frame into multiple blocks, and each block is then multiplied by a random matrix. Further, a Gaussian mixture model is used for detecting moving object(s). The encryption is performed by matrix permutation and color flipping. However, using the same encryption key for the whole process in video surveillance increases the risk of collision attacks. Additionally, the model of Chu et al. operates at the level of individual pixels, which greatly increases the computational complexity. In Orlandi et al. (2007), an interactive protocol between the model and the data owners for evaluating the activation functions of a neural network is presented using Homomorphic Encryption (HE). Xie et al. (2014) and Gilad-Bachrach et al. (2016) proposed privacy-preserving image classification using a Fully Homomorphic Encryption (FHE) scheme (Rivest et al., 1978). The use of FHE algorithms substantially increases the computation overhead, making them impractical for real applications.

Image inpainting
Image inpainting is the process of modifying an image such that an observer who has not seen the original is unable to differentiate the original from the inpainted image. Historically, it was practiced by art workers to restore their images and bring them "up to date".
The objective of image inpainting is to fill the deteriorated region of an image so that the image becomes more visually appealing. Diffusion-based inpainting techniques utilize the propagating structure of heat flow, through diffusion Partial Differential Equations (PDEs), to propagate local information into the region to be inpainted, Ω. An iterative mathematical formulation of the technique performed by professional inpainting art workers was presented by Bertalmio et al. (2000). The formulation leverages the isophote-prolongation technique of Masnou & Morel (1998) for isophotes arriving at the boundary of Ω. The isophotes are propagated inward in such a way that they do not cross one another and generate undesired results. The authors tested their scheme by removing scratches and overlaid text from images. Subsequently, numerous inpainting models incorporating variants of linear and non-linear isotropic/anisotropic diffusion equations (Bertalmio et al., 2001; Peter et al., 2016) have been proposed. These schemes are capable of completing curves and small objects; however, they generate smoothing/blurring effects for large and textured regions.
These difficulties have been addressed in Criminisi et al. (2004), Wang et al. (2013), and Xiang et al. (2019) using the exemplar-based texture synthesis approach presented by Efros & Leung (1999). These schemes scan the source region Φ to obtain the best-exemplar patches for the region Ω, as determined by a similarity metric. These patches are then copied into the respective positions in Ω. Because of the "No Free Lunch Theorem", these schemes are found to output unconnected edges. To connect these edges, researchers have combined theories from the categories mentioned above and proposed hierarchical schemes that improve perceptual and visual plausibility (Drori et al., 2003; Arias et al., 2011). Recently, data-driven, parameter-learning schemes using Convolutional Neural Networks (CNNs) have been proposed (Nazeri et al., 2019; Yu et al., 2018). The scheme of Nazeri et al. (2019) is a two-stage approach consisting of structure prediction and image completion. Both stages utilize an adversarial model, a variant of deep learning models. The scheme hallucinates the edges of the region to be inpainted and then utilizes these edges to fill in the region for image completion. The scheme of Yu et al. (2018) is a generative model, based upon feed-forward CNNs, for synthesizing image structure and surrounding image features. Their model is evaluated on filling in multiple holes, all square-shaped, irrespective of size and location. However, these learning models require a huge amount of training data and computational resources to produce significant results. For a detailed overview of image inpainting, see Guillemot & Le Meur (2013).
Nowadays, cloud service providers (CSPs) offer inpainting as SaaS, such as Webpaint (https://www.webinpaint.com/) and Photoscissors (https://online.photoscissors.com/). They require the user's image in a visually intelligible form, which raises concerns about the protection of image information. Moreover, the existing inpainting techniques cannot be directly implemented in the ED because of their dependency on neighboring pixels and the distortion introduced by encryption techniques.
In the context of image inpainting, Yan et al. (2018) proposed a partial secret image sharing scheme for hiding a desired region of the image rather than the whole image. Their technique fills the target region by using an inpainting technique in PD, which reveals the information of the non-inpainted (source) region while inpainting is performed over the cloud. We address the above concerns and propose Crypt-OR, a privacy-preserving end-to-end system for removing user-specified object(s) in the ED.

The key contributions of the paper are as follows:
1. A new image inpainting scheme for object removal is proposed. The data term, which measures the continuity of isophotes in exemplar-based methods, is modified by integrating a regularizer and partial derivatives up to second order of the input image. These derivatives improve the propagation efficiency of texture and structural information from the source region to the target region in the isophote directions. The priority function is defined as a linear function of the confidence and data terms.
2. The proposed scheme is extended to develop a privacy-preserving object-removal model, Crypt-OR, for removing pre-defined undesired object(s) [...]
4. To the best of our knowledge, Crypt-OR is the first move towards removing objects in a given image in the ED using a search-copy-paste approach incorporating the cloud infrastructure(s).
Organization: Section 3 describes Shamir's SSS and the fundamentals of the exemplar-based scheme. The proposed framework for Crypt-OR is defined in Section 4, and its security analysis is given in Section 5. The qualitative and quantitative analyses of the proposed scheme are provided in Section 6, followed by discussion and analysis in Section 7. Finally, Section 8 concludes the paper.

Privacy-preserving clustering
Clustering is an unsupervised ML task that aims to partition a group of D-dimensional data points into K different subgroups, formally known as clusters.

The scheme proposed by Shamir (1979) states that if a secret value s is split into N meaningless shares, then T or more shares are required for lossless reconstruction of s, where T and N are positive integers such that T ≤ N. The shares of s are generated via a polynomial g(x) of degree (T − 1), defined as:
$$g(x) = \left(s + c_1 x + c_2 x^2 + \dots + c_{T-1} x^{T-1}\right) \bmod \Upsilon \qquad (1)$$
where the coefficients c_1, ..., c_{T−1} are randomly chosen from a Galois field of prime order Υ, denoted as GF(Υ), and x ∈ [1, N] is an integer.
The reconstruction of s from its shares {S_1, S_2, ..., S_N} is accomplished using Lagrange interpolation over at least T shares. Moreover, Shamir's SSS has been proved to be correct and perfect, in the sense that (a) out of the N shares, any T or more shares can reconstruct the secret value losslessly, and (b) no information is revealed if reconstruction is attempted using fewer than T shares. Also, Shamir's SSS is homomorphic with respect to addition and scalar multiplication.
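The share generation, Lagrange reconstruction, and the two homomorphic operations described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`make_shares`, `reconstruct`, `add_shares`, `scale_shares`) are hypothetical, the prime 251 is taken from the experiments reported later, and Python's general-purpose `random` module stands in for a cryptographically secure generator.

```python
import random

PRIME = 251  # prime modulus; secrets (pixel values) must be smaller than PRIME


def make_shares(secret, t, n, prime=PRIME, rng=random):
    """Split `secret` into n shares such that any t of them reconstruct it."""
    if not (0 <= secret < prime and 1 <= t <= n < prime):
        raise ValueError("need 0 <= secret < prime and 1 <= t <= n < prime")
    # g(x) = secret + c1*x + ... + c_{t-1}*x^(t-1)  (mod prime)
    coeffs = [secret] + [rng.randrange(prime) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, prime) for k, c in enumerate(coeffs)) % prime)
            for x in range(1, n + 1)]


def reconstruct(shares, prime=PRIME):
    """Lagrange interpolation of g at x = 0 over GF(prime)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % prime
                den = den * (xi - xj) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+)
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


def add_shares(a, b, prime=PRIME):
    """Share-wise addition: the result is a sharing of (secret_a + secret_b) mod prime."""
    return [(xa, (ya + yb) % prime) for (xa, ya), (_, yb) in zip(a, b)]


def scale_shares(a, k, prime=PRIME):
    """Share-wise scalar multiplication: a sharing of (k * secret_a) mod prime."""
    return [(x, (k * y) % prime) for x, y in a]
```

For instance, with T = 3 and N = 5, any three shares of two secrets can be added share by share and the sum is recovered by `reconstruct`, provided the true sum stays below the modulus.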

Consider an image I in which the user aims to remove and fill in the target region Ω using the source region Φ, where Φ = I − Ω. The boundary between these regions is denoted by ∂Ω, as depicted in Fig. 1(a). For a pixel p, Ψ_p indicates a patch (window) of size w × w centered at p, as presented in Fig. 1(a). The standard procedure in exemplar-based inpainting consists of three steps, as follows:

Computing patch priorities
For a pixel p ∈ ∂Ω, the priority value P(p) of patch Ψ_p is evaluated as a product of integral powers of a confidence term C(p) and a data term D(p). In the classical formulation of Criminisi et al. (2004), with unit powers, these are:
$$P(p) = C(p)\,D(p), \qquad C(p) = \frac{\sum_{q \in \Psi_p \cap \Phi} C(q)}{\#(\Psi_p)}, \qquad D(p) = \frac{\left|\nabla I_p^{\perp} \cdot \vec{n}_p\right|}{\Gamma} \qquad (2)$$
where #(Ψ_p) and Γ indicate the number of pixels in Ψ_p and a normalization factor respectively. The term C(p) is considered as the amount of reliable and meaningful information surrounding the pixel p. The term D(p) measures the continuity of isophotes in their direction and helps to raise the priority of patches along the isophote direction. D(p) is calculated from the product of the intensity gradient in the orthogonal direction, ∇I⊥_p, and the normalized normal vector n_p of the binary mask M depicting Φ and Ω only, as shown in Fig. 1(a).
Then, the technique searches the source region Φ for a patch, say Ψ_q, having the maximum similarity with Ψ_p, as shown in Fig. 1(b).
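The confidence term and the exemplar search described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: images are plain lists of lists, `mask[y][x]` is `True` inside Ω, the sum of squared differences over the known pixels of Ψ_p is assumed as the (dis)similarity measure, the patch at p is assumed to lie fully inside the image, and the names `confidence` and `best_exemplar` are hypothetical.

```python
def confidence(conf, mask, p, w):
    """C(p): summed confidence of the known pixels in the w x w patch at p,
    divided by the patch area #(Psi_p)."""
    r = w // 2
    py, px = p
    total = 0.0
    for y in range(py - r, py + r + 1):
        for x in range(px - r, px + r + 1):
            if not mask[y][x]:          # only pixels in the source region count
                total += conf[y][x]
    return total / (w * w)


def best_exemplar(img, mask, p, w):
    """Centre q of the fully known patch with the smallest sum of squared
    differences to Psi_p, compared on the known pixels of Psi_p only."""
    r = w // 2
    height, width = len(img), len(img[0])
    py, px = p
    offsets = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    known = [(dy, dx) for dy, dx in offsets if not mask[py + dy][px + dx]]
    best, best_cost = None, float("inf")
    for qy in range(r, height - r):
        for qx in range(r, width - r):
            # the candidate patch must lie entirely in the source region Phi
            if any(mask[qy + dy][qx + dx] for dy, dx in offsets):
                continue
            cost = sum((img[py + dy][px + dx] - img[qy + dy][qx + dx]) ** 2
                       for dy, dx in known)
            if cost < best_cost:
                best, best_cost = (qy, qx), cost
    return best
```

The exhaustive scan is quadratic in the image size for every target patch; it is kept here only because it makes the search criterion explicit.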

Propagation of source information
Once the optimal patch Ψ_q is obtained, the unknown pixel values in Ψ_p are replaced with the corresponding values in Ψ_q, as can be observed in Fig. 1(c). Finally, confidence values of unknown pixels are updated through

Proposed methodology
The proposed Crypt-OR removes the undesired object(s) and fills in the scratches in an image using an exemplar-based inpainting approach. In contrast with the classical data term and priority function, a Regularized data term, denoted by D_reg(p), and an Improved priority function, denoted by P_imp(p), at pixel p are introduced. As discussed earlier, PD schemes cannot be directly applied to encrypted images. Therefore, the mathematical equations are modified to rely only on the homomorphic operations of Shamir's SSS, namely addition and scalar multiplication. Further, a protocol between the user and the CSP is provided to accomplish the inpainting task at cloud servers.

In Crypt-OR, two entities are involved: (i) User: who aims to outsource his image I for removing the undesired region, denoted by Ω, in a secure manner and, (ii) CSP: who is "honest-but-curious" and accomplishes the task of inpainting in the encrypted share over the cloud server. CSP maintains the confidentiality, integrity and uniqueness of the share content provided by the

Regularized data term
The proposed data term, D_reg(p), is defined as: [...] where the term ||R|| indicates the introduced regularizer, and D_2(p) evaluates the magnitude of the maximum change in the neighborhood of p in both the vertical and horizontal directions of the respective gradients. This term helps to find the direction of edges and sharp changes in I. The third term, D_3(p), serves to evaluate curves in the diagonal and cross-diagonal directions. It is important to realize that D_reg contains partial derivatives of second order, which strengthen the data term and hence the improved priority function, P_imp(p), in detecting non-linear curves in I. Finally, the regularizer term boosts the information propagation in an unbiased manner, in terms of pixel intensities.

Improved priority function
In previous works, the priority function for pixel p is given as the product of integral powers of D(p) and C(p), as in Eq. 2. Experimentally, it is observed that C(p) decays exponentially with the iteration number and hence approaches zero for a large number of iterations (Cheng et al., 2005). This makes the priority value P(p), as defined in Eq. 2, tend to zero regardless of the D(p) value, and thus the calculation of P(p) becomes meaningless. As a result, the scheme propagates undesirable structure and texture information into the target region. Also, each pixel intensity is highly correlated with its neighboring pixels, and the relevant information most likely lies in the isophote direction. Additionally, the neighboring information of the target patch is considered for the continuous and meaningful propagation of structure and texture information. In our proposed work, the improved priority function for a pixel p, denoted by P_imp(p), is defined as a linear function of C(p) and D_reg(p), as defined in Eq. 2 and Eq. 4 respectively. The mathematical equation is: [...] where β_3 = (β_1 + β_2) and α_i, β_i are hyper-parameters whose values are determined through experiments. The values of the hyper-parameters are not equal, which leads to a non-uniform contribution of C(p) and D_reg(p) to the priority value.

Step-by-step description
In this section, we present the protocol of Crypt-OR, which explains the contribution of each entity: the user and the CSP. Assume an M × N-dimensional gray-scale image I in which the region Ω is to be removed and filled in. Note that inpainting of a true-color RGB image can be accomplished by performing each operation over each color channel independently. A pictorial explanation of Crypt-OR is presented in Fig. 2. The outline of the proposed protocol is as follows:

Step 1: The region Ω in I depends on the user's requirement; it therefore has to be provided by the user, which partitions I into Ω and the source region Φ = I − Ω. A binary mask M is generated utilizing the function MaskForm(). The partial derivatives of I up to second order with respect to its spatial variables are computed using the function DeriVal(). Since I is a 2-D discrete signal, these derivatives are approximated using finite difference methods. For instance, the approximations of ∂I(p)/∂x, ∂²I(p)/∂x², and ∂²I(p)/∂x∂y at p = (x, y) are as follows:

$$\frac{\partial^2 I(p)}{\partial x^2} \approx \frac{I_{x-2,y} - 2 \times I_{x,y} + I_{x+2,y}}{4} \qquad (11)$$
where I_{x,y} indicates the intensity value of I at position (x, y). Shamir's SSS is not homomorphic with respect to multiplication and division, so Eqs. 10-12, when performed in the ED, will not provide the same results as obtained in PD. The shares are generated using the modulo operation with prime Υ, as defined in Eq. 1.

Therefore, the above equations are modified as: [...] Similarly, the other derivatives can be approximated with the modulo operation.
The square root contained in the regularizer in Eq. 8 is approximated by a Taylor series up to first order. Further, the confidence and data values for each pixel of I are initialized randomly using ConfVal() and DataVal() respectively.
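To make the role of the homomorphic operations concrete, the following sketch evaluates only the numerator of the second-order central difference, I_{x−2,y} − 2 × I_{x,y} + I_{x+2,y}, directly on Shamir shares of a single image row. It is an assumption-laden illustration and not the modified equations of the paper: the helper names are hypothetical, the division by 4 is deliberately left out because it is not a homomorphic operation, and the recovered value agrees with the plaintext numerator only modulo Υ = 251.

```python
import random

PRIME = 251  # prime modulus of the sharing


def share(secret, t, n, rng):
    """Shares g(1), ..., g(n) of `secret` for a random polynomial of degree t - 1."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(t - 1)]
    return [sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME
            for x in range(1, n + 1)]


def reconstruct(xs, ys):
    """Lagrange interpolation at 0 from the points (xs[i], ys[i])."""
    total = 0
    for i, xi in enumerate(xs):
        num = den = 1
        for j, xj in enumerate(xs):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + ys[i] * num * pow(den, -1, PRIME)) % PRIME
    return total


def second_diff_numerator(row, x):
    """I[x-2] - 2*I[x] + I[x+2] modulo PRIME: additions and a scalar multiple only."""
    return (row[x - 2] - 2 * row[x] + row[x + 2]) % PRIME


def demo(row, x, t=2, n=3, seed=0):
    rng = random.Random(seed)
    per_pixel = [share(v, t, n, rng) for v in row]
    # share_rows[k] is the share image held by server k + 1
    share_rows = [[per_pixel[i][k] for i in range(len(row))] for k in range(n)]
    # every server applies the same linear stencil to its own share image
    enc = [second_diff_numerator(share_rows[k], x) for k in range(n)]
    recovered = reconstruct(list(range(1, t + 1)), enc[:t])
    return recovered, second_diff_numerator(row, x)
```

Because the stencil is a linear combination with integer weights, each server can apply it without seeing the pixel values, and the user recovers the same residue that the plaintext computation would give.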
Step 3: After successfully receiving the share S_t and Ψ^Enc_{p_t}, CSP C_t searches for the best possible encrypted patch, say Ψ^Enc_{q_t}, similar to Ψ^Enc_{p_t}. The search is accomplished by defining a difference measure G() between Ψ^Enc_{p_t} and Ψ^Enc_{q_t^l}, for some l, in S_t as follows: [...] Note that Ψ^Enc_{q_t^l} is of dimension w × w and is contained in Φ^Enc, the source region in S_t corresponding to the source region Φ of I. The patch having the least difference value is considered the best exemplar Ψ^Enc_{q_t} corresponding to Ψ^Enc_{p_t} in S_t. Then, C_t transmits the obtained patch back to the user.

Step 4: The user collects all N encrypted best-exemplar patches {Ψ^Enc_{q_l}}, l = 1, ..., N, corresponding to {Ψ^Enc_{p_l}}, l = 1, ..., N, obtained from the N CSPs. Then, each encrypted patch Ψ^Enc_{q_t} is decrypted to a patch in PD, say Ψ_{q_t}. It can be clearly concluded that for t_1 ≠ t_2, Ψ_{q_{t_1}} may be different from Ψ_{q_{t_2}}. Therefore, the most suitable patch Ψ_q with respect to Ψ_p is procured by computing the structural similarity value between each Ψ_{q_t} and Ψ_p. The patch having the maximum similarity value is taken to be the best exemplar Ψ_q. The unknown pixels of Ψ_p in Ω are replaced with the corresponding values of Ψ_q. As defined in Section 3.2, the confidence values C(p) and the data term D_reg(p) of the filled pixels are updated with the corresponding C(p) and D_reg(p) values of Ψ_p.

Step 5: Repeat Step 2 to Step 4 until the intensity values of all the pixels in Ω are obtained.

Security analysis
Here, the security of Crypt-OR against an adversary is evaluated through standard image encryption tests (Cao et al., 2018). [...] 3(a). The encrypted shares I_S1 and I_S2 of I under modulo Υ = 251 are depicted in Fig. 3(b) and Fig. 3(c) respectively. It can be observed that histogram of

Pixels-correlation attack
The correlation between neighboring pixels in an image I depicts a statistical relevance (Chen et al., 2004). High pixel correlation indicates high visual quality of an image, which helps to extract meaningful information. To protect the information possessed by I from an adversary, the correlation value must be low for its encrypted shares, which we analyze for Crypt-OR.

Numerical values
The correlation of K pairs of neighboring pixels {(x_1, y_1), (x_2, y_2), ..., (x_K, y_K)} for a gray-scale image I, denoted by corr, is evaluated as follows:
$$corr = \frac{cov(x, y)}{SD(x)\,SD(y)}$$
where SD and cov(x, y) represent the standard deviation and covariance respectively. The value lies in [-1, 1], where "-1" and "1" indicate perfect negative and positive correlation respectively, and values approaching "0" indicate no relation.
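As an illustration of this measure, the sketch below computes the correlation between every pixel and its neighbor at a fixed non-negative offset. The function name `neighbour_correlation`, the list-of-lists image layout, and the use of all available pixel pairs (rather than K randomly sampled ones) are assumptions made for the example; the image must not be constant, otherwise the standard deviations vanish.

```python
import math


def neighbour_correlation(img, dy=0, dx=1):
    """Pearson correlation between each pixel and its neighbour at offset (dy, dx).

    dy and dx are assumed to be non-negative; (0, 1) is the horizontal
    neighbour, (1, 0) the vertical one and (1, 1) the diagonal one.
    """
    height, width = len(img), len(img[0])
    xs, ys = [], []
    for y in range(height - dy):
        for x in range(width - dx):
            xs.append(img[y][x])
            ys.append(img[y + dy][x + dx])
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys) / n)
    return cov / (sd_x * sd_y)
```

A smooth gradient image gives a value close to 1, whereas a well-encrypted share is expected to give a value close to 0 in all three directions.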
For experiments, the Lena image and its encrypted share (Ref. Fig. 3) [...] Teng & Wang (2012) and Tanwar et al. (2018b). [...]
Consider two encrypted images E_{I_1} and E_{I_2}, obtained before and after changing one pixel intensity of an M × N-dimensional image I. Let E_{I_1}(i, j) and E_{I_2}(i, j) denote the intensities at the (i, j)-th pixel position in E_{I_1} and E_{I_2} respectively. Then, UACI and NPCR are calculated as follows: [...] Higher values of NPCR and UACI indicate a higher ability of the encryption algorithm to protect image information against differential attacks. For experiments, two shares of the Lena image (Ref. Fig. 3(a)) and of the modified image, obtained by interchanging two pixels in each color channel at the same position, are generated. Because a random number is added during share generation, the values vary between runs; the averages of NPCR and UACI over 10 runs are reported as 99.9520% and 33.2711% respectively. These values indicate that Crypt-OR can adequately resist differential attacks, and that an eavesdropper cannot extract any information even after a modification in the intensity values of the secret image.
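The two differential-attack measures can be sketched as follows. Since the defining equations are not reproduced above, the sketch assumes the definitions commonly used in the image-encryption literature: NPCR is the percentage of pixel positions at which the two encrypted images differ, and UACI is the mean absolute intensity difference normalized by the maximum intensity. The function name `npcr_uaci` and the `max_value` parameter (255 by default; a different maximum may be appropriate for shares taken modulo 251) are assumptions.

```python
def npcr_uaci(e1, e2, max_value=255):
    """Return (NPCR, UACI) in percent for two equally sized gray-scale images."""
    height, width = len(e1), len(e1[0])
    changed = 0      # number of positions where the two images differ
    abs_diff = 0     # accumulated absolute intensity difference
    for row1, row2 in zip(e1, e2):
        for a, b in zip(row1, row2):
            changed += a != b
            abs_diff += abs(a - b)
    n = height * width
    npcr = 100.0 * changed / n
    uaci = 100.0 * abs_diff / (max_value * n)
    return npcr, uaci
```

Applied to two shares generated from images that differ in a single pixel, values of NPCR near 100% mean that almost every share pixel changed.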

Experiments
In this section, the perceptual and statistical results of Crypt-OR are presented [...] with a patch size of 5, and the deep learning schemes of Nazeri et al. (2019) and Yu et al. (2018). The testing codes corresponding to the inpainting model presented by Nazeri et al. (2019) and the pre-trained adversarial model of Yu et al. (2018), trained over the Places2 dataset, are considered. The presented results are obtained without fine-tuning, as the authors have trained with the regular and irregular patches provided by Liu et al. (2018). The results are depicted in [...] In each figure, the first row represents the original image with the imposed red-colored region (Ω) to be removed. The subsequent rows show the inpainted images obtained by Criminisi et al. (2004), Nazeri et al. (2019), Yu et al. (2018), and Crypt-OR respectively. It can easily be observed that Crypt-OR significantly removes the specified region except for minor artifacts in some

Peak Signal-to-Noise Ratio (PSNR)
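The PSNR between the inpainted image (I) and the original image (J) is evaluated, for 8-bit images, using the following equation:
$$PSNR(I, J) = 10 \log_{10}\left(\frac{255^2}{MSE}\right)$$
where MSE is the mean squared error between I and J.
The quality of the inpainted image increases with a higher PSNR value; the PSNR values are reported in Fig. 10. [...] The mean structural similarity (MSSIM) index over N_w local windows is computed as:
$$MSSIM(I, J) = \frac{1}{N_w} \sum_{k=1}^{N_w} SSIM(I_k, J_k), \qquad SSIM(I, J) = \frac{(2\mu_I \mu_J + C_1)(2\sigma_{IJ} + C_2)}{(\mu_I^2 + \mu_J^2 + C_1)(\sigma_I^2 + \sigma_J^2 + C_2)}$$
where σ_k and µ_k are the standard deviation and average intensity value of k ∈ {I, J}, σ_IJ denotes the covariance of I and J, and the C_l's are stability constants.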

Further, I_k and J_k indicate the image patches for the k-th window. The MSSIM value lies in the interval [0, 1], where a value near 0 shows low structural similarity and a value near 1 represents high structural similarity between I and J.
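Both quality metrics of this section can be sketched in plain Python. The sketch makes several assumptions that are not taken from the paper: 8-bit images with a peak value of 255, non-overlapping square windows of side `win` instead of a sliding Gaussian window, and the commonly used stability constants C_1 = (0.01 × 255)² and C_2 = (0.03 × 255)². The function names are hypothetical.

```python
import math


def psnr(img_i, img_j, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    diffs = [(a - b) ** 2 for ri, rj in zip(img_i, img_j) for a, b in zip(ri, rj)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def _ssim_window(a, b, c1, c2):
    """SSIM of two flattened windows a and b of equal length."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((u - mu_a) * (v - mu_b) for u, v in zip(a, b)) / n
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))


def mssim(img_i, img_j, win=8, peak=255.0):
    """Mean SSIM over non-overlapping win x win windows (images at least win x win)."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    height, width = len(img_i), len(img_i[0])
    scores = []
    for y in range(0, height - win + 1, win):
        for x in range(0, width - win + 1, win):
            a = [img_i[y + dy][x + dx] for dy in range(win) for dx in range(win)]
            b = [img_j[y + dy][x + dx] for dy in range(win) for dx in range(win)]
            scores.append(_ssim_window(a, b, c1, c2))
    return sum(scores) / len(scores)
```

With non-overlapping windows the score is coarser than the sliding-window variant, so absolute values from this sketch are not comparable with those reported in the figures.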
The MSSIM values obtained between the original and inpainted images are depicted in Fig. 11. Crypt-OR outperforms Criminisi et al. (2004)

Discussion
Here, the key outcomes of the proposed inpainting scheme are discussed by removing synthetically imposed shapes in grayscale images, as depicted in Fig. 13(a). It is observed that the model of Criminisi et al. (2004) can maintain the edges in an image but also generates undesired information, except in the third row of Fig. 13(b). The model of Nazeri et al. (2019) is unable to remove the specified region, as the existing model was not trained for these shapes. Also, the neighboring background of the region has very little information and few edges that the scheme can hallucinate before completing the region. The inpainted images of Nazeri's model can be seen in Fig. 13(c). The model of Yu et al. (2018) is a generative approach that synthesizes the image structure and surrounding image features. It can be observed in Fig. 13(d) that the scheme fails to generate desirable results because of the non-availability of rich image features in the source region. The major drawback of these deep learning models is that they need to be fine-tuned/trained over a new dataset with specific constraints, as they are highly dependent upon pixel inter-correlations.

Also, they require high-end computing resources and storage space.
In contrast, Crypt-OR is a single-image-based scheme and outperforms the existing schemes, as depicted in Fig. 13(e). It is observed that the proposed regularizer in the data term and the improved priority function generalize the behavior of structure and texture propagation in the isophote direction for arbitrary region(s) to be inpainted. However, it is unable to propagate the edges of the triangle-shaped region (last row; the upper corner of the triangle is shown zoomed in Fig. 13(e)), which is left as future work.

Performance analysis
The computational cost required to remove the desired object(s) followed by [...] demonstrate that our scheme can be used for various applications such as removal of unwanted/undesired object(s), data hiding, and privacy surveillance.
As future work, the removal of various facial features in a personal image in the ED over cloud infrastructures can be considered.