Robust Tensor PCA based Background/Foreground Separation in Noisy Videos and Its Applications in Additive Manufacturing

—Background/foreground separation is one of the most fundamental tasks in computer vision, especially for video data. Robust PCA (RPCA) and its tensor extension, namely, Robust Tensor PCA (RTPCA), provide an effective framework for background/foreground separation by decomposing the data into low-rank and sparse components, which contain the background and the foreground (moving objects), respectively. However, in real-world applications, video data is contaminated with noise. For example, in metal additive manufacturing (AM), the processed X-ray video used to study melt pool dynamics is very noisy. RPCA and RTPCA are not able to separate the background, foreground, and noise simultaneously; as a result, the noise contaminates the background, the foreground, or both, and needs to be removed from them. To achieve this three-term decomposition, a smooth sparse RTPCA (SS-RTPCA) model is proposed to decompose the data into a static background, a smooth foreground, and noise, respectively. Specifically, the static background is modeled by the low-rank Tucker decomposition; the smooth foreground (moving objects) is modeled by spatio-temporal continuity, which is enforced by total variation regularization; and the noise is modeled by sparsity, which is enforced by the ℓ1 norm. An efficient algorithm based on the alternating direction method of multipliers (ADMM) is implemented to solve the proposed model. Extensive experiments on both simulated and real data demonstrate that the proposed method significantly outperforms the state-of-the-art approaches for background/foreground separation in noisy cases.

Noise contaminates the background and foreground since RPCA and RTPCA have only two components, which cannot explain the three components in the data. This paper puts forward a smooth sparse Robust Tensor PCA that decomposes the tensor data into low-rank, smooth, and sparse components, respectively.
It is a highly effective method for background/foreground separation in noisy cases. In the case studies on simulated video and X-ray data, the proposed method can handle non-additive noise, even at high noise ratios. The proposed algorithm has only one tuning parameter, λ. Based on the case studies, our method achieves satisfactory performance for any λ ∈ [0.2, 1] with anisotropic total variation regularization (q = {1}). In that sense, the proposed method is a kind of "parameter-free" algorithm. Furthermore, the proposed method is also applicable to other popular industrial applications. Practitioners can use the proposed SS-RTPCA for monitoring degradation processes, where the degradation image contains a static background, anomalies, and random disturbances, respectively.


NOMENCLATURE
H, W, T — The height and width of an image frame, and the number of image frames
(r_1, r_2, r_3) — The multi-linear rank in the Tucker decomposition
λ — The balance coefficient in the proposed objective function
X — The order-three tensor in R^{H×W×T}, represented by {X_1, ..., X_T}
X_t — The t-th image frame in R^{H×W}
L — The low-rank tensor (static video background)
S — The smooth tensor (smooth moving objects)
E — The noise tensor (all kinds of noise)
X ×_n U — The mode-n multiplication of a tensor X with a matrix U
G — The core tensor in the Tucker decomposition
U_1, U_2, U_3 — The factor matrices in the Tucker decomposition
f — The auxiliary variable
D_h, D_v, D_t — Vectorizations of the difference operations along the horizontal, vertical, and temporal directions
D — The concatenated difference operation, i.e., the concatenation of D_h, D_v, and D_t
‖·‖_F — The Frobenius norm
‖·‖_1 — The ℓ1 norm
‖·‖_2 — The ℓ2 norm
‖·‖_{TV1} — The anisotropic total variation norm
‖·‖_{TV2} — The isotropic total variation norm
Q — Q ∈ {TV1, TV2}
q — q = {1} corresponds to the anisotropic total variation norm; q = {2, 1} corresponds to the isotropic total variation norm
vec(·) — The vectorization operator
ten(·) — The tensorization operator
λ_f, Λ_X — The Lagrange multiplier vector and tensor
β_f, β_X — The positive penalty scalars
c_1, c_2 — The coefficients in the adaptive updating scheme for β_f, β_X
fftn — The fast 3D Fourier transform
ifftn — The inverse fast 3D Fourier transform
soft(·, ·) — The soft-thresholding operator
γ — The parameter associated with the convergence rate in ADMM
Err(·) — The error of the auxiliary variable
relChgA — The relative change of A
relErrA — The relative error of A

Fig. 1. (a) Unprocessed X-ray image. (b) Processed X-ray image. (c) Processed X-ray image with manually annotated solid/liquid interface.

I. INTRODUCTION
BACKGROUND/foreground separation is a fundamental step for moving object detection in many video data applications [1], [2]. It is usually performed by separating the moving objects, called the "foreground", from the static objects, called the "background" [3], [4]. In many real-world applications, the presence of noise is a common but challenging issue [5], [6]. For example, in metal additive manufacturing (AM) research, high-speed X-ray data has been applied to study melt pool formation and evolution [7]. The change of melt pool geometry is directly associated with the solid/liquid interface velocity, which can be utilized to characterize the microstructure of the final part [8]. To capture the melt pool, a high-speed X-ray imaging system is applied to monitor the printing process [9]. One unprocessed X-ray image from the monitoring system is shown in Fig. 1a, where the melt pool cannot be identified by the naked eye. To enhance the boundary of the melt pool, the X-ray image captured six frames earlier is subtracted from the unprocessed X-ray image, and the contrast is then adjusted to obtain the processed X-ray image. The processed X-ray image is shown in Fig. 1b, where the melt pool boundary can be located by the naked eye, as outlined in red in Fig. 1c. Even though the melt pool boundary is enhanced, this subtraction step generates a large amount of random noise, causing problems for accurate boundary detection.
Recent research on background/foreground separation is based on decomposing video data into low-rank and sparse components. This is an effective framework for separating the foreground from the background, which are modeled by the sparse and low-rank components, respectively. Among these methods, the most representative problem formulation is Robust Principal Component Analysis (RPCA) [3], [10], a modification of the widely used statistical procedure named principal component analysis (PCA). RPCA [10] decomposes the data matrix X into the sum of a low-rank matrix L and a sparse matrix S, where the low-rankness and sparsity are measured by the nuclear norm ‖·‖_* and the ℓ1 norm ‖·‖_1, respectively. Apart from background/foreground separation, RPCA can also be applied to image denoising [2], where the noise is represented by the sparse component.
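To make the decomposition concrete, a minimal matrix RPCA (principal component pursuit) sketch is shown below. It is a generic illustration with standard heuristics for λ and the penalty μ, solved by an inexact augmented Lagrangian scheme, not the exact solver of [10]:

```python
import numpy as np

def soft(X, tau):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding, the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft(s, tau)) @ Vt

def rpca(X, lam=None, n_iter=100):
    """Principal component pursuit: min ||L||_* + lam*||S||_1  s.t.  X = L + S,
    via an inexact augmented Lagrangian scheme with a growing penalty mu."""
    m, n = X.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = 0.25 * m * n / np.abs(X).sum()      # common initial penalty heuristic
    L = np.zeros_like(X); S = np.zeros_like(X); Y = np.zeros_like(X)
    for _ in range(n_iter):
        L = svt(X - S + Y / mu, 1.0 / mu)    # low-rank update
        S = soft(X - L + Y / mu, lam / mu)   # sparse update
        Y = Y + mu * (X - L - S)             # dual (multiplier) update
        mu = min(mu * 1.05, 1e7)             # gradually enforce feasibility
    return L, S
```

On a random low-rank-plus-sparse matrix satisfying the usual incoherence conditions, this sketch recovers the low-rank part to within a few percent.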
One major disadvantage of RPCA is that it can only deal with 2-D matrix data, since the nuclear norm ‖·‖_* is designed for matrices. However, real-world data is usually multi-dimensional in nature, with rich information stored in multi-way arrays known as tensors [11]. For example, a greyscale video is 3-D data that stacks multiple images along the time domain; a color image is also 3-D data with three channels (red, green, and blue), where each channel is a 2-D image. To apply RPCA to such datasets, the multi-way tensor data has to be reshaped into a matrix. This preprocessing usually leads to information loss and performance degradation, since the structural information in the data is destroyed. To address this issue, it is necessary to extend RPCA to manipulate tensor data directly, taking advantage of its multi-dimensional structure. However, doing so is challenging, since the numerical algebra of tensors is fraught with computationally hard problems [12], [13].
Enabled by the newly developed tensor multiplication scheme of the t-SVD [14], Zhang et al. [15] proposed the tensor tubal rank as well as the tensor nuclear norm for image denoising. Based on the tensor nuclear norm, Lu et al. [16] developed Robust Tensor PCA (RTPCA) by extending RPCA from 2-D matrix to 3-D tensor data, aiming to exactly recover a low-rank tensor contaminated by sparse errors. More specifically, it tries to recover the low-rank tensor L and the sparse tensor E from the data tensor X, which can be represented as X = L + E. To further improve RTPCA, several models with different objective functions and constraints have been proposed [17]-[19]. However, none of them is able to address the background/foreground separation problem in noisy cases, because the low-rank and sparse components extracted by RPCA or RTPCA algorithms do not account for the noise component.
The objective of this study is to address the problem of background/foreground separation in the presence of noise and apply it to additive manufacturing applications (Fig. 1). To achieve this objective, a smooth sparse RTPCA (SS-RTPCA) is proposed to decompose the data tensor X into a low-rank tensor (background) L, a smooth tensor (foreground) S, and a sparse tensor (noise) E, namely, X = L + S + E. In the SS-RTPCA, the background is modeled by the low-rank Tucker decomposition [11]. Spatio-temporal continuity is applied to formulate the moving objects (foreground) [20], [21]: the moving objects in the video foreground are spatially continuous both in their support regions and in their intensity values within these regions, and they are also temporally continuous across succeeding frames. The noise is modeled by a sparse tensor. To summarize, the contributions of this paper are as follows: (i) Propose the smooth sparse RTPCA model for background/foreground separation in the presence of noise by decomposing the data tensor into a low-rank tensor, a smooth tensor, and a sparse tensor, respectively; (ii) Implement an efficient algorithm based on the alternating direction method of multipliers (ADMM) [22] to solve the proposed model. The remainder of this paper is organized as follows. A brief review of notation and related research work is provided in Section II. The proposed model and the algorithm to solve it are introduced in Section III. Numerical studies in Section IV and a real-world additive manufacturing application in Section V are provided for testing and validation of the proposed method. Finally, conclusions and future work are discussed in Section VI.

II. NOTATION AND RESEARCH BACKGROUND
In Section II-A, the notation and basics in multi-linear algebra used in this paper are reviewed. Then, the smooth sparse decomposition and Robust Tensor PCA are reviewed briefly in Section II-B. Afterwards, the research gaps of the existing work are identified in Section II-C.

A. Notation and Tensor Basics
Throughout this paper, scalars are denoted by lowercase letters, e.g., x; vectors are denoted by lowercase boldface letters, e.g., x; matrices are denoted by uppercase boldface letters, e.g., X; and tensors are denoted by calligraphic letters, e.g., X. The order of a tensor is the number of its modes or dimensions. A real-valued tensor of order N is denoted by X ∈ R^{I_1×I_2×···×I_N} and its entries by X(i_1, i_2, ..., i_N). The multi-linear Tucker rank of an N-order tensor is the tuple of the ranks of the mode-n unfoldings X_(n) ∈ R^{I_n × (I_1 ··· I_{n−1} I_{n+1} ··· I_N)}. The inner product of two same-sized tensors X and Y is the sum of the products of their entries, i.e., ⟨X, Y⟩ = Σ_{i_1} ··· Σ_{i_N} X(i_1, ..., i_N) · Y(i_1, ..., i_N). Following the definition of the inner product, the Frobenius norm of a tensor X is defined as ‖X‖_F = √⟨X, X⟩. The mode-n multiplication of a tensor X with a matrix U amounts to the multiplication of all mode-n vector fibers with U, i.e., (X ×_n U)(i_1, ..., i_{n−1}, j_n, i_{n+1}, ..., i_N) = Σ_{i_n} X(i_1, ..., i_N) · U(j_n, i_n).
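The unfolding and mode-n product just defined can be sketched in a few lines of NumPy. This is a generic illustration; `unfold`, `fold`, and `mode_n_mult` are this sketch's own helper names, not notation from the paper:

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding: arrange the mode-n fibers of X as columns of a matrix."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(M, n, shape):
    """Inverse of unfold for a tensor of the given target shape."""
    full = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape(full), 0, n)

def mode_n_mult(X, U, n):
    """Mode-n product X ×_n U: multiply every mode-n fiber of X by U."""
    shape = list(X.shape); shape[n] = U.shape[0]
    return fold(U @ unfold(X, n), n, tuple(shape))
```

A Tucker reconstruction such as G ×_1 U_1 ×_2 U_2 ×_3 U_3 is then three successive calls to `mode_n_mult`.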

B. Related Work
In this subsection, two lines of related work that motivate the research in this paper are introduced.

1) Smooth Sparse Decomposition:
In the literature on anomaly detection in images, there is a popular two-step procedure, namely, smoothing images first followed by anomaly detection [23]. However, this procedure may not be optimal. To address this issue, Yan et al. [24] proposed a smooth sparse decomposition (SSD) approach to integrate image smoothing and anomaly detection into a single task. It decomposes an image y into a smooth image background µ, sparse anomalous regions a, and random noise e, namely, y = µ + a + e. To further enhance the model, the authors proposed to use a smooth spline basis B and a sparse spline basis B_a to formulate the mean and the anomalies. Specifically, the enhanced SSD model is y = Bθ + B_a θ_a + e, where θ and θ_a are the smooth and sparse basis coefficients, respectively. To estimate θ and θ_a, a least squares regression penalized by two terms is solved:

min_{θ, θ_a} ‖e‖_2^2 + τ_1 θ^⊤ A θ + τ_2 ‖θ_a‖_1, subject to y = Bθ + B_a θ_a + e,

where ‖e‖_2^2 is the fitting error; the first penalty term measures the smoothness of θ through the roughness matrix A; the second penalty term describes the sparsity of θ_a; and τ_1, τ_2 > 0 are regularization coefficients to be tuned. The idea of smooth and sparse decomposition has been further extended to the literature on tensor completion [25].
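As a toy illustration of this objective, the following sketch alternates between a closed-form smooth update and a soft-thresholded sparse update. It simplifies the spline-basis model of [24] by taking B = B_a = I and building the roughness matrix A from second differences; the values of τ_1 and τ_2 are made up for the example:

```python
import numpy as np

def soft(x, tau):
    """Element-wise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ssd_1d(y, tau1=10.0, tau2=0.5, n_iter=50):
    """Toy 1-D smooth-sparse decomposition:
    min ||y - theta - theta_a||^2 + tau1 * theta' A theta + tau2 * ||theta_a||_1,
    with B = B_a = I and A = D'D built from second differences."""
    n = len(y)
    D = np.diff(np.eye(n), 2, axis=0)        # second-difference operator
    A = D.T @ D
    H = np.linalg.inv(np.eye(n) + tau1 * A)  # closed-form theta-subproblem solver
    theta_a = np.zeros(n)
    for _ in range(n_iter):
        theta = H @ (y - theta_a)            # smooth background
        theta_a = soft(y - theta, tau2 / 2)  # sparse anomalies
    return theta, theta_a
```

On a noisy sine with two injected spikes, the spikes end up in the sparse component and the sine in the smooth one.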
2) Robust Tensor PCA: As the tensor extension of the popular Robust PCA, the recently proposed RTPCA [16] aims to recover the low-rank tensor L_0 ∈ R^{I_1×I_2×I_3} and the sparse tensor E_0 ∈ R^{I_1×I_2×I_3} from their sum. RTPCA solves the following convex optimization problem:

min_{L, E} ‖L‖_TNN + ρ‖E‖_1, subject to X = L + E,

where ‖·‖_TNN is their proposed tensor nuclear norm, a convex relaxation of the tensor tubal rank. The tensor nuclear norm and tensor tubal rank are defined based on the t-SVD proposed in [15]. In that paper, the analysis shows that ρ = 1/√(max(I_1, I_2) I_3) guarantees exact recovery under tensor incoherence conditions. Following this direction, to further exploit the low-rank structures in tensor data, Liu et al. [17] extracted a low-rank component of the core matrix whose entries come from the diagonal elements of the core tensor; based on this idea, they defined a new tensor nuclear norm and proposed a corresponding algorithm for RTPCA problems. Apart from work based on the tensor tubal rank, Yang et al. [19] considered a new model for RTPCA based on the tensor train rank. These methods have been applied to background/foreground separation, image/video denoising, etc.

C. Research Gap Identification
In real-world applications, video data is often contaminated with noise during acquisition, compression, and transmission [6], as exemplified in Fig. 1. If the smooth sparse decomposition method [24] is applied to a noisy video, the background and foreground cannot be separated well, since together they are treated as the smooth component while the noise is treated as the sparse component. If Robust Tensor PCA [17]-[19] is applied, either the detected moving objects or the background is contaminated with noise, since RTPCA cannot handle the three components (i.e., background, foreground, and noise) simultaneously. Therefore, this work seeks to address these research gaps by devising a new smooth sparse Robust Tensor PCA (SS-RTPCA). The proposed model can be viewed as performing background/foreground separation together with video denoising by providing a new decomposition methodology with three components.

III. PROPOSED METHOD
In Section III-A, the proposed smooth sparse RTPCA for background/foreground separation in the presence of noise is presented. Specifically, the low-rankness, spatio-temporal continuity, and sparsity are formulated by the Tucker decomposition, total variation (TV) regularization, and ℓ1 regularization, respectively. In Section III-B, an efficient algorithm based on ADMM [22] is designed to solve the proposed model.

A. Proposed Smooth Sparse RTPCA Model
Throughout this work, the focus is on videos that can be represented as a third-order tensor X := {X_1, ..., X_T} ∈ R^{H×W×T}, where each matrix X_t ∈ R^{H×W} represents the t-th image frame, t = 1, ..., T. H, W, and T denote the height and width of an image frame and the number of image frames, respectively. The three modes of the tensor X are the height, width, and time of the video.
As discussed in Section II-C, for a noisy video it is necessary to decompose the video data X into the low-rank tensor L (the static video background), the smooth tensor S (the smooth moving objects in the foreground), and the sparse tensor E (absorbing all kinds of noise), respectively. In the static background, the image frames remain unchanged along the time domain, which can be achieved by restricting L to be a low-rank tensor in the time domain. The moving objects in the video foreground are continuous spatially and temporally, so they can be represented as a smooth tensor S. The sparse tensor E absorbs the random noise, so that the noise can be excluded from the background and foreground. An illustration of the video decomposition strategy for our proposed method is provided in Fig. 2. Specifically, it has the form X = L + S + E, as proposed in Section I.
To model the low-rankness, the static background L is approximated by the well-known Tucker decomposition [11] with rank (r_1, r_2, r_3). Specifically, the Tucker decomposition has the following form:

L = G ×_1 U_1 ×_2 U_2 ×_3 U_3,    (1)

where U_1 ∈ R^{H×r_1} (r_1 < H) and U_2 ∈ R^{W×r_2} (r_2 < W) are orthogonal factor matrices for the two spatial domains, U_3 ∈ R^{T×r_3} (r_3 < T) is the orthogonal factor matrix for the temporal domain, and the core tensor G ∈ R^{r_1×r_2×r_3} couples these factors. The determination of (r_1, r_2, r_3) is provided in Section III-C. By formulating the low-rank tensor L using the Tucker decomposition, a more accurate video background can be reconstructed than with matrix-based low-rank models, because the Tucker decomposition considers not only the spatial but also the temporal correlations in the video background. The smooth tensor S (moving objects) is assumed to have the spatio-temporal continuity property, such that the foreground moves smoothly and coherently in the spatial and temporal directions. In the literature, imposing spatio-temporal continuity constraints on moving objects in the foreground is well studied and proven to be effective [20], [21]. To measure the sensitivity of a quantity to change, the derivative is often applied in mathematics; for discrete functions, difference operators approximate the derivative. Given a third-order tensor S ∈ R^{H×W×T}, S(x, y, t) indicates the intensity at position (x, y) and time t, and let S_h, S_v, and S_t denote the three difference operation results at position (x, y) and time t, computed with periodic boundary conditions along the horizontal, vertical, and temporal directions, respectively. For simplicity of computation, all the entries of S can be stacked into a column vector s = vec(S), in which vec(·) represents the vectorization operator.
Then D_h s = vec(S_h), D_v s = vec(S_v), and D_t s = vec(S_t) denote the vectorizations of the three difference operation results, respectively, where D_h, D_v, and D_t are the corresponding difference matrices. Two commonly used vector norms, the ℓ1 and ℓ2 norms, can be applied to them. Specifically, the anisotropic total variation norm is defined as

‖S‖_{TV1} = ‖D_h s‖_1 + ‖D_v s‖_1 + ‖D_t s‖_1,    (2)

and the isotropic total variation norm as

‖S‖_{TV2} = Σ_i √((D_h s)_i^2 + (D_v s)_i^2 + (D_t s)_i^2),    (3)

which are the ℓ1 and ℓ2,1 norms of [D_h s, D_v s, D_t s], respectively. The isotropic total variation of S is denoted by ‖Ds‖_{2,1} for notational simplicity. Total variation regularization has been widely used in image and video denoising and restoration [26], [27] due to its superiority in detecting discontinuous changes in image processing. By combining the advantages of the Tucker decomposition for the low-rank tensor and total variation regularization for the smooth tensor, the proposed smooth sparse RTPCA has the following formulation:

min_{G, U_1, U_2, U_3, S, E} ‖E‖_1 + λ‖S‖_Q, subject to X = L + S + E, L = G ×_1 U_1 ×_2 U_2 ×_3 U_3,    (4)

where the factor matrices U_j, j = 1, 2, 3 in the Tucker decomposition (1) are orthogonal in columns. The first term in the objective function measures the sparsity of the noise tensor E by the ℓ1 norm. The second term encodes the spatio-temporal continuity of the smooth tensor S, measured by the total variation regularizations defined in (2) and (3), where Q can be selected from {TV1, TV2}. λ > 0 is the coefficient balancing the two terms in the objective function. The first constraint shows the decomposition strategy of the tensor data, and the second constraint shows the Tucker decomposition of the low-rank tensor L.
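As a quick numerical sketch matching the definitions above, the two TV norms of a tensor S can be computed with periodic forward differences via np.roll:

```python
import numpy as np

def tv_norms(S):
    """Anisotropic (TV1) and isotropic (TV2) total variation of a 3-D tensor,
    using forward differences with periodic boundary conditions."""
    dh = np.roll(S, -1, axis=0) - S   # differences along the first spatial mode
    dv = np.roll(S, -1, axis=1) - S   # differences along the second spatial mode
    dt = np.roll(S, -1, axis=2) - S   # differences along the temporal mode
    tv1 = np.abs(dh).sum() + np.abs(dv).sum() + np.abs(dt).sum()
    tv2 = np.sqrt(dh**2 + dv**2 + dt**2).sum()
    return tv1, tv2
```

By the triangle inequality, TV1 is always at least TV2, and both vanish on a constant tensor.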
Remark 1 (Comparison with [20], [28]). Similar to our paper, the work in [20], [28] also decomposes the video into three components, namely, L, S, and E. However, they focus on using L + E to model a dynamic background, where L represents the static background and E represents small changes in the dynamic background. Both are vector-based methods that do not use the structural information in the tensor. Besides, our paper emphasizes removing the noise in background/foreground separation. The methods in [20], [28] are also utilized as part of the benchmarks for comparison.

B. Optimization Algorithm
In this section, an efficient algorithm based on the alternating direction method of multipliers (ADMM) [22] is proposed and implemented for solving the proposed SS-RTPCA model in (4). Specifically, a multi-block version of ADMM is developed. To apply ADMM, the SS-RTPCA model in (4) is rewritten in the following equivalent form by introducing an auxiliary variable f:

min_{G, U_1, U_2, U_3, S, E, f} ‖E‖_1 + λ‖f‖_q, subject to X = L + S + E, L = G ×_1 U_1 ×_2 U_2 ×_3 U_3, f = D vec(S),    (5)

where the factor matrices U_j, j = 1, 2, 3 are orthogonal in columns, and ‖·‖_q can be selected from ‖·‖_1 or ‖·‖_{2,1}.
The above optimization problem in (5) can be solved in its Lagrangian dual form by augmenting the constraints into the objective function. Specifically, the augmented Lagrangian function of problem (5) has the following form:

L(G, U_j, S, E, f; λ_f, Λ_X) = ‖E‖_1 + λ‖f‖_q + ⟨λ_f, D vec(S) − f⟩ + (β_f/2)‖D vec(S) − f‖_2^2 + ⟨Λ_X, L + S + E − X⟩ + (β_X/2)‖L + S + E − X‖_F^2,    (6)

where λ_f and Λ_X are the Lagrange multiplier vector and tensor, respectively, and β_f, β_X > 0 are positive penalty scalars. The optimization problem in (6) is non-convex, so optimizing all variables simultaneously is difficult. Instead, (6) is solved by alternately minimizing one variable with the others fixed. Under the framework of multi-block ADMM, the optimization problem (6) with respect to each variable can be solved by the following sub-problems.

1) Optimization on G, U_j or L: The optimization sub-problem of (6) with respect to G and U_j, j = 1, 2, 3 can be rewritten as

min_{G, U_1, U_2, U_3} ‖G ×_1 U_1 ×_2 U_2 ×_3 U_3 − X̃‖_F^2,    (7)

where X̃ = X − S − E − Λ_X/β_X. The classic HOOI algorithm [11] can be applied to solve this sub-problem, which is itself an alternating iterative algorithm. Due to the non-convexity of sub-problem (7), HOOI cannot obtain the optimal solution in general; however, the iterative sequence from HOOI has a monotone decreasing property.
2) Optimization on S: By setting the gradient of L with respect to S to zero, the sub-problem of (6) with respect to S reduces to the following linear system:

(β_f D^⊤D + β_X I) vec(S) = β_f D^⊤(f − λ_f/β_f) + β_X vec(X − L − E − Λ_X/β_X),    (8)

whose right-hand side is tensorized by ten(·), the tensorization operator. Thanks to the block-circulant structure of the matrix corresponding to the operator D^⊤D, it can be diagonalized by the 3D FFT matrix [29]. Therefore, S can be computed efficiently by

S = ifftn( fftn(ten(β_f D^⊤(f − λ_f/β_f) + β_X vec(X − L − E − Λ_X/β_X))) ⊘ (β_f Φ + β_X) ),    (9)

where ⊘ denotes element-wise division, and Φ = |fftn(D_h)|^2 + |fftn(D_v)|^2 + |fftn(D_t)|^2 is unrelated to the data and model parameters, so it only needs to be computed once in the whole algorithm. fftn and ifftn indicate the fast 3D Fourier transform and its inverse, respectively.
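A numerical sketch of this FFT-based solve is given below, with hypothetical penalty values. The kernels encode the periodic finite differences; since only |fftn(·)|^2 enters Φ, the forward/backward sign convention of the difference does not affect the result:

```python
import numpy as np

def solve_S(R, beta_f, beta_X):
    """Solve (beta_f * D'D + beta_X * I) vec(S) = vec(R) in O(HWT log HWT)
    via 3-D FFT, exploiting the block-circulant structure of periodic
    finite differences. R is the tensorized right-hand side (H x W x T)."""
    H, W, T = R.shape
    # Impulse-response kernels of the three periodic difference operators.
    dh = np.zeros((H, W, T)); dh[0, 0, 0] = -1.0; dh[1 % H, 0, 0] += 1.0
    dv = np.zeros((H, W, T)); dv[0, 0, 0] = -1.0; dv[0, 1 % W, 0] += 1.0
    dt = np.zeros((H, W, T)); dt[0, 0, 0] = -1.0; dt[0, 0, 1 % T] += 1.0
    Phi = (np.abs(np.fft.fftn(dh))**2 + np.abs(np.fft.fftn(dv))**2
           + np.abs(np.fft.fftn(dt))**2)
    S = np.fft.ifftn(np.fft.fftn(R) / (beta_f * Phi + beta_X))
    return np.real(S)
```

Applying beta_f * D^⊤D + beta_X * I to the returned S recovers R exactly, which is a convenient correctness check.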
3) Optimization on f: The sub-problem of (6) with respect to f can be rewritten as

min_f λ‖f‖_q + (β_f/2)‖f − p‖_2^2, where p = D vec(S) + λ_f/β_f.    (10)

For q = {1} (anisotropic total variation), the well-known soft-thresholding operator [30] can be applied to solve this sub-problem as

f_h = soft(p_h, λ/β_f), where soft(x, τ) = sign(x) · max(|x| − τ, 0)    (11)

and p_h is the sub-vector of p corresponding to D_h; the definitions for p_v and p_t are analogous. For q = {2, 1} (isotropic total variation), f_h can be efficiently computed by the group-wise shrinkage

f_h = max(‖p̄‖_2 − λ/β_f, 0) / max(‖p̄‖_2, ε) · p_h, with p̄ = (p_h, p_v, p_t),

which is an element-wise operator, and ε > 0 is a small positive constant.
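The two shrinkage operators can be sketched as follows, with `group_shrink` handling the isotropic case by shrinking the length of each difference triple:

```python
import numpy as np

def soft(x, tau):
    """Element-wise soft-thresholding: the prox of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def group_shrink(ph, pv, pt, tau, eps=1e-10):
    """Joint shrinkage for the isotropic (q = {2, 1}) case: shrink the
    Euclidean length of each (ph_i, pv_i, pt_i) triple by tau."""
    norm = np.sqrt(ph**2 + pv**2 + pt**2)
    scale = np.maximum(norm - tau, 0.0) / np.maximum(norm, eps)
    return scale * ph, scale * pv, scale * pt
```

For example, the triple (3, 4, 0) with tau = 1 has length 5 and is rescaled by 4/5.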

4) Optimization on E:
The sub-problem of (6) with respect to E can be solved by

E = soft(X − L − S − Λ_X/β_X, 1/β_X),    (12)

where soft(·, ·) is the element-wise soft-thresholding operator.

5) Updating multipliers:
According to ADMM, the multipliers are updated by the following formulas:

λ_f ← λ_f + γβ_f (D vec(S) − f), Λ_X ← Λ_X + γβ_X (L + S + E − X),    (13)

where γ > 0 is a parameter associated with the convergence rate, and the penalty parameters β_f and β_X follow an adaptive updating scheme as suggested in [31]. Taking β_f as an example,

β_f ← c_1 β_f if Err(f^k) > c_2 Err(f^{k−1}), and β_f otherwise,    (14)

where Err(f^k) = ‖f^k − D vec(S^k)‖_2. As suggested in [21], γ = 1.1, and c_1, c_2 can be taken as 1.15 and 0.95, respectively. The proposed algorithm for SS-RTPCA is summarized in Algorithm 1. The derivations for (7), (9), and (12) are provided in Appendix A.
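The multiplier step and one plausible reading of the adaptive penalty rule can be sketched as below. The exact condition in the adaptive rule is our assumption, not the paper's formula; γ, c_1, c_2 are the values quoted above:

```python
import numpy as np

GAMMA, C1, C2 = 1.1, 1.15, 0.95   # values suggested in the text

def update_multiplier(lmbda, beta, residual):
    """Scaled dual ascent: lambda <- lambda + gamma * beta * residual."""
    return lmbda + GAMMA * beta * residual

def update_beta(beta, err_new, err_old):
    """Adaptive penalty (an assumed rule, not the paper's exact formula):
    grow beta by c1 when the constraint error stalls, otherwise relax by c2."""
    return C1 * beta if err_new > C2 * err_old else C2 * beta
```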

Algorithm 1 The proposed ADMM algorithm for SS-RTPCA
1: while not converged do
2: Updating L via (7) using HOOI;
3: Updating S via (9);
4: Updating f via (11);
5: Updating E via (12);
6: Updating multipliers and the related parameters via (13) and (14);
7: end while

For the multi-linear rank (r_1, r_2), each element is set to the nearest integer greater than or equal to its estimate. For r_3, the value 1 is used in all experiments, as it has proven effective in [21], [32]. In terms of λ, it needs to be carefully tuned based on the data. Specifically, λ is taken in the range [0.2, 1] for q = {1} and [1, 3] for q = {2, 1}. Empirically, the proposed algorithm achieves satisfactory performance with any λ ∈ [0.2, 1] for q = {1}; in that sense, the proposed SS-RTPCA is a "parameter-free" algorithm.
Next, the computational complexity of the proposed SS-RTPCA is analyzed. In Algorithm 1, one outer iteration (lines 1 to 7) includes the updating rules for L, S, f, E, and the multipliers, respectively. Updating L is achieved by the iterative HOOI algorithm, which needs O(W^3 + H^3 + T^3) floating-point operations in each inner iteration; in this paper, the HOOI algorithm iterates 20 times, so updating L needs O(W^3 + H^3 + T^3) floating-point operations. At each iteration of updating S, the main computation comes from the four FFTs (three forward FFTs and one inverse FFT), each costing O(HWT log(HWT)) as shown in [33]. Updating f, E, and the multipliers only requires element-wise addition and shrinkage operations, i.e., O(HWT). In conclusion, each outer iteration of the proposed algorithm requires O(W^3 + H^3 + T^3 + HWT log(HWT)) floating-point operations.

IV. NUMERICAL STUDIES
To evaluate the performance of the proposed SS-RTPCA, performance demonstrations on simulated data derived from open-sourced video data are presented in this section. In Section IV-A, the empirical convergence and sensitivity of the proposed algorithm are illustrated. The performance of the proposed algorithm for background subtraction and foreground detection is presented in Sections IV-B and IV-C, respectively, where RTPCA [18], IRTPCA [17], TTNN [19], TVRPCA [20], and GoDec [28] are selected as benchmarks for comparison with the proposed SS-RTPCA; these are state-of-the-art methods in the related area. The benchmarks fall into two categories: (1) RTPCA, IRTPCA, and TTNN are the most advanced Robust Tensor PCA algorithms in the literature; (2) TVRPCA and GoDec are vector-based algorithms that decompose a video into three terms. All results in this section are the averages of 20 repetitions. The code for SS-RTPCA is implemented in Matlab 2019a. The CPU of the computer used to conduct the experiments is an Intel® Xeon® Processor E-2286M (8 cores, 2.40-GHz Turbo, 16 MB).
Performance evaluation indices and parameter tuning: For the task of background subtraction, the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) are used to measure the recovery accuracy. PSNR and SSIM measure the similarity of two images in intensity and structure, respectively. Specifically, PSNR is defined as PSNR = 10 × log_10(255^2 / MSE(I, Î)), where I and Î are the original and recovered background, respectively, and MSE denotes their mean squared error. SSIM measures the structural similarity of two images; see [34] for details. The average PSNR and SSIM over all image frames in the video are used to evaluate the recovery performance of the video background. For the task of foreground detection, the F-measure is applied to assess detection performance, averaged over all image frames in the video. Twenty repetitions are therefore sufficient to represent the performance of each method, since each repetition is already an average over multiple image frames. For the indices PSNR, SSIM, and F-measure, higher values indicate better performance.
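The two indices translate directly to code. The sketch below assumes 8-bit images (peak = 255, matching the PSNR definition above); `f_measure` takes binary foreground masks:

```python
import numpy as np

def psnr(I, I_hat, peak=255.0):
    """Peak signal-to-noise ratio between reference and recovered images,
    assuming 8-bit intensities (peak = 255)."""
    mse = np.mean((np.asarray(I, float) - np.asarray(I_hat, float))**2)
    if mse == 0:
        return np.inf
    return 10.0 * np.log10(peak**2 / mse)

def f_measure(gt_mask, pred_mask, eps=1e-12):
    """F-measure (harmonic mean of precision and recall) of binary masks."""
    tp = np.logical_and(gt_mask, pred_mask).sum()
    prec = tp / (pred_mask.sum() + eps)
    rec = tp / (gt_mask.sum() + eps)
    return 2 * prec * rec / (prec + rec + eps)
```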
For the proposed method as well as the benchmark methods, parameter tuning is performed by searching over 100 sets of parameters sampled by the maximin Latin hypercube design [35], such that the average PSNR and F-measure achieve their best values for background subtraction and foreground detection, respectively. In the proposed Algorithm 1, λ is selected from [0.2, 1] for q = {1} and [1, 3] for q = {2, 1}, respectively.
A. Convergence and Sensitivity Analysis

1) Convergence Analysis: The video dataset Candela from the SBI dataset [36] (https://sbmi2015.na.icar.cnr.it/SBIdataset.html) is used in this section. In total, this video dataset has 855 image frames, where each grayscale image is of size 288×352. For simplicity, the first 80 image frames in the sequence are used for the experiments; the tensor size is therefore 288 × 352 × 80. One image frame from Candela is shown in Fig. 5a. In the video, a man enters and leaves a room, abandoning a bag. For each image, 10% of the pixels are randomly selected and set to random integers in [0, 255], and the positions of the contaminated pixels are unknown (one example is shown in Fig. 5d). To evaluate the convergence of the proposed algorithm, the relative change relChgA = ‖A^k − A^{k−1}‖_F / max(1, ‖A^{k−1}‖_F) and the relative error relErrA = ‖A^k − A^0‖_F / max(1, ‖A^0‖_F) are applied as the assessment indices of algorithm convergence, where A^k is the result in the k-th iteration and A^0 is the ground truth. The ground truth of the static video background is provided in the first column of Fig. 6.
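The two convergence indices defined above translate directly to code:

```python
import numpy as np

def rel_chg(A_k, A_prev):
    """Relative change between consecutive iterates, relChgA."""
    return np.linalg.norm(A_k - A_prev) / max(1.0, np.linalg.norm(A_prev))

def rel_err(A_k, A_true):
    """Relative error against the ground truth, relErrA."""
    return np.linalg.norm(A_k - A_true) / max(1.0, np.linalg.norm(A_true))
```

Here np.linalg.norm flattens a tensor argument, giving the Frobenius norm used in the definitions.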
The curves of the relative change of the video background L and the video foreground S, and the relative error of the video background L, are shown in Fig. 3.

2) Sensitivity Analysis: Since λ is the only tuning parameter in the proposed Algorithm 1, the sensitivity of the algorithm to λ is studied to further explore its effect. The same dataset with 10% noise as in Section IV-A1 is used here. In this experiment, however, λ varies from 0.01 to 3 for both isotropic (q = {2, 1}) and anisotropic (q = {1}) TV regularizations. The relative errors of the video background L for both cases are plotted against λ in Fig. 4.
The relative error of the proposed SS-RTPCA with isotropic (q = {2, 1}) TV regularization keeps decreasing when λ increases. On the other hand, the relative error of proposed SS-RTPCA with anisotropic (q = {1}) TV regularization drops very fast when 0 < λ < 0.2 and remains very stable when 0.2 ≤ λ < 3. Overall, our method with q = {1} outperforms the one with q = {2, 1} since it has a flat area where the performance of our method with q = {1} is fairly good for a wide range of λ. Therefore, our method with anisotropic TV regularization with any λ ∈ [0.2, 1] is recommended for practitioners.

B. Background Subtraction
In this subsection, the proposed Algorithm 1 is applied to background subtraction. The video dataset Candela from Section IV-A, together with Caviar1 and Caviar2 from the SBI dataset [36], is used for the experiments. Figs. 5a, 5b, and 5c show three image frames from the three video datasets. In Caviar1, people walk slowly along a corridor, with mild shadows. In Caviar2, people enter and leave a store, standing still only for a few frames. For both videos, the first 80 image frames are used, so the tensor data size of Caviar1 and Caviar2 is 256 × 384 × 80. The background in each video dataset is static and is provided as the ground truth for comparison; the people walking are treated as the smooth foreground. The noise is simulated in the same way as in [18]: a given ratio of pixels in each image is set to random integers in [0, 255], and the positions of the contaminated pixels are unknown. The corresponding noisy images from the three video datasets are shown in the second row of Fig. 5. Note that the simulated noise in this section is much harder to handle than additive noise such as Gaussian noise, since the information in each corrupted pixel is completely ruined. In this experiment, the cases of 10%, 20%, and 30% noise are studied to show the background subtraction performance under different noise ratios. The quantitative results of all methods with different noise ratios on the simulated Candela, Caviar1, and Caviar2 are summarized in Table I in terms of PSNR, SSIM, and computational time, respectively. The reported performance of the proposed SS-RTPCA is the better of q = {1} and q = {2, 1}; in terms of PSNR, our method with q = {1} consistently achieves better background subtraction performance and computational efficiency. For all scenarios, our method achieves the best PSNR and SSIM.
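The pixel-corruption noise described above can be simulated as follows; `corrupt` is this example's own helper, reproducing the set-up of replacing a given ratio of pixels per frame with random integers in [0, 255] at unknown positions:

```python
import numpy as np

def corrupt(video, ratio, rng=None):
    """Replace `ratio` of the pixels in each frame of an H x W x T video
    with random integers in [0, 255] at random (unknown) positions."""
    rng = np.random.default_rng(rng)
    noisy = video.astype(float).copy()
    H, W, T = video.shape
    n = int(round(ratio * H * W))
    for t in range(T):
        idx = rng.choice(H * W, size=n, replace=False)  # corrupted positions
        frame = noisy[:, :, t].ravel()
        frame[idx] = rng.integers(0, 256, size=n)       # uniform in [0, 255]
        noisy[:, :, t] = frame.reshape(H, W)
    return noisy
```

Unlike additive Gaussian noise, this corruption destroys the original intensity at the selected pixels entirely, which is what makes the simulated setting hard.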
When the noise ratio increases, our proposed SS-RTPCA is the most consistent method among all benchmarks. Specifically, our method performs much better than RTPCA, IRTPCA, and TTNN, which demonstrates the advantage of the low-rank Tucker decomposition for the static background model over the tensor nuclear norm. TVRPCA and GoDec outperform the tensor-nuclear-norm-based algorithms because they also decompose the data into three components. Nevertheless, the proposed method outperforms TVRPCA and GoDec due to the advantage of tensor modeling over vector-based methods. In terms of computational time, our method ranks third among all benchmarks and is very close to the second-fastest algorithm, TTNN. The quantitative results show that the proposed algorithm not only achieves the best accuracy but also runs efficiently.
To illustrate the results visually, the background subtraction results for the case of 10% noise on all three video datasets are presented. The recovered video background from each dataset for the different methods is shown in Fig. 6. In the backgrounds recovered by RTPCA, IRTPCA, and TTNN, people remain visible even though the majority of the noise is removed. In the GoDec result, only a faint shadow of people is left in the background. TVRPCA performs very close to the proposed method on Candela and Caviar2 but poorly on Caviar1. Overall, our method produces the cleanest background for all three video datasets.
Since the quantitative results in Table I report the best performance of each algorithm over 100 sets of tuning parameters, other descriptive statistics (for example, the median and quartiles) for each algorithm are unknown. For the case of Candela with 10% noise, the boxplot in Fig. 7 summarizes the five-number summary (the minimum, the maximum, the sample median, and the first and third quartiles) of each algorithm in terms of PSNR. This result shows that the proposed method is very stable across different values of λ, since the variation among these five numbers is quite small. Even though TVRPCA and GoDec can achieve good performance, they are more sensitive to the tuning parameters than our method, especially TVRPCA. This comparison suggests that the proposed SS-RTPCA is nearly parameter-free, since its performance is almost the same for different λ.

C. Foreground Detection
In this subsection, Algorithm 1 is applied to foreground detection. The video datasets Highway, Office, and Pedestrians from the CDnet dataset [37] are used for the experiments. Only part of each video is used for the sake of computational time. The background in each video dataset is static. Binary masks of the video foreground are provided as the ground truth for comparison. To simulate the noisy effect, salt-and-pepper noise [20] is applied. Specifically, a given ratio of pixels in each image is set to 0 or 255 at random, and the positions of the contaminated pixels are unknown. The corresponding noisy images from the three video datasets are shown in the second row of Fig. 8.
In this experiment, the cases of 10% and 20% noise are considered to show the performance on foreground detection under different noise ratio. The quantitative results of all benchmark methods with different noise ratios on simulated Highway, Office, and Pedestrians are summarized in Table II in terms of Precision, Recall, F-measure, and Computational Time, respectively. The performance of proposed SS-RTPCA is the best from q = {1} and q = {2, 1} in terms of F-measure, our method with q = {1} always has better foreground detection performance and computation efficiency. In terms of precision, our method can achieve the best performance in five out of the total six scenarios. In terms of recall, the proposed SS-RTPCA can achieve the best performance in three out of the total six scenarios. More importantly, our method is the best for all six scenarios in terms of F-measure. When the noise ratio increases, our proposed SS-RTPCA is the most consistent one among all the benchmark methods. TVRPCA and GoDec have better performance than the tensor nuclear norm based algorithms given that TVRPCA and GoDec decompose the data into three components. However, our proposed method shows better performance than TVRPCA and GoDec due to the advantage of tensor modeling over the vector-based method. In terms of computational time, our method runs efficiently even though it is not the fastest one.
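For reference, the three detection metrics reported in Table II can be computed from binary foreground masks as follows. This is the standard definition of precision, recall, and F-measure; the function name is illustrative.

```python
import numpy as np

def detection_scores(pred, truth):
    """Precision, recall, and F-measure for binary foreground masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.count_nonzero(pred & truth)    # true positives
    fp = np.count_nonzero(pred & ~truth)   # false positives
    fn = np.count_nonzero(~pred & truth)   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```

The F-measure is the harmonic mean of precision and recall, so a method must balance both to rank well in Table II.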
To illustrate the results visually, the foreground detection results for the case of 20% noise are presented. One frame from each video dataset for the different methods is shown in Fig. 9. In the foreground masks extracted by RTPCA, IRTPCA, TTNN, and GoDec, a large amount of noise remains, and the moving objects are not well detected. TVRPCA can remove noise from the foreground, but its detected foreground is not as complete as ours. In summary, our method detects the most accurate foreground among all benchmarks even when the video is noisy.
Since the quantitative results in Table II report the best performance of each algorithm over 100 sets of tuning parameters, other descriptive statistics (for example, the median and quartiles) for each algorithm are again unknown. For the case of Highway with 20% noise, the boxplot in Fig. 10 summarizes the five-number summary (the minimum, the maximum, the sample median, and the first and third quartiles) of each algorithm in terms of F-measure. This result shows that our method is very stable across different values of λ, since the variation among these five numbers is very small. Even though TVRPCA can achieve good performance, it is very sensitive to the tuning parameters compared with our method. In this case, the worst performance of our method is still much better than the best performance of all other methods.
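The five-number summary behind the boxplots in Figs. 7 and 10 can be computed as below; the quartile interpolation rule follows numpy's default, which may differ slightly from the plotting software used in the paper.

```python
import numpy as np

def five_number_summary(scores):
    """Minimum, first quartile, median, third quartile, and maximum
    of a set of scores (e.g., PSNR or F-measure over 100 values of
    the tuning parameter lambda)."""
    scores = np.asarray(scores, dtype=float)
    q1, med, q3 = np.percentile(scores, [25, 50, 75])
    return scores.min(), q1, med, q3, scores.max()
```

A narrow spread between the minimum and maximum over the λ grid is what the text calls stability with respect to the tuning parameter.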

V. APPLICATION IN ADDITIVE MANUFACTURING
In this section, the application of the proposed method to melt pool detection is presented. The same benchmark methods as in Section IV are applied here. For metal additive manufacturing, the electron-beam melting [9] printing process with Ti-6Al-4V alloy is used. The experimental conditions for the beam are: power 190 W, spot size 100 µm, and scan speed 0.25 m/s. An X-ray imaging system is applied to monitor the printing process in order to track the melt pool information [9]. More specifically, the melt pool boundary, i.e., the solid/liquid interface, is of main interest. In terms of X-ray imaging conditions, the pixel dimension is 2 µm, the duration of each frame is 16.67 µs, the field of view is 768 µm × 1440 µm, and the frame rate is 60 kHz. One unprocessed frame from the imaging system is shown in Fig. 1a. As discussed earlier, the naked eye can hardly identify the melt pool boundary. To enhance the boundary of the melt pool, the X-ray image captured 6 frames earlier is first subtracted from each unprocessed X-ray image, and the contrast of the result is then adjusted to obtain the processed X-ray image with a widened boundary, shown in Fig. 1b. However, this preprocessing introduces a large amount of noise. Therefore, the proposed algorithm is applied to decompose the video data and remove the noise.
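The preprocessing step above (lagged frame differencing followed by contrast adjustment) can be sketched as follows. The linear min-max contrast stretch used here is an assumption; the paper does not specify the exact contrast adjustment.

```python
import numpy as np

def preprocess_xray(frames, lag=6):
    """Subtract the frame captured `lag` frames earlier from each
    frame, then linearly stretch the contrast to [0, 255].

    `frames` is an (H, W, T) stack of grayscale X-ray images; the
    output has T - lag frames. The stretch is a simple min-max
    normalization, assumed for illustration.
    """
    diff = frames[:, :, lag:].astype(float) - frames[:, :, :-lag].astype(float)
    lo, hi = diff.min(), diff.max()
    if hi > lo:
        diff = (diff - lo) / (hi - lo) * 255.0  # contrast stretch
    return diff
```

The differencing widens the apparent melt pool boundary (regions that changed over 6 frames) but also amplifies sensor noise, which motivates the three-term decomposition.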
In this experiment, there are 100 processed X-ray image frames capturing the melt pool dynamics, where the size of each image frame is 384 × 720. Therefore, the tensor size is 384 × 720 × 100. Since the ground truth of the background and foreground is unknown, visualization results are presented to show the performance of the different methods. The proposed SS-RTPCA is run with λ = 0.55 and q = {1}. In Fig. 11, the background/foreground separation results from all methods are presented. In terms of the background, our method can recover the porosity defects existing in the specimen, which cannot be seen in the noisy image. Moreover, the shape and location of the porosity defects are consistent with those in Fig. 1a. GoDec can also recover the porosity defects; however, its recovered background is noisy and the melt pool can still be seen in the background. The other methods cannot recover the background. In terms of the foreground, our method and TVRPCA can remove noise from the foreground while the other methods cannot; however, our detected foreground shows the entire melt pool better than TVRPCA. The results in this subsection show that our algorithm can not only recover the background with the porosity defects, which are hidden by the noise, but also obtain the most accurate melt pool geometry among all methods.
Fig. 11. Visualization results of different methods on a frame of processed X-ray video.

VI. CONCLUSION

In this article, a new smooth sparse Robust Tensor PCA is developed for background/foreground separation in noisy video data. The proposed SS-RTPCA decomposes the video data into low-rank, smooth, and sparse components, respectively. To obtain the solution efficiently, an ADMM-based algorithm for SS-RTPCA is implemented. The empirical convergence experiment shows that the proposed SS-RTPCA converges and runs efficiently in practice. The background subtraction and foreground detection results on simulated video data and X-ray data demonstrate that our method outperforms RTPCA, IRTPCA, TTNN, TVRPCA, and GoDec, which are state-of-the-art algorithms in the literature. These results also illustrate the effectiveness of the Tucker decomposition for the low-rank tensor, total variation regularization for the smooth tensor, and the noise removal model using the sparse tensor. More importantly, the proposed SS-RTPCA with q = {1} can be considered a "parameter-free" algorithm, since its performance is very stable when the only tuning parameter λ takes any value in [0.2, 1].

In addition, some aspects of SS-RTPCA deserve further investigation. First, the video background is assumed to be static in the proposed model; the extension to the case of a dynamic background should be investigated. Second, in the case studies, our method with q = {1} is always better than with q = {2, 1}; understanding why q = {1} is better is one of the next steps of research. Third, only the empirical convergence of the proposed ADMM-based algorithm is provided in our case studies; its theoretical convergence needs further research.

APPENDIX

This appendix provides the detailed derivations for Algorithm 1. Specifically, the steps to obtain (7), (9), and (12) are shown in detail.
The terms in the augmented Lagrangian function (6) related to G and U_j are as follows, where X̃ = X − S − E − Λ_X/β_X. Since the second term on the right-hand side of (15) is a constant, the sub-problem with respect to G and U_j can be represented as (7).
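Sub-problem (7) seeks a low-Tucker-rank approximation of X̃. One standard way to compute such an approximation is the truncated higher-order SVD (HOSVD), sketched below for a 3-way tensor; note this is a common surrogate, not necessarily the exact solver used in Algorithm 1, and HOOI would further refine these factors by alternating least squares.

```python
import numpy as np

def unfold(t, mode):
    """Mode-n unfolding of a 3-way tensor into a matrix."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def truncated_hosvd(x, ranks):
    """Low-Tucker-rank approximation via truncated HOSVD.

    Returns the reconstruction G x1 U1 x2 U2 x3 U3, the core G,
    and the factor matrices U_j (leading left singular vectors of
    each mode unfolding).
    """
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(x, mode), full_matrices=False)
        factors.append(u[:, :r])
    # Core tensor: project x onto the factor subspaces mode by mode.
    core = x
    for mode, u in enumerate(factors):
        core = np.moveaxis(
            np.tensordot(u.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    # Reconstruction: expand the core back with the factors.
    approx = core
    for mode, u in enumerate(factors):
        approx = np.moveaxis(
            np.tensordot(u, np.moveaxis(approx, mode, 0), axes=1), 0, mode)
    return approx, core, factors
```

For a tensor whose true multilinear rank matches `ranks`, the truncated HOSVD recovers it exactly.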
The terms in the augmented Lagrangian function (6) related to f have the following representation, where the third term on the right-hand side of (16) is a constant. Therefore, the sub-problem with respect to f can be represented as (9).
The terms in the augmented Lagrangian function (6) related to E take the following form, where the third term on the right-hand side of (17) is a constant. Thus, equation (12) can be derived.
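The E sub-problem is an ℓ1-regularized least-squares problem, whose closed-form solution is elementwise soft-thresholding (the proximal operator of the ℓ1 norm). A minimal sketch is given below; the residual tensor it would be applied to and the threshold value (the sparsity weight divided by the penalty parameter β) depend on (17), which is not reproduced here.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise soft-thresholding: the closed-form minimizer of
    tau * ||e||_1 + 0.5 * ||e - x||_F^2 over e."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)
```

Entries of the residual smaller than the threshold in magnitude are set exactly to zero, which is what produces the sparse noise component in the decomposition.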