Robust Low Rank and Sparse Decomposition from Blurred Video Frames

Recently, the low-rank and sparse decomposition problem has attracted attention in several applications, especially surveillance videos. Due to the physical limitations in acquisition systems, measured frames are blurred by a low-pass filter. In this article, we aim to decompose blurred videos’ frames into low-rank and sparse components, in order to extract the background. Unlike conventional methods, we simultaneously take into account the blurring effect, as well as the missing data. Our simulation results confirmed the advantage of this approach in extracting low-rank components in surveillance videos.


I. INTRODUCTION
With the growth in population and public transportation, moving object detection and background extraction have become an essential problem in video surveillance systems [1]. Furthermore, the problem of separating the background and the foreground [2]- [5] has several applications such as safety and security, retail, and home automation [6]. One of the wellstudied approaches to deal with this issue is the low-rank and sparse decomposition (LRSD), that are based upon reshaping the video into low-rank and sparse structures. Indeed, by vertically embedding the consecutive frames inside a matrix, we have a matrix composed of low-rank and sparse parts. The background in each frame can be seen as a low-rank structure, while the moving parts form a sparse one. Principle component analysis is the first attempt used for extracting the low-rank background of a video stream from Gaussian noise [7]. In [8], the authors considered the sparse part as noise and they exploited the robust principal component analysis (RPCA) method. This was done by conventional algorithms such as singular value thresholding and inexact augmented Lagrangian method. In another work [9], a stable version of the RPCA method was deployed to deal with noisy data.
In some cases, due to the perturbation of read or write processes during the recording of the frames; frame pixels might be either corrupted or missed. In [5], [10] and [11], algorithms were proposed to solve the LRSD problem with missing data. Also, in [12], [13], fast singular value decomposition (SVD) free methods were proposed. Indeed, the RPCA method in [14] could solve foreground and background extraction via gradient descent (RPCA-GD).
Due to physical limitations in imaging systems such as lens effect, lengthy shutter exposure, high-speed motions, etc. the measured images are blurred [15], [16]. Hence, this problem obliges us to consider the blurring effect in the LRSD problem model.
In this paper, we aim to solve the background extraction problem from blurred and sub-sampled frames (of surveillance videos), where the blurring kernel is assumed to be known. First, we assume each frame to be convolved with a blurring kernel (typically a low-pass filter); Next, the sub-samples of the blurred frames are measured. This model differs from [13] as we consider missing data. Unlike the methods in [5], [10], we take into account the blurring effect at each frame of the video. Furthermore, our simulation results demonstrate that our model can achieve better low-rank background estimation in surveillance videos than other conventional models in the LRSD problem.
Our paper is organized as follows: In Section II, we explain our problem model and propose the algorithm to solve it. Next, in Section III experimental results are presented. Finally, in Section IV, we conclude the paper.

II. PROBLEM MODEL
Let I 1 , I 2 , . . . , I t ∈ R n1×n2 be t frames of the target video. A blurring kernel φ is convolved with each frame I i to model the lens effect. Next, the resulting images are uniformly sampled to reach measured matrices B 1 , B 2 , . . . , B t ∈ R m as follows: where Ω is the index set, uniformly chosen in with size |Ω| = m, and P Ω (Z) : R n1×n2 → R m is the sampling operator which measures the observed elements in Ω. By concatenating each vectorized image frame vec(I i ), we can build the frames matrix F I ∈ R n1n2×t . Therefore, one can rewrite (1) in matrix format as: where Φ is a block Hankel structure corresponding to the 2-D convolution of the blurring kernel φ with each frame. Also, M o represents the matrix of measured frames. In the surveillance videos, for the frames matrix, one can see that F I = L I + S I , where L I , S I ∈ R n1n2×t are lowrank and sparse matrices, respectively. The recovery problem here refers to extracting L I and S I from their mixture F I . Although the main goal is to achieve an estimate of the lowrank component L I , the sparse matrix S I might also contain other information, such as the moving objects in the video surveillance example.
We consider the following convex optimization problem to extract the low-rank and the sparse parts: minimize L,S∈R n×t , where L, S ∈ R n×t are the low-rank and the sparse desired matrices in which n = n 1 n 2 , and Φ ∈ R n×n is a block Hankel structured matrix. . * denotes the nuclear norm which equals to the sum of the singular values and . 1 is the 1 norm. Also, λ > 0 is a weighting factor, regularizing the sparsity and the rank. There are many suggested approaches to tackle the nuclear norm in optimization 3 such as smooth rank approximation [11], [17], [18].

A. Algorithm
Regarding the upper bound of the nuclear norm of an arbitrary matrix A in [19], [20], we are using the following SVD free structure: Therefore, problem (3) can be rewritten in the following setting: We use the alternating direction method of multipliers (ADMM) technique for solving the problem (5). By combining the two constraints, we reach the following Augmented Lagrangian cost function: Now, the update rules of L (k+1) , U (k+1) , V (k+1) , and S (k+1) in k−th iteration can be obtained by sequentially Then, the update rules of Lagrangian multipliers is given by Next, the sub-problems on L, U, V, and S can be solved by taking the gradient concerning each variable and setting the gradients to zero. Hence, where the linear operator J : C n×t → C n×t is defined as: Afterward, we update S by where S λ/µ2 : R n×t → R n×t is the soft-threshold function which is defined as: where z i,j is the (i, j)-th element of matrix Z. Next, for U and V we have: where I n×n and I t×t represent identity matrices with size n×n and t×t, respectively. The initial values of L (0) and S (0) come from the least square problem solution, i.e., where F (0) = L (0) + S (0) . Then, U (0) and V (0) can be computed by the polar decomposition of L (0) . Also, we set Λ (0) 1 = ∅ and Λ (0) 2 = ∅. The convergence analyses of the ADMM algorithm for minimizing the sum of multi nonsmooth convex separable functions have been studied in [21], [22], in which it is shown that if the penalty parameters are chosen to be sufficiently large, the classical ADMM converges to the set of stationary solutions.

III. SIMULATION RESULTS
In this section, we apply our model to the foreground and background of real video data. We compare methods such as Grassmannian Robust Adaptive Subspace Tracking Algorithm (GRASTA) [23], Grassmannian Rank-One Update Subspace Estimation(GROUSE) [24], and Robust PCA via Non-convex Gradient Descent (RPCA-GD) [14] with each other. These algorithms and ours are implemented in MATLAB. We used three types of real surveillance videos in our simulations (i.e. traffic data, shopping center data, and escalator data). In addition, Gaussian filters are used to simulate blurring effect in datasets. The videos in these datasets have different frame sizes, but we used 100 frames of each one. Moreover, the input data is generated by randomly setting 90% of the pixels of each frame as missing data. Figure 1 demonstrates the foreground and background separation for the Highway dataset by applying the mentioned methods. The original image is in the first row, and other rows show the outputs of the other algorithms. Figures 2, 3 show the same results for Escalator, and Mall data set, respectively. The results show that the accuracy of our method is higher than those of the other tested methods.

IV. CONCLUSION
In this paper, we investigated the background and foreground separation problem on blurred surveillance video datasets. Moreover, we proposed a problem considering the blurring kernel effects. Next, we developed an ADMM algorithm to solve the sparse and low-rank matrix decomposition problem. At last, we compared our method with state-of-theart methods. Numerical simulation results that came from realworld datasets demonstrates that our method works better than the other simulated approaches.