Dynamic L1-Norm Tucker Tensor Decomposition

Tucker decomposition is a standard method for processing multi-way (tensor) measurements and finds many applications in machine learning and data mining, among other fields. When tensor measurements arrive in a streaming fashion or are too many to decompose jointly, incremental Tucker analysis is preferred. In addition, dynamic adaptation of the bases is desired when the nominal data subspaces change. At the same time, it has been documented that outliers in the data can significantly compromise the performance of existing methods for dynamic Tucker analysis. In this work, we present Dynamic L1-Tucker: an algorithm for dynamic and outlier-resistant Tucker analysis of tensor data. Our experimental studies on both real and synthetic datasets corroborate that the proposed method (i) attains high basis-estimation performance, (ii) identifies/rejects outliers, and (iii) adapts to changes in the nominal subspaces.

I. INTRODUCTION

Tucker analysis is among the most widely used frameworks for processing multi-way (tensor) data. Canonical Polyadic Decomposition (CPD) [12,13], also known as Parallel Factor Analysis (PARAFAC), is another successful tensor analysis scheme with many applications in data mining and machine learning.
Tucker can be viewed as a high-order extension of Principal Component Analysis (PCA) [14]. Similar to PCA, which jointly analyzes a collection of vectors, Tucker analyzes a collection of (N ≥ 1)-way tensors to extract one orthonormal basis for each tensor mode. Instead of applying PCA on vectorized measurements, Tucker treats multi-way measurements in their tensor form, thus leveraging the inherent data structure and allowing for superior inference.
The merits of Tucker analysis have been demonstrated in a wide range of applications. However, it is also well-documented that Tucker is very sensitive to faulty measurements (outliers). Such outliers appear often in modern datasets, due to sensor malfunctions, errors in data storage/transfer, and even deliberate dataset contamination in adversarial environments [17]-[19]. The outlier-sensitivity of Tucker is attributed to its L2-norm (Frobenius) formulation, which places quadratic emphasis on peripheral tensor entries. To remedy the impact of outliers, researchers have proposed robust reformulations of Tucker. For instance, Higher-Order Robust PCA (HoRPCA) [20] models and decomposes the processed tensor as the sum of a low multi-linear rank tensor (nominal data) and a sparse tensor (outliers). Another straightforward robust reformulation is L1-Tucker [21,22], which derives by simple substitution of the L2-norm in the Tucker formulation with the more robust L1-norm (not to be confused with sparsity-inducing L1-norm regularization schemes). Algorithms for the (approximate) solution of L1-Tucker have been proposed in [21]-[27].
In many applications of interest, the tensor measurements arrive in a streaming way. Accordingly, the sought-after Tucker bases have to be computed incrementally. Incremental solvers are also preferred, from a computational standpoint, when there are too many collected measurements to efficiently process them as a batch. For such cases, researchers have proposed an array of algorithms for incremental Tucker decomposition, including Dynamic Tensor Analysis (DTA), Streaming Tensor Analysis (STA), Window-based Tensor Analysis (WTA) [28,29], and Accelerated Online Low-Rank Tensor Learning (ALTO) [30], to name a few. Despite their computational merits, similar to batch Tucker analysis, most existing incremental methods are sensitive to outliers.
In this work, we present Dynamic L1-Tucker: a scalable method for incremental L1-Tucker analysis, with the ability to (i) provide quality estimates of the Tucker bases, (ii) detect and reject outliers, and (iii) adapt to nominal subspace changes.
The rest of this paper is organized as follows. In Section II, we introduce notation and provide an overview of the relevant technical background (tensors, Tucker decomposition, L1-Tucker, and existing methods for dynamic/incremental Tucker). In Section III, we formally state the problem of interest. In Section IV, we present the proposed Dynamic L1-Tucker (D-L1-Tucker) method. Section V holds extensive experimental studies on synthetic and real datasets. Concluding remarks are drawn in Section VI.

II. TECHNICAL BACKGROUND

A. Notation and Tensor Preliminaries
In this manuscript, vectors and matrices are denoted by lower- and upper-case bold letters, respectively; e.g., x ∈ R^{D_1} and X ∈ R^{D_1×D_2}. N-way tensors are denoted by upper-case calligraphic bold letters; e.g., X ∈ R^{D_1×⋯×D_N}. Collections/sets of tensors are denoted by upper-case calligraphic letters; e.g., X = {X, Y}. The squared Frobenius/L2-norm, ‖·‖_F^2, returns the sum of the squared entries of its tensor argument, while the L1-norm, ‖·‖_1, returns the sum of the absolute entries of its tensor argument.
X can be seen as a collection of P_n = ∏_{m ∈ [N]\{n}} D_m length-D_n vectors, known as the mode-n fibers of X. For instance, given a fixed set of indices {i_m}_{m ∈ [N]\{n}}, X(i_1, ..., i_{n−1}, :, i_{n+1}, ..., i_N) is a mode-n fiber of X. The matrix whose columns are all the mode-n fibers of X is called the mode-n unfolding (or flattening) of X and will henceforth be denoted as mat(X, n) ∈ R^{D_n×P_n}; in accordance with the common convention, the order in which the mode-n fibers of X appear in mat(X, n) is as specified in [15]. X ×_n A denotes the mode-n product of tensor X with a matrix A of conformable size and satisfies the identity mat(X ×_n A, n) = A mat(X, n).
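For concreteness, the following minimal NumPy sketch implements these operations; the helper names (unfold, fold, mode_n_product) and the exact fiber ordering are illustrative choices of ours, not necessarily the convention of [15].

```python
# Minimal sketch of mode-n unfolding and the mode-n product with NumPy.
# Any fixed, consistent fiber ordering suffices for the algorithms here.
import numpy as np

def unfold(X, n):
    """Return mat(X, n): D_n x prod(D_m, m != n), fibers as columns."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(M, n, shape):
    """Inverse of unfold for a tensor of the given shape."""
    full = [shape[n]] + [d for m, d in enumerate(shape) if m != n]
    return np.moveaxis(M.reshape(full), 0, n)

def mode_n_product(X, A, n):
    """Compute X x_n A, i.e., apply A to every mode-n fiber of X."""
    out_shape = list(X.shape)
    out_shape[n] = A.shape[0]
    return fold(A @ unfold(X, n), n, out_shape)

# Identity used throughout: mat(X x_n A, n) = A @ mat(X, n)
X = np.random.randn(4, 5, 6)
A = np.random.randn(3, 4)
Y = mode_n_product(X, A, 0)
assert np.allclose(unfold(Y, 0), A @ unfold(X, 0))
```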

B. Tucker Decomposition
Consider coherent tensor measurements X_t ∈ R^{D_1×⋯×D_N}, t = 1, 2, ..., T. Also, define their concatenation tensor X ∈ R^{D_1×⋯×D_N×T}, such that X(:, :, ..., :, t) = X_t. Tucker analysis of the measurement batch {X_t}_{t=1}^T is formulated as

maximize_{Q_n ∈ R^{D_n×d_n}: Q_n^⊤ Q_n = I_{d_n}, ∀n ∈ [N]}  ‖X ×_1 Q_1^⊤ ×_2 Q_2^⊤ ⋯ ×_N Q_N^⊤‖_F^2,   (1)

which seeks N low-rank orthonormal bases that compress the tensor measurements so that the aggregate preserved variance is maximized. Tucker is commonly implemented by means of the HOSVD or HOOI algorithms. HOSVD is a single-shot method that approximates the N bases in (1) disjointly, by N parallel PCAs of the form

maximize_{Q_n ∈ R^{D_n×d_n}: Q_n^⊤ Q_n = I_{d_n}}  ‖Q_n^⊤ mat(X, n)‖_F^2.   (2)
On the other hand, HOOI is an iterative method that optimizes the N bases jointly. In general, initialized at bases {Q_{n,0}}_{n∈[N]}, at iteration i > 0 and for n = 1, 2, ..., N, HOOI returns Q_{n,i} as the solution to the PCA problem

maximize_{Q ∈ R^{D_n×d_n}: Q^⊤ Q = I_{d_n}}  ‖Q^⊤ mat(A_{n,i}, n)‖_F^2,   (3)

where

A_{n,i} = X ×_1 Q_{1,i}^⊤ ⋯ ×_{n−1} Q_{n−1,i}^⊤ ×_{n+1} Q_{n+1,i−1}^⊤ ⋯ ×_N Q_{N,i−1}^⊤.   (4)
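A compact sketch of the HOOI iteration follows; it reuses the hypothetical unfold and mode_n_product helpers from the previous snippet and leaves the measurement mode uncompressed.

```python
# Sketch of HOOI for a batch tensor X whose last mode indexes the T
# measurements. Each basis update is the PCA in (3): the top-d_n left
# singular vectors of the partially compressed unfolding.
import numpy as np

def hooi(X, ranks, n_iter=10):
    N = len(ranks)                     # number of compressed modes
    # HOSVD-style initialization: per-mode truncated SVD of mat(X, n)
    Q = [np.linalg.svd(unfold(X, n))[0][:, :ranks[n]] for n in range(N)]
    for _ in range(n_iter):
        for n in range(N):
            A = X
            for m in range(N):         # compress all modes except n
                if m != n:
                    A = mode_n_product(A, Q[m].T, m)
            U = np.linalg.svd(unfold(A, n))[0]
            Q[n] = U[:, :ranks[n]]
    return Q
```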

C. Outliers and L1-Tucker
Outliers appear often in datasets and can significantly compromise the performance of Tucker methods. Motivated by the success of L1-PCA in vector-data analysis [31], L1-Tucker decomposition has been proposed as an outlier-resistant Tucker reformulation. L1-Tucker derives by substituting the outlier-responsive L2-norm in (1) with the more robust L1-norm, as

maximize_{Q_n ∈ R^{D_n×d_n}: Q_n^⊤ Q_n = I_{d_n}, ∀n ∈ [N]}  ‖X ×_1 Q_1^⊤ ×_2 Q_2^⊤ ⋯ ×_N Q_N^⊤‖_1.   (5)
L1-HOSVD [22,32,33] approximates the solution to L1-Tucker in (5) by N parallel L1-PCA problems. That is, for every n ∈ [N], it finds Q_n by solving (approximately or exactly) the L1-PCA

maximize_{Q_n ∈ R^{D_n×d_n}: Q_n^⊤ Q_n = I_{d_n}}  ‖Q_n^⊤ mat(X, n)‖_1.   (6)
On the other hand, L1-HOOI is an iterative process that provably attains a higher L1-Tucker metric when initialized at the solution of L1-HOSVD [22,34]. Initialized at {Q_{n,0}}_{n∈[N]} (typically by means of L1-HOSVD), at every iteration i ≥ 1, L1-HOOI updates Q_{n,i} by solving

maximize_{Q ∈ R^{D_n×d_n}: Q^⊤ Q = I_{d_n}}  ‖Q^⊤ mat(A_{n,i}, n)‖_1,   (7)

where A_{n,i} is defined in (4).
As seen above, L1-HOSVD and L1-HOOI are implemented through a series of L1-PCAs. L1-PCA admits an exact solution by combinatorial optimization, albeit at high cost [31]. However, there are multiple high-performing approximate L1-PCA solvers in the literature that can be used by L1-Tucker methods. In the algorithmic developments of this work, we consider the L1-norm Bit-Flipping (L1-BF) algorithm of [35]. For the sake of completeness, a brief description of L1-BF follows.
Consider a matrix X ∈ R^{Z×Q}, for Q ≥ Z, and the L1-PCA

maximize_{Q ∈ S^{Z×z}}  ‖X^⊤ Q‖_1,   (8)

where S^{Z×z} denotes the set of Z×z matrices with orthonormal columns. L1-BF is based on the following Theorem, presented in [31].

Theorem 1. Let B_opt = argmax_{B ∈ {±1}^{Q×z}} ‖X B‖_*. Then, Q_opt = Proc(X B_opt) solves (8) and ‖X^⊤ Q_opt‖_1 = ‖X B_opt‖_*.

The nuclear norm ‖·‖_* returns the sum of the singular values of its matrix argument and, for any tall matrix A ∈ R^{Z×z} that admits the SVD A = U Σ_{z×z} V^⊤, Proc(A) = U V^⊤.

In view of Theorem 1, [35] proposed to initialize at an arbitrary B_0 ∈ {±1}^{Q×z} and iteratively conduct optimal single-bit flips (negations). Let e_{q,Q} denote the q-th column of the size-Q identity matrix I_Q. Then, at iteration i, L1-BF finds

(k', l') = argmax_{(k,l) ∈ [Q]×[z]}  ‖X (B_{i−1} − 2 [B_{i−1}]_{k,l} e_{k,Q} e_{l,z}^⊤)‖_*   (9)

and updates

B_i = B_{i−1} − 2 [B_{i−1}]_{k',l'} e_{k',Q} e_{l',z}^⊤.   (10)

That is, among all possible single bit-flips, negation of the (k', l')-th entry of B_{i−1} offers the maximum possible value of ‖X B_i‖_*. Importantly, L1-BF is guaranteed to monotonically increase the metric and converge in a finite number of (in practice, few) iterations.
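The following sketch illustrates the greedy bit-flipping idea; the published L1-BF algorithm [35] includes efficiency refinements (e.g., incremental metric evaluations) that are omitted here.

```python
# Illustrative sketch of bit-flipping for L1-PCA. Per Theorem 1, L1-PCA
# reduces to maximizing ||X B||_* over B in {+-1}^{Q x z}; we greedily
# negate the single entry of B that most increases the metric.
import numpy as np

def proc(A):
    """Proc(A) = U V^T from the thin SVD A = U S V^T."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

def l1_pca_bf(X, z, max_iter=100, rng=np.random.default_rng(0)):
    Z, Qdim = X.shape
    B = rng.choice([-1.0, 1.0], size=(Qdim, z))
    best = np.linalg.norm(X @ B, ord='nuc')
    for _ in range(max_iter):
        cand = None
        for k in range(Qdim):
            for l in range(z):
                B[k, l] *= -1                      # try flipping one bit
                val = np.linalg.norm(X @ B, ord='nuc')
                B[k, l] *= -1                      # undo the flip
                if val > best:
                    best, cand = val, (k, l)
        if cand is None:                           # no improving flip: converged
            break
        B[cand] *= -1
    return proc(X @ B)                             # L1-PCA basis estimate

Q = l1_pca_bf(np.random.randn(4, 20), z=2)
assert np.allclose(Q.T @ Q, np.eye(2))
```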

D. Existing Methods for Incremental and Dynamic Tucker
While HOSVD and HOOI are standard methods for batch processing, incremental alternatives are needed when the tensor measurements arrive in a streaming fashion or are too many to be processed efficiently as a batch.
Dynamic Tensor Analysis (DTA) [28,29] efficiently approximates the HOSVD solution by processing measurements incrementally, with a fixed computational cost per update. Moreover, DTA can track multi-linear subspace changes by weighing past measurements with a forgetting factor. Streaming Tensor Analysis (STA) [28,29] is a fast alternative to DTA, particularly designed for time-critical applications. Window-based Tensor Analysis (WTA) is another DTA variant which, in contrast to DTA and STA, adapts to changes by considering only a sliding window of measurements. The Accelerated Online Low-Rank Tensor Learning (ALTO) method was presented in [30]; for each new measurement, ALTO updates the bases through a tensor regression model. In [36], the authors presented another method for Low-Rank Updates to Tucker (LRUT). When a new measurement arrives, LRUT projects it on the current bases and a few additional, randomly chosen orthogonal directions, forming an augmented core tensor; it then updates the bases by standard Tucker (e.g., HOSVD) on this extended core. In [37], the authors consider very large tensors and propose randomized algorithms for Tucker decomposition based on TENSORSKETCH [38]; it is stated that these algorithms can also be extended to process streaming data. Randomized methods for Tucker decomposition of streaming tensor data were also proposed in [39]; these methods rely on dimension-reduction maps for sketching the Tucker decomposition and are accompanied by probabilistic performance guarantees. More methods for incremental tensor processing were presented in [40]-[43], focusing on specific applications, such as foreground segmentation, visual tracking, and video foreground/background separation.
Methods for incremental CPD/PARAFAC tensor analysis were presented in [44,45]. Robust incremental solvers for PARAFAC were also presented in [46,47]. However, the problem of outlier-resistant dynamic/incremental Tucker analysis remains, to date, largely unexplored.

III. PROBLEM STATEMENT
Focusing on outlier-resistant tensor processing, we wish to estimate the L1-Tucker bases of a tensor-data model, as formulated in (5). We assume, however, that the measurements {X_t}_{t=1}^T are not available in advance but are, instead, collected in a streaming fashion, one at a time.
To set our algorithmic guidelines, we start by considering two simplistic antipodal approaches. On the one hand, an instantaneous approach would L1-Tucker-decompose each new measurement to return new bases, independently of any previously seen data. While this approach is memory-less and computationally simple, its basis estimation performance is bound to be limited, especially at low Signal-to-Noise Ratio (SNR). On the other hand, an increasing-batch approach would append the new measurement to the already collected ones and re-solve the L1-Tucker problem from scratch. As the data collection increases, this method could attain superior basis estimation performance at the expense of increasingly high computational and storage overhead.
Both these extreme approaches exhibit an unfavorable performance/cost trade-off.In contrast, a preferred method would leverage each new measurement, together with previous ones, to efficiently update the existing bases.The development of such a method is the main contribution of this paper, as presented in detail in the following section.

IV. PROPOSED ALGORITHM
The proposed Dynamic L1-Tucker Decomposition (D-L1-Tucker) is a method for incremental estimation of the L1-Tucker bases.D-L1-Tucker is designed to (i) attain high basis estimation performance, (ii) suppress outliers, and (iii) adapt to nominal subspace changes.In this section, we present D-L1-Tucker in detail, addressing basis initialization, basis updates, parameter tuning, and modifications for long-term efficiency.

A. Batch Initialization
Considering the availability of an initial batch of B ≤ T measurements, B = {X_1, ..., X_B}, we run on it L1-HOSVD or L1-HOOI to obtain an initial set of L1-Tucker basis estimates Q_0 = {Q_{n,0}}_{n∈[N]}. Apart from Q_0, we also initialize a memory set M_0 = Ω(B, M), for some maximum memory size M ≥ 0. For any ordered set I and integer Z ≥ 0, we define Ω(I, Z) to return the last min{Z, |I|} elements of I. That is, Ω(B, M) returns the last min{M, B} elements of B.
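As a one-line illustration, the memory operator Ω can be sketched as follows (the helper naming is ours).

```python
# Omega(I, Z): keep only the last min(Z, |I|) elements of an ordered
# collection I. Z = 0 yields an empty memory.
def omega(I, Z):
    return list(I)[-Z:] if Z > 0 else []
```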
If an initialization batch B is not available, the bases in Q 0 are chosen arbitrarily and the initial memory M 0 is empty.In this case, D-L1-Tucker becomes purely streaming.

B. Streaming Updates
When a new measurement X_t, t ≥ 1, is collected, we perform a reliability check to assess how well it is described by the most recently updated set of bases Q_{t−1}. Motivated by [27,48], we define the reliability as

r_t = ‖X_t ×_1 Q_{1,t−1}^⊤ ×_2 Q_{2,t−1}^⊤ ⋯ ×_N Q_{N,t−1}^⊤‖_F / ‖X_t‖_F.   (11)

By definition, the value of r_t lies between 0 and 1. If r_t = 1, then the bases in Q_{t−1} perfectly describe X_t. In contrast, if r_t = 0, then the set Q_{t−1} does not capture any component of X_t. Accordingly, we introduce a user-defined parameter τ and consider X_t reliable for processing if r_t ≥ τ. Otherwise, X_t is considered to be an outlier and is rejected.
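A minimal sketch of the reliability check follows, assuming the mode_n_product helper sketched in Section II-A.

```python
# Reliability r_t of (11): fraction of the measurement's (root) energy
# captured by the current bases. Orthonormal compression can only remove
# energy, so r_t always lies in [0, 1].
import numpy as np

def reliability(X_t, Q):
    G = X_t
    for n, Qn in enumerate(Q):
        G = mode_n_product(G, Qn.T, n)   # compress mode n
    return np.linalg.norm(G) / np.linalg.norm(X_t)
```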
If X_t passes the reliability check, we use it to update the bases and memory as follows. First, we append the new measurement to the most recent memory set M_{t−1}, forming the extended memory

Y = M_{t−1} ∪ {X_t}.   (12)

Then, we update the basis set to Q_t by running L1-HOOI on Y, initialized at the bases in Q_{t−1}.
Finally, we update the memory by discarding the oldest measurement, as M_t = Ω(Y, M). In view of the above, the cost of the L1-HOOI update remains low across updates because, at any given instance, the extended memory Y comprises at most M + 1 measurements.
If X_t fails the reliability check, we discard it and update the bases and memory by setting Q_t = Q_{t−1} and M_t = M_{t−1}, respectively. A schematic representation of the proposed algorithm is offered in Fig. 1.
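Putting the pieces together, one streaming update can be sketched as follows; here, l1_hooi stands for a hypothetical L1-HOOI routine that accepts a warm-start basis set, and reliability is the check sketched above.

```python
# One D-L1-Tucker streaming update (sketch): reject unreliable
# measurements; otherwise refine the bases on the extended memory.
def d_l1_tucker_update(X_t, Q_prev, memory, M, tau):
    if reliability(X_t, Q_prev) < tau:
        return Q_prev, memory, False       # rejected: bases/memory unchanged
    Y = memory + [X_t]                     # extended memory: at most M + 1 items
    Q_new = l1_hooi(Y, Q_prev)             # warm-started L1-HOOI on Y
    return Q_new, Y[-M:], True             # Omega(Y, M): discard the oldest
```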

C. Zero Centering
In specific applications (e.g., in image processing), we are interested in estimating the subspaces of zero-centered data. To this end, we can modify the proposed algorithm so that, at every update instance t − 1, it computes and maintains the mean C_{t−1} of all measurements processed up to that instance. Then, when X_t is collected, it is first zero-centered as X_t^c = X_t − C_{t−1} and then, if it passes the reliability check, X_t^c is used to update the bases, as described above.
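The running mean admits a simple incremental update, sketched below, which avoids storing past measurements.

```python
# Running mean for the zero-centering variant (sketch): update the mean
# with each newly processed measurement, then center the next measurement
# before the reliability check and basis update.
def update_mean(C_prev, X_t, count):
    return C_prev + (X_t - C_prev) / count   # mean after `count` measurements
```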

D. Adaptation to Subspace Changes
In many applications of interest, the underlying data subspaces change across time. In such cases, an ambiguity naturally arises on whether a rejected measurement was actually an outlier or whether the nominal data subspaces have changed and need to be tracked. To resolve this ambiguity and allow D-L1-Tucker to adapt, we work as follows.
First, we make the mild assumption that outlying measurements appear sporadically. Then, we introduce a buffer of ambiguous measurements, W, with capacity W > 0. When a streaming measurement fails the reliability check, we insert it into W. If a measurement passes the reliability check, we empty W. If, at any update instance, |W| reaches W (i.e., W consecutive streaming measurements were rejected as outliers), we detect a nominal subspace change. In order to adapt to this change, we empty the memory, set B = W, and re-initialize (reset) the bases and memory, as described in Section IV-A. Next, the updates proceed as described in Sections IV-B and IV-D. A pseudocode of the proposed D-L1-Tucker algorithm is presented in Fig. 2.
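Combining the above with the streaming update of Section IV-B, the overall procedure can be sketched as follows (mirroring the pseudocode of Fig. 2; d_l1_tucker_update and l1_hooi are the hypothetical routines sketched earlier).

```python
# Change-detection wrapper around the per-measurement update (sketch):
# W consecutive rejections trigger re-initialization on the buffered
# measurements, which are treated as a fresh batch B.
def d_l1_tucker_stream(stream, Q, memory, M, tau, W):
    buffer = []                                  # ambiguity buffer W
    for X_t in stream:
        Q, memory, accepted = d_l1_tucker_update(X_t, Q, memory, M, tau)
        if accepted:
            buffer = []                          # reliable: empty the buffer
        else:
            buffer.append(X_t)
            if len(buffer) == W:                 # subspace change detected:
                Q = l1_hooi(buffer, None)        # re-initialize on B = buffer
                memory, buffer = buffer[-M:], [] # reset memory, empty buffer
    return Q
```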

E. Long-Run Efficiency
As measurements stream in, D-L1-Tucker keeps refining its basis estimates. Naturally, after a sufficiently large number of measurements have been processed, the enhancement rate of the basis estimates can become so low that it does not justify the computational effort expended for the update.
In view of this observation, we can enhance the long-run efficiency of D-L1-Tucker by introducing an exponentially decreasing probability ρ_t that determines whether or not the t-th measurement will be processed. Intuitively, when a large number of reliable measurements have been processed, ρ_t should be low enough to limit the number of updates performed. Specifically, let α_{t−1} denote the number of consecutive measurements that have passed the reliability check up to update instance t − 1. Then, if X_t passes the reliability check, it is processed with probability ρ_t = ρ^{α_{t−1}+1}, for some initial probability ρ > 0 close to 1. If X_t fails the reliability check, it is rejected and α_t is reset to 0.
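A one-function sketch of this throttling rule follows (the value ρ = 0.99 is an illustrative choice of ours).

```python
# Probabilistic throttling for long-run efficiency (sketch): alpha counts
# consecutive reliable measurements, so the processing probability
# rho**(alpha + 1) decays exponentially as the bases stabilize.
import numpy as np

def should_process(alpha, rho=0.99, rng=np.random.default_rng()):
    return rng.random() < rho ** (alpha + 1)
```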

F. Parameter Configuration
The performance of D-L1-Tucker largely depends on three parameters: the initialization batch size B, the memory size M , and the reliability threshold τ .Here, we discuss how to select these parameters.
Batch size B: B determines the quality of the initial set of bases; higher values of B will generally yield a better initial set of bases. Naturally, a very large B would contradict the streaming nature of the method.
Memory size M: M determines how many measurements L1-Tucker processes at each update instance. Similar to B, higher values of M can enable superior estimation performance. At the same time, high values of M increase the storage and computation overhead (cost of the L1-Tucker updates). Thus, a rule of thumb is to set M as high as the storage/computation limitations of the application permit.
Reliability threshold τ: For τ = 0, all measurements are processed (including outliers); for τ = 1, all measurements fail the reliability check and no basis updates take place. Appropriate tuning of τ between 0 and 1 may require some prior knowledge of the SNR of the nominal data. Alternatively, in the sequel we present a data-driven method for setting τ.
We start with the reasonable assumption that the initialization batch B is outlier-free. Then, we conduct on B a leave-one-out cross-validation to tune τ. For every i ∈ [B], we first form B_i = B \ {X_i}. Then, we obtain the basis set Q_i by running L1-HOOI on B_i. Next, we capture in r_i the reliability of X_i evaluated on Q_i (notice that X_i did not participate in the computation of Q_i). Finally, we set τ to the minimum, median, or maximum of the cross-validated reliabilities {r_1, ..., r_B}, depending on the noise-tolerance/outlier-robustness level that we want to enforce.
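This leave-one-out procedure can be sketched as follows, again assuming the hypothetical l1_hooi and reliability routines from above.

```python
# Leave-one-out tuning of tau on an outlier-free batch (sketch). `stat`
# selects the tolerance level: np.min (most permissive), np.median, or
# np.max (most aggressive rejection).
import numpy as np

def tune_tau(batch, stat=np.median):
    r = []
    for i, X_i in enumerate(batch):
        rest = batch[:i] + batch[i + 1:]      # B_i = B \ {X_i}
        Q_i = l1_hooi(rest, None)             # bases computed without X_i
        r.append(reliability(X_i, Q_i))       # held-out reliability r_i
    return stat(r)
```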

V. EXPERIMENTAL STUDIES

A. Testing Parameter Configurations
We first study the performance of the proposed D-L1-Tucker algorithm across varying parameter configurations.
We consider T streaming (N = 3)-way measurements X_1, ..., X_T, generated, for a nominal set of orthonormal bases Q_nom = {Q_n^nom ∈ R^{D_n×d_n}}_{n∈[N]}, as

X_t = G_t ×_1 Q_1^nom ×_2 Q_2^nom ⋯ ×_N Q_N^nom + N_t + O_t.   (13)

The core tensor G_t ∈ R^{d_1×d_2×⋯×d_N} draws entries independently from N(0, σ_s²). N_t models Additive White Gaussian Noise (AWGN) and draws entries from N(0, σ_n²). O_t models sporadic heavy outlier corruption and is non-zero with probability p_o; when non-zero, O_t draws entries from N(0, σ_o²). To measure data quality, we define the SNR as the ratio of the nominal-signal power to the noise power, and the Outlier-to-Noise Ratio (ONR) as the ratio of the outlier power to the noise power (both reported in dB). Our objective is to recover Q_nom by processing the measurements X_t, t ∈ [T], in a streaming way. Denoting by Q̂_n the estimate of Q_n^nom, we quantify performance by means of the Mean Aggregate Normalized Subspace Squared Error (MANSSE),

MANSSE = (1/N) Σ_{n∈[N]} (2 d_n)^{-1} ‖Q̂_n Q̂_n^⊤ − Q_n^nom (Q_n^nom)^⊤‖_F².

First, we set N = 3, D_n = 10 ∀n, d_n = 5 ∀n, B = 5, and T = 30. Moreover, we set σ_s², σ_n², and σ_o² such that SNR = 0 dB and ONR = 14 dB. In Fig. 3, we plot the MANSSE metric versus varying M ∈ {5, 10, 15, 20} for fixed (p_o, τ) ∈ {(0.1, 0), (0.06, 0.4), (0.1, 0.6), (0.06, 0.7)}. We observe that the curves corresponding to τ ≥ 0.6 are almost horizontal, which implies that these values of τ are too strict, rejecting almost all measurements. For τ = 0, all measurements are processed (outliers and nominal ones alike); therefore, the estimation performance improves as M increases, yet the estimation error remains somewhat high because of the processed outliers. The curve corresponding to τ = 0.4 exhibits the best performance across the board.
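For reproducibility, a sketch of the synthetic stream generator and the MANSSE metric follows; the function names and seeding are illustrative, mode_n_product is the helper sketched in Section II-A, and sig_s, sig_n, sig_o denote the standard deviations σ_s, σ_n, σ_o.

```python
# Synthetic stream per model (13): low multi-linear-rank signal + AWGN
# + sporadic heavy outliers (a sketch; parameters are std. deviations).
import numpy as np

def make_stream(T, D, d, sig_s, sig_n, sig_o, p_o, rng=np.random.default_rng(1)):
    Q_nom = [np.linalg.qr(rng.standard_normal((Dn, dn)))[0]
             for Dn, dn in zip(D, d)]               # nominal orthonormal bases
    stream = []
    for _ in range(T):
        X = sig_s * rng.standard_normal(d)          # core tensor G_t
        for n, Qn in enumerate(Q_nom):
            X = mode_n_product(X, Qn, n)            # G_t x_1 Q_1 ... x_N Q_N
        X += sig_n * rng.standard_normal(D)         # AWGN N_t
        if rng.random() < p_o:                      # sporadic outlier O_t
            X += sig_o * rng.standard_normal(D)
        stream.append(X)
    return stream, Q_nom

def mansse(Q_hat, Q_nom):
    # Mean, across modes, of the normalized squared projector error.
    return np.mean([np.linalg.norm(Qh @ Qh.T - Qn @ Qn.T) ** 2 / (2 * Qn.shape[1])
                    for Qh, Qn in zip(Q_hat, Q_nom)])
```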
In Fig. 4, we plot MANSSE versus τ for different values of the outlier probability p_o. We notice that, for any τ ∈ [0.3, 0.5], D-L1-Tucker exhibits high, almost identical MANSSE performance, independently of p_o. This, in turn, suggests that the SNR plays an important role in determining the optimal value of τ for which nominal measurements will be processed and outliers will be rejected with high probability. For the same study, we present the frequency of rejection versus τ in Fig. 5. Again, we notice that for very low values of τ most measurements are accepted for processing, whereas for very high values of τ most measurements are rejected. Interestingly, this figure suggests that, for any given parameter configuration, there is an optimal value of τ for which the frequency of rejection approaches the outlier probability p_o, which, in turn, implies that outliers will generally be rejected and nominal data will be processed.

B. Dynamic Subspace Adaptation
We consider a total of T = T_1 + T_2 streaming measurements, in the form of (13). The first T_1 measurements are generated by nominal bases Q_nom,1; from t = T_1 + 1 and on, the measurements are generated by bases Q_nom,2. The overlap between the subspaces spanned by the bases in Q_nom,1 and Q_nom,2 is set between 30% and 40% for every n ∈ [N]. Moreover, we consider an outlier that is only active at instance t = t_o = 45. We set N = 3, D_n = 10, d_n = 3, T_1 = 70, and T_2 = 30. The SNR and ONR are set to −6 dB and 18 dB, respectively. We process all measurements by the proposed D-L1-Tucker algorithm with B = 2, M = 12, W = 4, and data-driven τ (median of cross-validated batch reliabilities). We also process the streaming measurements with DTA (λ = 0.2, 0.8), LRUT (additional core dimensions k = D − d − 2), and instantaneous HOSVD counterparts (at update instance t, instantaneous HOSVD returns the HOSVD solution of X_t, independently of any previous measurements).
In Fig. 7, we plot the MANSSE versus the update index t. All methods, except for the instantaneous HOSVD, start from a higher MANSSE value and refine their bases by processing streaming measurements until they reach a low plateau. At t = 45, when the outlier appears, we observe that all competing methods suffer a significant performance loss. In contrast, the proposed D-L1-Tucker algorithm discards the outlier and its performance remains unaffected. As subsequent measurements stream in, the competing methods start recovering until they again reach a low plateau, which is largely determined by the SNR and the parameter configuration of each method.
Interestingly, the instantaneous HOSVD recovers rapidly, after just one measurement, because it is memory-less. DTA (λ = 0.2) recovers faster than DTA (λ = 0.8), but its MANSSE plateau is higher. LRUT also recovers and reaches its plateau performance after it has seen about 10 measurements past the outlier. At time instance 71, the nominal data subspaces shift, affecting all methods except for the memory-less/instantaneous HOSVD. D-L1-Tucker attains a high MANSSE value for about W time instances, while its ambiguity buffer is being filled. Right after, it rapidly recovers to a low MANSSE value and keeps refining as more measurements stream in. DTA and LRUT also adapt to the new underlying structure after processing a few measurements. Another interesting observation is that the low plateau level of each method appears to be the same in the two distinct coherence windows.
In Fig. 8, we plot the reliability of the streaming measurements across updates, in accordance with (11). In the same figure, we illustrate the frequency of rejection, that is, the frequency with which measurements fail the reliability check. We notice that the outlier at t = 45 and the W measurements following the subspace change are rejected with probability close to 1. In addition, we observe the instantaneous reliability drop when the outlier appears and when the nominal subspaces change. For this SNR of −6 dB, the reliability level of nominal measurements is about 0.2, and our data-driven τ is accordingly low.
We conclude this study by comparing the run time of each method across updates (computation times are measured in MATLAB R2019a, running on an Intel Core i7-8700 processor at 3.2 GHz with 32 GB RAM). In Fig. 9 and Fig. 10, we plot the instantaneous and cumulative run times, respectively. We observe that the instantaneous HOSVD and DTA exhibit constant run time across updates, independently of outliers or subspace changes. D-L1-Tucker also exhibits approximately constant run time after its memory has been filled. Moreover, we notice an instantaneous drop in run time at index t = 45, because D-L1-Tucker discarded the outlier and did not process it. In contrast, when the outlier appears and when the subspaces change, LRUT exhibits an increase in run time, as it tries to adapt.

C. Dynamic Video Foreground/Background Separation
Video foreground/background separation is a common task in object detection, security surveillance, and traffic monitoring. The omnipresent background in a static-camera scene determines a nominal subspace, while any moving foreground object deviates from this subspace and can be viewed as an outlier.
At every frame index t, the proposed method maintains basis estimates Q_1, Q_2 and the mean frame C_t. Accordingly, we estimate the background as X_t^BG = (X_t − C_t) ×_1 Q_1 Q_1^⊤ ×_2 Q_2 Q_2^⊤ + C_t and the foreground as X_t^FG = X_t − X_t^BG. We compare the performance of the proposed algorithm with that of DTA, LRUT, OSTD, HOOI (increasing batch), and L1-HOOI (increasing batch). For the last two benchmark approaches, at any frame index t we run HOOI/L1-HOOI on the measurements {X_j}_{j∈[t]}, starting from an arbitrary initialization. We notice that DTA is capable of tracking scene changes by means of a forgetting factor λ. Since the background estimation involves mean subtraction, for a fair comparison with the proposed method, we enable mean tracking for DTA by computing its mean with the same forgetting factor. For all other methods, we compute the mean incrementally at any t as C_t = ((t − 1)C_{t−1} + X_t)/t. For DTA, we use two values of the forgetting factor, λ = 0.95 and 0.7, and for LRUT we set the number of additional core dimensions to k_n = D_n − d − 3.
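A minimal sketch of this per-frame background/foreground split (for matrix-valued frames) follows.

```python
# Background/foreground split for one frame (sketch): project the
# centered frame onto the two spatial bases and add back the mean frame.
import numpy as np

def split_frame(X, Q1, Q2, C):
    centered = X - C
    bg = Q1 @ (Q1.T @ centered @ Q2) @ Q2.T + C  # (X - C) x1 Q1Q1^T x2 Q2Q2^T + C
    return bg, X - bg                            # background, foreground
```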
In Figs. 11 and 12, we present the backgrounds and foregrounds obtained by the proposed method and the methods under comparison at the 75th frame (scene 1) and the 150th frame (scene 2), respectively. We observe from Fig. 11 that HOOI (increasing batch), LRUT, and OSTD perform similarly, with a trail of ghostly appearance behind the person in their respective foreground frames. We notice that OSTD and L1-HOOI (increasing batch) perform better, with a smoother trail behind the person in their foreground frames. DTA with λ = 0.7 captures the person in its background, leading to an undesirably smudged foreground estimate. DTA with λ = 0.95 demonstrates a cleaner foreground estimate, similar to that of the adaptive mean (background estimated by the same adaptive mean that we use for DTA); however, their backgrounds contain a ghostly appearance of the person. The proposed method extracts a cleaner background and foreground, owing to its outlier-rejection capability.
We demonstrate the performance after the scene change at t = 100 by presenting the estimated backgrounds and foregrounds at frame index t = 150. From Fig. 12, we observe that HOOI, L1-HOOI, OSTD, and LRUT perform poorly because they are not designed to track changes in the scene. DTA with λ = 0.95 performed better than DTA with λ = 0.7 on frame 75; however, on frame 150, we observe that DTA with λ = 0.95 retains some of the background from scene 1, while DTA with λ = 0.7 obtains a clean background and, hence, a smooth foreground, wherein the person appears slightly blurry. The proposed method is capable of tracking scene changes, and we observe that it obtains a good estimate of the background and a clear foreground.
To quantify the background/foreground estimation performance, we compute, for every frame, the Peak Signal-to-Noise Ratio (PSNR), defined as PSNR = 10 log_10(255²/MSE), where MSE is the mean squared error of the estimated background from the ground-truth (clean) background. In Fig. 13, we plot PSNR versus frame index and observe that all methods begin with high PSNR and, as they process frames with foreground movement, the PSNR drops. We observe that the PSNR of the proposed method is the highest after approximately frame 25. When the scene changes, the PSNR of all methods drops instantaneously, and the PSNR values of HOOI, L1-HOOI, LRUT, and OSTD increase only gradually thereafter.

In Fig. 14, we plot the cumulative run time versus frame index for the compared methods. For clarity of presentation, we exclude OSTD from this figure, as it is significantly slower and plotting its performance would require a change of scale. For instance, the cumulative run time of OSTD at frame indices 50, 100, 150, and 200 is (approximately) 390, 762, 969, and 1173 seconds, respectively. Among the other methods, we observe that the increasing-memory implementations of HOOI and L1-HOOI consume the most time, as expected, because they process an increasing number of frames. DTA is the fastest among the tested methods, followed by D-L1-Tucker and LRUT.

D. Online Tensor Compression and Classification

We consider a dataset of daily Uber-pickup measurements over New York City, spanning 183 days. We reduce the size of the matrix measurements by retaining a 250-by-250 area centered at Manhattan, wherein most of the activity (in terms of Uber pickups) occurs. We consider the resulting tensor X_uber ∈ R^{250×250×183} to be a collection of 183 streaming measurements, one for each day.
Streaming processing: X_uber can be seen as a data stream of matrix measurements, each of which corresponds to a day. Accordingly, 7 successive measurements across the day index correspond to a week which, in turn, is separated into weekdays and Saturdays. We assume that traffic during weekdays differs from traffic on Saturdays and conjecture that weekdays belong to one coherent class/distribution while Saturdays belong to another.
We assume that we are given B = 5 measurements that correspond to weekdays and use those measurements to initialize the bases and the memory, as described in Section IV-A.

Fig. 15. Online tensor compression and classification experiment: average classification accuracy versus update index.