A Contemporary and Comprehensive Survey on Streaming Tensor Decomposition

Tensor decomposition has been demonstrated to be successful in a wide range of applications, from neuroscience and wireless communications to social networks. In an online setting, factorizing tensors derived from multidimensional data streams is however nontrivial due to several inherent problems of real-time stream processing. In recent years, many research efforts have been dedicated to developing online techniques for decomposing such tensors, resulting in significant advances in streaming tensor decomposition or tensor tracking. This topic is emerging and enriches the literature on tensor decomposition, particularly from the data stream analystics perspective. Thus, it is imperative to carry out an overview of tensor tracking to help researchers and practitioners understand its developments and achievements, summarise the current trends and advances, and identify challenging problems. In this article, we provide a contemporary and comprehensive survey on different types of tensor tracking techniques. We particularly categorize the state-of-the-art methods into three main groups: streaming CP decompositions, streaming Tucker decompositions, and streaming decompositions under other tensor formats (i.e., tensor-train, t-SVD, and BTD). In each group, we further divide the existing algorithms into sub-categories based on their main optimization framework and model architectures. Finally, we present several applications, research challenges, open problems, and potential directions of tensor tracking in the future.


INTRODUCTION
T ENSOR decomposition (TD) has attracted much attention from the signal/image processing and machine learning communities [1].As a tensor is a multiway array, it provides a natural representation for multidimensional data.Accordingly, TD which factorizes a tensor into a set of basis components (e.g., vectors, matrices, or simpler tensors) has become a popular tool for multivariate and high-dimensional data analysis.In particular, we have witnessed significant advances in TD and a rapid growth in its applications over the last two decades [2].Several types of TD, such as CANDECOMP/PARAFAC (CP) [3], high-order SVD (HOSVD)/Tucker [4], tensor train/network [5], t-SVD [6], and block-term decomposition (BTD) [7], have been developed and successfully applied to various domains, from neuroscience [8], [9] and wireless communications [10], [11] to social networks [12], [13].
The demand for (near) real-time stream processing has been increasing over the years since many modern applications (e.g., Internet-of-Things) generate massive amounts of streaming data over time and analytical insights from such data can bring several benefits, e.g., for real-time decision making [14].As its name implies, (near) real-time stream processing needs to immediately deliver and analyse data streams upon their arrival.Since streaming data grow bigger, faster, and become more complex by the time, there exist several inherent problems which are still challenging issues, such as (i) the unbounded size of streaming data, (ii) time-varying model, concept drift, or dataset shift, and (iii) uncertainty and imperfection, etc.We refer the readers to [14], [15] for good surveys on data stream analysis.
When using tensors to represent data streams, TD is generally referred to as dynamic tensor analysis, tensor tracking or adaptive/online/streaming tensor decomposition.Specifically, • Le Trung Thanh is with the University of Orléans, INSA-CVL, PRISME, France and with the VNU University of Engineering and Technology, Hanoi, Vietnam.Email: trung-thanh.le@univ-orleans.fr.• Karim Abed-Meraim and Adel Hafiane are with the University of Orléans, INSA-CVL, PRISME, France.Karim Abed-Meraim is also a member of IUF (Institut Universitaire de France).Emails: karim.abed-meraim@univ-orleans.fr, adel.hafiane@insa-cvl.fr.• Nguyen Linh Trung (corresponding author) is with the VNU University of Engineering and Technology, Hanoi, Vietnam.Email: lintrung@vnu.edu.vn.
factorizing a streaming tensor is nontrivial due to several computational challenges.First, as tensor streams are continuously generated, their volume grows significantly over time and possibly to infinity.Applying the conventional batch TD methods to such tensors is not possible as they require data to be stored and processed offline.Second, properties of streaming tensors (e.g., the low-rank approximation model) can vary with time in unforeseen ways.Moreover, tensor streams often happen in realtime, so retransmission of a stream is difficult, even impossible.Accordingly, batch tensor estimation and decomposition become less accurate when time passes.Last but not least, some modern applications require high-speed data acquisition systems to rapidly acquire and process massive data streams.In such a case, very fast and (near) real-time processing is highly important.However, batch TDs are often of high complexity, and hence turn out to be inefficient.These characteristics make tensor tracking much different from batch tensor decomposition and lead to several distinguishing requirements for tensor trackers, such as low latency and memory storage, high scalability, adaptation to time variation, and robustness, to name a few.
As the literature of tensor tracking has dramatically expanded in recent years, it is imperative to conduct an extensive overview of the state-of-the-art tensor tracking algorithms to help researchers and practitioners to identify: (i) which topics in tensor tracking are significant and emerging, (ii) what kind of tracking models and related analysis techniques have already been deployed to date and how to apply them in specific tasks, and (iii) main research challenges, open problems, and potential directions of tensor tracking in the future.

State-of-the-art Tensor Surveys
The very first introduction to TD was provided by Rasmus in [16] two decades ago.This reference offered a tutorial on CP decomposition covering features, properties, methods, and applications in chemometrics.Since then, there have been many published survey papers which provided different points of view on tensor computation.We can broadly divide them into three classes, including (i) surveys on models, methods, and tools for factorizing tensors, (ii) surveys on general tensor problems, e.g., tensor operations, uniqueness, ranks, filtering, spectral analysis, and Tucker family) and workhorse algorithms for factorizing tensors under these models.Some applications and software for tensors were also mentioned.The second key survey in the literature is the work of Sidiropoulos et al. in [1] that appeared five years ago.To fill some gaps in the existing surveys on CP and Tucker decompositions of that time, the authors provided an in-depth overview of tensors with respect to the following aspects: uniqueness, ranks, bounds, algorithms, and applica-tions.Moreover, an up-to-date list of software and toolboxes for tensor computation was provided therein.To extend beyond the two standard multiway models, Cichocki et al. conducted a comprehensive tutorial on tensor networks in [22], [37].Particularly, its coverage includes tensor network models, the associated operations and algorithms, and their applications.Besides, it also highlighted connections of tensor networks to dimensionality reduction and large-scale optimization problems.Very recently, Liu et al. provided a general overview of tensors from the engineer's point of view in the book [2].It covers various aspects of tensor computations and decompositions, from operations and well-known multiway representations to tensor-based data analysis techniques and practical applications.
However, to date, we are not aware of any survey paper

Notations Descriptions
x, x, X, X scalar, vector, matrix, and tensor, respectively x i 1 ,...,i N / [X ] i 1 ...i N (i 1 , . . ., i N )-th entry of X x = vec(X) vectorization of X X = diag(x) diagonal matrix X with x on the main diagonal X(i, :), X(:, j) i-th row and j-th column of X X ⊤ , X −1 , X # transpose, inverse, and pseudo-inverse of X X (n) = unfoldn(X ) mode-n unfolding of X Y = bcirc(X ) block circulant tensor Y specified by X U (n)  n-th loading factor/matrix •, ⊙, ⊛ outer, Khatri-Rao, and Hadamard product X ×n U n-mode product of X with U, X ⊞n Y concatenation of X with Y along the n-th mode X × 1  n Y mode-(n, 1) contracted product of X with Y X * Y t-product of X with Y (defined in [6]) X ⊆ Y X is a sub-tensor of Y X ∪ Y union of X and Y X \Y relative difference between X and Y {U (n) } N n=1 r i=1 U (1) (:, i) • U (2) Euclidean norm, ℓp norm, and nuclear norm FFT, iFFT Fast Fourier transform and its inverse specifically reviewing the problem of streaming tensor decomposition or tensor tracking.Therefore, it is of great interest to carry out an overview of this topic to enrich the tensor literature.

Main Contribution
In this paper, we present a contemporary and comprehensive survey on the state-of-the-art online techniques which are capable of factorizing tensors derived from data streams.Our survey begins with basic coverage of five common tensor decompositions and their main features.They are CP/PARAFAC, HOSVD/Tucker, BTD, tensor-train, and t-SVD.Two kinds of streaming models are then introduced to represent streaming tensors, namely single-aspect and multi-aspect.Next, we review four main groups of streaming CP decomposition algorithms: (i) subspace-based, (ii) block-coordinate descent, (iii) Bayesian inference, and (iv) multi-aspect streaming CP decomposition.Under the Tucker format, we categorize currently available single-aspect tensor tracking algorithms into two main classes: online tensor dictionary learning and tensor subspace tracking.Multi-aspect streaming Tucker decomposition algorithms are then overviewed.In addition, we provide a short survey on other online techniques for tracking tensors under tensor-train, t-SVD, and BTD formats.Finally, we discuss a number of important challenges and open problems as well as highlight potential directions for the problem of tensor tracking in the future.To the best of our knowledge, our survey offers for the first time a thorough review of techniques for factorizing tensors in an online fashion.Fig. 1 depicts the organization of the paper.For ease of reference, we summarize in Tab. 2 notations which are frequently used in this paper.

TENSOR DECOMPOSITIONS
In this section, we briefly describe the background of the five popular tensor decompositions which have already been deployed to factorize streaming tensors in an online fashion.They are CP, Tucker, BTD, tensor-train, and t-SVD.

CP Decomposition
Under the CP format [16], a tensor X ∈ R I1×I2×•••×I N can be decomposed into a set of N matrices {U (n) } N n=1 sharing the same number of columns as follows where the so-called tensor factor U (n) is of size I n × r with 1 ≤ n ≤ N .The smallest integer r satisfying (1) is referred to as the CP-rank of X .CP is among the best memory-saving format for representing high-order tensors, and hence, it can overcome the curse of dimensionality which particularly limits the order of tensors to be analysed.Under certain conditions, CP decomposition is essentially unique up to a permutation and scale.However, its main disadvantage is that finding the true rank r is known as an NP-hard problem [27].Even though the CP-rank is given in advance, the best rank-r approximation of a tensor may not exist [42].To compute the CP decomposition, one of the most widely-used approaches is based on the alternating least-squares (ALS) technique [18].

Tucker Decomposition
Under the Tucker format [4], we can factorize X into a core tensor G of a smaller size and N factors decomposition is more flexible than CP in the sense that we can write any X in the form (2) and its computation can be done effectively.The two most popular algorithms for computing (2) are HOSVD and Higher-order Orthogonal Iteration (HOOI) [43].Both HOSVD and HOOI offer a good rank-(r 1 , r 2 , . . ., r N ) tensor approximation for X and they can be efficiently implemented in practice.In general, the Tucker representation is not unique but the subspace covering U (n) is physically unique.Therefore, the main interest in Tucker decomposition is for finding subspaces of the tensor factors, and hence, for approximation, dimensionality reduction, and feature extraction [1].

Block-Term Decomposition
Block-term decomposition (BTD) allows to represent X as a sum of low multilinear-rank tensors [7]: where ∈ R In×rn is the n-th tensor factor, and r n ≤ I n ∀ l, n.The BTD format (3) can be considered as a combination of CP and Tucker.As its name reveals, the basic components in BTD are rank-(r 1 , r 2 , . . ., r N ) blocks while they are rank-1 terms in CP.When these blocks are rank-1 tensors (i.e., r n = 1 ∀n), it boils down to CP.When it has only one block (i.e., L = 1), BTD becomes the standard Tucker decomposition.It is worth noting that the number of blocks L relies on the block's size.Like CP, BTD is essentially unique [7].The common approach to find (3) is also based on the ALS technique [44].

Tensor-train Decomposition
Tensor-train (TT) decomposition expresses X as a multilinear product of third-order tensors {G (n) } N n=1 according to where G (n) ∈ R rn−1×In×rn is the n-th TT-core (aka tensor carriage) with n = 1, 2, . . ., N .Here, r 0 = r N = 1 and the quantities {r n } N −1 n=1 are called TT-ranks [5].This type of TD offers several appealing benefits for representing tensors, especially high-order tensors.Given an arbitrary tensor X , we always find a set of TTcores {G (n) } N n=1 satisfying (4) with suitable TT ranks.Besides, its TT-ranks can be effectively estimated in a stable way in contrast to the CP-rank determination [27].TT also offers a memorysaving representation for tensors and can break the curse of dimensionality like CP.With respect to the implementation, the workhorse algorithm to compute TT is TT-SVD [5].

T-SVD Decomposition
Tensor SVD (t-SVD) is another multiway extension of SVD for decomposing tensors in which X is factorized into three tensors U , G, and V of the same order: where U and V are unitary tensors, and G is a rectangle fdiagonal tensor whose frontal slices are diagonal matrices [6].
To define the low-rank tensor approximation under the t-SVD format, the so-called tubal-rank r t is determined as the number of non-zero tubes of G (e.g., when the tensor X is of order 3, where 1 is an indicator function).The t-SVD algebraic framework is quite different from the classical multilinear algebra in other types of TD.Thanks to the t-product and Fourier transform, several linear, multilinear operators and other transformations are successfully extended from matrices to tensors, such as transpose, orthogonality, and inverse.In particular, t-SVD can be obtained by computing SVDs in Fourier domain and its performance (i.e., exact recovery with high probability) can be guaranteed under mild conditions [6].

TENSOR TRACKING FORMULATION
In this section, the problem of tensor tracking is formulated.Specifically, we first divide streaming tensor tracking models into two classes and then construct some terminologies to support the problem statement.They are single-aspect and multi-aspect streaming models.After that, we formulate the general tensor tracking problem which is suitable for many applications.

Single-aspect Streaming Model
In the classical online setting, we are interested in the decomposition of an N -order streaming tensor X t fixing all but one dimension.Without loss of generality, we suppose the last dimension is temporal, and hence, we can write The following definition of temporal slices is useful to formulate the problem of single-aspect tensor tracking.Definition 1 (Temporal slice).Given a streaming tensor Without loss of generality, we assume that I t N = t meaning that at each time instant one new slice of the tensor is observed.Accordingly, the streaming tensor X t can be viewed as a set of temporal slices {Y τ } t τ =1 .In other words, X t is derived from appending the new comming temporal slice Y t to the previous observations X t−1 along the time dimension, i.e., Generally, Y t has the form where L t is a low-rank component, P t is a binary tensor, N t is a noise tensor, and O t is a sparse tensor.The data model (7) is a general form which is suitable for many scenarios.For example, P t represents missing and observed entries of Y t ; N t is an additive white Gaussian noise; and O t denotes the sparse outliers.Meanwhile, the low-rank L t , which can be formulated by any tensor formats, can be static or time-varying.
Based on these terminologies, the problem of single-aspect tensor tracking can be formally stated as follows: Single-aspect Tensor Tracking: At time t, given a temporal slice Y t and old estimates of X t−1 (e.g., core tensors and tensor factors), we want to track the new estimates of X t = X t−1 ⊞ N Y t in time.

Multi-aspect Streaming Model
In some modern online applications, tensor data may evolve in multiple dimensions/modes over time, and thus, the singleaspect streaming model is not useful for modelling such streaming data.In [45], Fanaee-T et al. for the first time introduced the concept of multi-aspect streaming tensors to represent streaming data having more than one dimension increasing with time.Since then, some online algorithms have been developed to deal with the problem of multi-aspect streaming tensor decomposition.
For convenience, we first introduce the definitions of multiaspect streaming tensors and temporal tubes.Definition 2 (Multi-aspect streaming tensor).A set of Norder tensors {X t } t≥1 is called a multi-aspect streaming tensor sequence denoted as {X } when where W t n ≥ 0, 1 ≤ n ≤ N , and X t−1 is a sub-tensor of X t .If X t belongs to such a sequence {X }, we say that X t is a multi-aspect streaming tensor.

Definition 3 (Temporal tube).
Given two successive tensors X t−1 and X t derived from the same multi-aspect streaming tensor sequence {X }, the coming data at time t can be represented by Y t = X t \X t−1 of the same size as X t with entries for 1 ≤ n ≤ N .We say that the non-zero entries in Y t are temporal tubes.
Now, we can state the problem of multi-aspect tensor tracking as follows: Multi-aspect Tensor Tracking: At time t, given temporal tubes in Y t , and old estimates of X t−1 (e.g., core tensors and tensor factors), we want to track the new estimates of It is worth noting that the single-aspect tensor tracking problem also belongs to the class of multi-aspect tensor tracking where most of the tensor dimensions I n are constant by setting W t n = 0, except the last one I t N .Besides, temporal slices may be regarded as frontal slices of the tensor Y t defined as in (8).

General Formulation of Optimization
We here provide a general formulation of tensor tracking which can be used in many applications.In particular, the optimization problem can be written as Here, {G} and {U } denote the set of core tensors and tensor factors respectively, while O is to represent data corruptions by impulsive noise or outliers.Specifically, the three terms in the second line of ( 9) are used to present regularizations or penalty terms imposed on parameters of interest.The last two penalty terms of ( 9) are for the application orientation.The main loss function ℓ(.) is defined to minimize the residual errors between the estimations and observations.

STREAMING CP DECOMPOSITION
The primary objective of this section is to provide technical descriptions of the-state-of-the-art online techniques for factorizing streaming tensors under the CP format.In the literature, there are many streaming CP algorithms and they can be categorized into the following groups: (i) subspace-based methods, (ii) blockcoordinate descent methods, (iii) Bayesian inference, and (iv) multi-aspect streaming CP decompositions.The three former groups are particularly developed for single-aspect streaming models, while the latter is dedicated to the factorization of tensors having more than one temporally varying mode.The readers are referred to Tabs. 3 and 4 for quick comparisons of the existing streaming CP decomposition algorithms.In what follows, we take each group into consideration.

New Observations
Fig. 2. Single-aspect streaming CP decomposition of a 3rd-order tensor.Here, the core I is an identity tensor in which the main diagonal entries (i.e., [I] j,...,j ) are equal to 1 and the remaining ones are zeros.

Subspace-based Methods
The very first study addressing the problem of streaming CP decomposition is of Nion and Sidiropoulos in [46].Specifically, the authors introduced the two novel adaptive CP algorithms called PARAFAC-SDT and PARAFAC-RLS capable of tracking third-order streaming tensors having one temporal dimension.
Both algorithms are based on the subspace-based approach in which we first track a low-dimensional tensor subspace, and then recover the loading matrices from exploiting its Khatri-Rao structure.Following the same line, some other adaptive CP algorithms were proposed for tensor tracking such as CP-PETRELS [53], 3D-OPAST [47], and SOAP [50].In the following, we describe their subspace-based framework for factorizing streaming tensors with time.
First, we recall that the low-rank L t of Y t has the form , where u . Thus, L t can be recast into the form: where H t ∈ R I1...I N −1 ×r plays a role as a mixing matrix while u (N ) t can be viewed as a coefficient vector in subspace tracking problems.Accordingly, streaming CP decomposition can boil down to a constrained problem of subspace tracking where the basis matrix has a Khatri-Rao structure.Particularly for N = 3, the authors in [46], [47], [50], [53] proposed to solve the following objective function: where H = U (1) ⊙ U (2) , y τ = vec(Y τ ), p τ = vec(P τ ), and u τ is the τ -th row of the temporal factor U t , and β is a forgetting factor aimed at discounting the impact of distant observations.Specifically, (11) can be effectively solved by applying the following procedure: t , and then re-update t can be re-estimated as in Step 1 (optional).In stage 1, the authors in [46] proposed two solvers for estimating H t and u t , namely recursive least-squares (RLS) and simultaneous diagonalization tracking (SDT).Chinh et al. in [53] adopted a well-known subspace tracking algorithm called PETRELS.Dung et al. in [47] applied another subspace tracking algorithm for this task, namely OPAST.In [50], the same authors introduced another tracker to estimate H t with rank-1 updates.
a Apache Spark is a unified data analytics framework that supports distributed storage and large-scale processing: https://spark.apache.org/.b PARAFAC2 is a flexible variant of CP [67].While the classical CP model requires the tensor factors to be the same for all tensor slices, PARAFAC2 only requires their cross product to be the same and these factors can be different in size slice by slice.c Holt-Winters is an effective time series forecasting procedure [68].
singular vector of the reshaped matrix from H t (:, i) can provide a good estimate of U t (:, i) and U (2) t (:, i) ← λ i b i Computation of SVD may be expensive when dealing with largescale streaming tensors, we can use the alternative update based on power iteration as follows As these algorithms are only designed for tracking third-order streaming tensors, there are still rooms to develop subspacebased methods capable of handling N ≥ 4.

Block-Coordinate Descent
The second approach is based on the block-coordinate descent (BCD) framework in which we decompose the main optimiza-tion into two main stages at each time t: (i) estimate the temporal factor and the remaining factors.Many tracking algorithms adopt this approach for estimating the low-rank approximation of streaming tensors over time in the literature.We can list here some: TeCPSGD [48], OLCP [49], OLSTEC [56], CP-stream [54], SPADE [59], SOFIA [61], iCP-AM [70], ACP [64], and RACP [65].
In what follows, we review their strategy in each stage.
In stage 1, the general formulation of the optimization to estimate the last row u where ρ u ∥u∥ 2 2 is for avoiding the ill-posed computation and ρ O ∥O∥ 1 promotes the sparsity in O.Then, the temporal factor U (N ) t is obtained by appending the recent updated u Most of the existing BCD-based tracking algorithms suppose that observations are outlier-free (i.e., without O), and hence, they apply the regularized/randomized leastsquares methods for solving (12).In the presence of sparse outliers, ( 12) can be effectively minimized by ADMM or shrinkagethresholding solvers, as presented in SOFIA [61] and RACP [65].
In stage 2, the non-temporal factors {U n=1 can be derived from solving the following optimization min where ρ U R U (.) is a regularization term on U (n) and Here, we can apply one of the two iterative schemes to update U (n) t : the Jacobi scheme supports the parallel and/or distributed processing while the Gauss-Seidel scheme is useful for a sequential (serial) one.The regularization can be ∥U F for slow time-variation, or U (n) ⪰ 0 for non-negativity constraints.Next, we review two common types of solver for optimizing (13): adaptive least-squares filters and stochastic gradient solvers.
a) Adaptive Least-Squares (LS) Filters.We can see that the first term of ( 13) is of a weighted LS form very common in adaptive filtering while the second one is to regularize the estimators.Accordingly, ( 13) can be effectively minimized by adaptive LS filters in general and recursive least-squares (RLS) filters in particular.
In [56], Kasai proposed an exponential RLS algorithm called OLSTEC to minimize (13) when the observations are outlierfree.OLSTEC is, however, designed for third-order streaming tensors only and its complexities are relatively high compared to other algorithms.Thanh et al. in [64] proposed another RLS algorithm called ACP which is capable of dealing with big streaming tensors of higher order (N ≥ 4).ACP is fast and requires much lower complexity than OLSTEC.Very recently, the same authors in [65] proposed a sliding-window version of ACP robust to both sparse outliers and missing data, namely RACP.Interestingly, three algorithms belong to the class of provable online CP algorithms in which their convergence is guaranteed under certain conditions.In [51], Vandecappelle et al. introduced a nonlinear least-squares (NLS) algorithm for computing the streaming CP decomposition of third-order tensors.In particular, the authors recast the objective function of (13) into a truncated exponential window one by incorporating a weighting matrix L = diag [0, . . ., 0, β L−1 , β L−2 , . . ., β, 1] and applied a NLS solver to track the tensor factors with time.Following the same line, Smith et al. in [54] proposed another online CP algorithm called CP-stream.This algorithm has the potential to factorize higher-order streaming tensors as well as support constraints on the CP tracking such as smoothness and nonnegativity.
b) Stochastic Gradient Solvers.Instead of optimizing ( 13) directly, we can minimize its t-th summand: min Three algorithms TeCPSGD [48], OLCP [49], and SOFIA [61] adopt this replacement for tracking tensor factors with time.The main difference among them is the type of R U (.).Besides, they obtain different forms of update: [SOFIA] : [TeCPSGD] : [OLCP] : . It is worth noting that SOFIA is capable of dealing with sparse corruptions.TeCPSGD has the ability to track tensors from missing observations, while OLCP can handle streaming tensors of order greater than 3.
In [70], Zeng et al. proposed an incremental ALS algorithm called iCP-AM to minimize a reinforced version of ( 14) which is defined as where τ .An appealing feature of iCP-AM against other online CP algorithms is that it has a strategy to deal with the variation of the CP rank over time, i.e., to change the number of low-rank components throughout the tracking process.
In parallel, Gujral et al. in [59] proposed an online algorithm called SPADE for tracking tensors under the PARAFAC2 format.Specifically, SPADE tracks a fixed (non-temporal) factor along one mode and allows the other tensor factors (modes) to vary with time.Thanks to its stochastic design, SPADE is fast and memory-efficient.However, the stationary assumption that time variation or concept drift is not allowed limits its applicability.

Bayesian Inference
Besides, another good approach for dealing with the problem of streaming CP decomposition is Bayesian inference.The state-ofthe-art Bayesian-based streaming CP decomposition algorithms are POST [55], BRST [52], and SBDT [71].In general, three algorithms start with a prior distribution of unknown parameters and then infer a posterior that best approximates the joint distribution of these parameters on the arrival of new streaming data.The estimated posterior is then used as the prior for the next update.In this subsection, we briefly describe the two online Bayesian inference frameworks which were already used for tensor tracking: (i) streaming variational Bayes (SVB) and (ii) assumed-density filtering (ADF).Also, prior distributions of parameters of interest are reviewed.a) Streaming Variational Bayes.The two former algorithms POST and BRST adopted the SVB framework [72] which is based on the following Bayes' rule: where Θ denotes the parameters of interest, e.g., tensor factors, CP rank, and other parameters.On the arrival of Y t , SVB first uses the current posterior q t−1 (Θ) := p Θ|X t−1 as the prior of Θ, and then integrates with the likelihood of which can be served as an approximation of the joint distribution p(Θ, Y t ) up to a scale factor.The variational posterior q t (Θ) is derived from maximizing the variational model evidence lower bound (ELBO) L(q(Θ)) = E q log pt (Θ)/q(Θ) which is equivalent to minimizing the Kullback-Leibler (KL) divergence: The optimized form of q t (Θ i ) of ( 21) can be given by where E q(Θ/Θi) [.] is an expectation w.r.t.q over all but Θ i .b) Assumed-Density Filtering.The latter algorithm, SBDT, applied the ADF framework to infer the posterior distribution q t (Θ) over time.Particularly, ADF is an incremental learning framework that allows for computing the approximate posteriors in Bayesian inference for stochastic processes [73].The ADF framework is also grounded on the Bayes' rule (19) but utilizes a distribution from the exponential family (e.g., Gaussian distribution) to approximate the current posterior.Instead of minimizing the KL divergence or maximizing the variational ELBO like SVB, ADF projects pt (Θ) into the selected distribution through moment matching to obtain q t (Θ).c) Prior Distributions over Θ.We list common prior distributions over Θ which were used by POST, BRST, and SBDT.
Prior distribution of tensor factors: All three algorithms assume that the prior over tensor factors is derived from the following Gaussian distribution which is controlled by the hyperparameter λ = [λ 1 , λ 2 , . . ., λ r ]: where u (n) i is the i-th row of U (n) and Λ = diag(λ) denotes the inverse covariance matrix.Here, λ is supposed to follow a Gamma distribution: where Gam λ j |c j , d j = Specifically, the mean and variance of Gam(λ j |c j , d j ) are, respectively, c j /d j and c j /d 2 j which aim to control the magnitude of λ.
Prior distribution of noises: The noise is often assumed to be Gaussian, i.e., N t ∼ i1...i N N (0, τ −1 ) with a noise precision τ > 0. The parameter τ is further assigned to another Gamma distribution p(τ |a, b) = Gam τ |a, b in the same way as for λ.
Prior distribution of sparse components: Only BRST in [52] has the ability to handle sparse outliers.Here, BRST places a Gaussian prior distribution over the sparse O t as where γ is the sparsity precision parameter.If the value of γ i1...i N is large, the corresponding entry in O t is likely to have a small magnitude.By controlling the value of γ i1...i N , we can control the sparsity of O t .
Prior distribution of NN's weights: SBDT in [71] incorporates neural networks (NN) into tensor factorization.SBDT assigns a spike-and-slab prior distribution over NN weights to sparsify the network.Each weight ) where δ(.) denotes the delta function and the binary selection
We refer the readers to Tab. 4 for their key features.In what follows, we first describe the main dynamic tensor decomposition (DTD) framework shared by most of these algorithms and then highlight their characteristics in the following text.
For ease of reference, we denote by the two successive snapshots at t − 1 and t, see Fig. 5 for an illustration (when G t = I).At time t, given X t and the old estimates {U The DTD introduced in [74] offers an online framework for the problem of multi-aspect streaming CP decomposition.Particularly, DTD relaxes the CP representation of X t in the sense that if X t is expressed by {U ∈ R (In+d)×r .Accordingly, DTD enables us to divide X t into two parts X t−1 and Y t = X t \X t−1 in order to take advantages of old estimates.We can first update Ū(n) t−1 with a low cost and then estimate the remaining part t .The tensor factors are particularly derived from min where the loss function ℓ(.) is defined as Here, Ω t denotes the set of observed entries and µ, ρ > 0 are two regularized parameters.Depending on the type of constraints, additional information imposed and the method of optimization, we can obtain several types of estimators for tracking multiaspect streaming tensors with time under the DTD framework.
In [74], Song et al. developed the so-called MAST algorithm for tracking multi-aspect streaming tensors.The authors recast (27) into a constrained minimization and then formed the following Lagrangian function where n=1 and Lagrange multiplier matrices {Λ (n) } N n=1 , and

New Observations
Fig. 3. Online tensor dictionary learning.
η > 0 is a regularization parameter.Since terms of ( 29) are all convex, it can be effectively minimized by several methods.In particular, MAST applies an ADMM solver to minimize (29) in order to balance the trade-off between effectiveness and efficiency in tracking process.Since MAST is not designed for handling sparse outliers, Najafi et al. in [75] introduced a robust version of it called OR-MSTC.In the presence of sparse outliers, they proposed to regularize the objective function of ( 27) by adding an ℓ 1 -norm regularization term λ∥O∥ 1 and replacing Y t with Y t − O in the first term of ℓ(.) in (29).Because the term λ∥O∥ 1 is convex, OR-MSTC also adopts the well-known ADMM method in a similar way to MAST.
In [76], Yang et al. proposed a distributed version of MAST called InParTen2.Thanks to Apache Spark 1 , it can handle largescale streaming tensors efficiently with a limited memory.However, the use of InParTen2 is limited for 3-order streaming tensors only.In [77], Yang et al. introduced another distributed method called DisMASTD capable of dealing with tensors of higher order.One of appealing feature of DisMASTD is that it can avoid repetitive computation and reduce network communication cost.

STREAMING TUCKER DECOMPOSITION
We can broadly categorize the streaming Tucker decomposition algorithms into three main classes: (i) online tensor dictionary learning, (ii) tensor subspace tracking, and (iii) multiaspect streaming Tucker decomposition.Specifically, the first two classes are designed for two specific cases of single-aspect streaming Tucker decompositions, while the latter class is for multi-aspect streaming tensors.

Online Tensor Dictionary Learning
In the class of online tensor dictionary learning methods, we are particularly interested in a specific case of single-aspect streaming Tucker decomposition where the underlying tensor where the core G T is of size In×rn are of fixed size, and the last factor U (N ) is an identify matrix.Specifically, the t-th temporal slice Y t of X T is expressed as where The primary objective here is to estimate G t and incrementally update {U (n) } N −1 n=1 on the arrival of Y t at each time t.In what follows, we review two main approaches to handle this problem.
1. Apache Spark: https://spark.apache.org/a) Incremental Subspace Learning on Tensor Unfolding Matrices.A natural and very first approach for streaming Tucker decomposition is to incrementally update the subspaces covering unfolding matrices of the underlying tensor.The central idea of this approach stems from the fact that the n-th tensor factor U (n) t which is derived from the standard HOSVD is given by where X (n) is the mode-n unfolding matrix of Y τ .Accordingly at time t, we can apply the following dynamic tensor analysis (DTA) framework introduced in [78], [79] to estimate G t and update {U where 0 < β ≤ 1 is a forgetting factor and eig(C t .Since the two steps (33a) and (33b) are generally expensive, there have been some studies offering good modifications or fast alternatives for (33).
In [78]  t .Intuitively, the larger the residual error is, the more U (n) t is updated.The computational complexity of STA is moderate while its effectiveness was demonstrated with the problem of anomaly detection and multiway latent semantic indexing.
In [80], [81], Hu et al. introduced the so-called IRTSA algorithm to track the dominant subspaces {U . Specifically, instead of computing (33a), IRTSA applies a fast incremental SVD (ISVD) proposed by Ross et al. in [100] on the mode-n unfolding matrix (32).Thanks to ISVD, IRTSA shares the same order of computational complexity with STA while offers a better estimation than STA for the problem of background modelling and object tracking.Although the current version of IRTSA is designed for factorizing three-order streaming tensors, it is not difficult to extend IRTSA for dealing with higher-order tensors.Besides, a modified version of IRTSA was introduced by Zang et al. in [82] for the problem of web service recommendation.
In [83], Kuang et al. also proposed an incremental SVD-based streaming Tucker decomposition, namely IHOSVD.In particular, this algorithm performs the following three processes in a serial manner: (i) applies a recursive SVD method to compute singular values and singular vectors of unfolding matrices of the new tensor, (ii) merges the new results with the old estimations from past observations, and (iii) obtains the core tensor with nmode products.Theoretical analyses and experimental results on transportation applications demonstrate the use of IHOSVD.
a Pseudo inputs: a small active pseudo set, which is not necessarily required to be a subset of the real data, is introduced to break the dependencies between outputs and hence avoid the explicit computation of the full covariance matrix.
background and detect anomalies in applications of computer vision.Since RTSL still applies directly the DTA framework, its complexity is relatively high.Thus, it may become inefficient for handling large-scale and high dimensional streaming data.Some other algorithms for streaming Tucker decomposition belonging to this group were presented in [87]- [89], [102], focusing on specific applications such as dynamic brain analysis, smart city services, cyber-physical-social networks and systems.
b) Online Multimodal Dictionary Learning.Another good strategy for the problem of single-aspect tensor tracking is to ap-ply online multimodal dictionary learning (OMDL) techniques.As OMDL is a stochastic version of the multimodal dictionary (multilinear subspace) learning [103], it allows estimating dictionaries (i.e., tensor factors) with one-pass processing.In the literature, there exist some algorithms applying OMDL for tracking the low multilinear-rank component of streaming tensors with time, such as OTDL [92], ODL [104], ORLTM [105], OLRTR [106], D-L1-Tucker [96], and ROLTD [107].
The two former algorithms OTDL and ODL adopt the typical two-step learning procedure to track the tensor factors over time, namely (i) tensor coding or inference of coefficients in the core tensor and (ii) dictionary update per each tensor mode.
Step 1: Tensor Coding.When Y t is observed, the general formulation of optimization for this step is given by: where ρ G R G (.) is a regularization term on the core tensor G to promote sparsity or nonnegativity for instance.Since the first term of ( 35) is differentiable while the second term may admit a proximal operator (e.g., ℓ p -norm), OTDL, ODL, and ROLTD applied proximal methods to minimize it.
Step 2: Dictionary Update.When G t is estimated, the BCD framework can be used to update U (n) t .Specifically, both algorithms optimize the following minimization: (36) with a penalty term ρ U R U (.) on U (n) .Interestingly, (36) can be recast into the standard least-squares cost function which is very common in adaptive filtering theory.Accordingly, OTDL introduced an effective recursive least-squares (RLS) solver to optimize it.Meanwhile, ODL used the stochastic gradient descent method to estimate U (n) t with a low cost.The next two algorithms ORLTM and OLRTR, on the other hand, estimated the tensor factors without the need of tensor coding.In particular, the tensor factor U (n) is directly derived from the following optimization min where the loss function ℓ(.) is defined as Here, R (n) and O (n) play the role of the coefficient and the error, respectively.The main difference between ORLTM and OLRTR is the type of R R (.) used.Specifically, OLRTR uses the simple Frobenius norm regularization R R (R Intuitively, the minimization (37) may be regarded as a robust version of (36) which aims to deal with sparse corruptions.Also, the minimization (38) is not difficult to solve since its terms are all convex.Hence, both OLRTR and ORLTM applied the RLS method to update U (n) t over time.
In [96], Chachlakis et al. proposed a streaming Tucker decomposition called D-L1-Tucker for dealing with streaming tensors.D-L1-Tucker shares the same objective function with ORLTM and OLRTR, but adopts a different approach to handle data corruptions.Particularly on the arrival of Y t , D-L1-Tucker first identifies whether Y t is an anomaly or not based on its reliability which is defined as If r t ≤ τ where τ ∈ [0, 1] is a predefined threshold, Y t is labelled as an outlier slice and then it is disregarded.Otherwise, Y t is considered as reliable and useful for tracking process.In such a case, D-L1-Tucker appends Y t to the memory set Z t = Z t−1 ∪Y t and then applies the batch L1-HOOI algorithm proposed in [108] for factorizing Z t in order to obtain tensor factors.After that, Z t is re-updated by removing the oldest measurement for the next processing.D-L1-Tucker requires a good batch initialization New Observations (fixed size) and its tracking ability is dependent on the threshold τ and the memory size M to store Z t .

Tensor Subspace Tracking
Apart from the model ( 30), X T ∈ R I1×•••×I N −1 ×T and its t-th temporal slice Y t with 1 ≤ t ≤ T can be modelled as follows where the core tensor G ∈ R r1×r2×•••×r N and {U (n) } N −1 n=1 with U (n) ∈ R In×rn are of fixed size except the last factor U (N ) ∈ R T ×r N , and u (N ) t ∈ R 1×r N is the t-th row of U (N ) , see Fig. 4 for an illustration (when N = 3).At time t, given old estimations G t−1 and {U , and n=1 which can compactly represent the temporal slice Y t .We refer this problem to as tensor subspace tracking. 2 It is worth mentioning that single-aspect streaming CP methods also belong to this class as the core tensor G is constrained to be identity.In the literature, there exist some tensor subspace tracking methods which have the potential to deal with a general case of G.Each method adopts a different strategy to factorize streaming tensors.In what follows, we briefly describe their main features in chronological order.a) Augmented Projection.In [85], Baskaran et al. introduced the so-called LRUT algorithm (which stands for Low-Rank Updates to Tucker decomposition) using a randomized projection technique for tracking the low multilinear-rank approximation of streaming tensors over time.When a data stream arrives, LRUT first projects it onto an extended tensor subspace and then forms an augmented core tensor.Specifically, LRUT adds a few more random dimensions to the current tensor subspace defined by old estimations of the tensor factors.The inclusion of some random vectors here plays a role of noise perturbation aimed to prevent the main optimization from getting stuck in local optima.Next, LRUT performs the standard Tucker decomposition (e.g., batch HOSVD or HOOI) on the resulting augmented core tensor to update tensor factors.In this way, we can avoid the computation of SVD on unfolding matrices of the full tensor which is highly expensive in an online setting.However, its computational complexity is still relatively high since LRUT uses several orthogonalization operations on augmented tensor factors and unfolding matrices of the projected tensor slice.
b) Riemannian Optimization.In [86], Kasai et al. developed a Riemannian manifold preconditioning approach for tensor completion.Specifically, its stochastic version can be adapted for 2. This name stems from the following observation: we can recast (40) into the standard vector-matrix form yt = Dut, where yt = vec(Yt), ut = (u ) ⊤ and D is the transpose of the mode-N unfolding matrix of G; {U (n) } N −1 n=1 .Intuitively, it may be regarded as the data model which is very common and widely used in the problem of subspace tracking where we wish to incrementally update D on the arrival of yt at each time t.Since the subspace matrix D has a tensor structure, we label this problem as "tensor subspace tracking" without hesitation.
factorizing incomplete streaming tensors in an online fashion.Since the Tucker format provides an effective representation for tensors in the manifold M r = X ∈ R I1×I2×•••×I N | rank(X ) := r = [r 1 , r 2 , . . ., r N ] , Riemannian optimization can offer a good approach for tensor decomposition and completion [109].Accordingly, the authors proposed an efficient Riemannian gradient based method to estimate the low multilinear-rank component of tensors.The proposed method consists of a rank-one Riemannian gradient computation and a retraction step.Specifically, a novel Riemannian metric on the tangent space of M r and its quotient manifold was introduced to enable the Riemannian optimization framework.Furthermore, a map that combines all retractions on the individual manifolds of tensor factors was used to transform the estimations to the tensor manifold.c) Bayesian Inference.In [97], Fang et al. introduced a Bayesian streaming Tucker decomposition method called BASS-Tucker for handling streaming sparse tensors.Similar to Bayesian methods for streaming CP decomposition, BASS-Tucker adopts the streaming variational Bayes (SVB) framework to infer the posterior of parameters of interest (e.g., tensor core, tensor factors, and nuisance parameters) over time.In addition, BASS-Tucker also utilizes the same priors for the tensor factors and noise variance except that of the core tensor.Here, the following spike-and-slab prior is used to model the core tensor: where S ∈ R r1×r2×•••×r N is a binary tensor, Bern(.|ρ0 ) is the Bernoulli distribution with probability ρ 0 , and δ(.) is the Delta function.We refer the readers to subsection 4.3 for details on prior distributions of {U (n) } N −1 n=1 and other model parameters as well as how the SVB framework works.d) Block-Coordinate Descent.There are three online Tucker algorithms using the BCD framework, including ATD [64], RT-NTD [99] and BK-NTD [99].In general, they go through the following stages when Y t arrives: Stage 1: Estimate the vector u given old estimations G t−1 and {U Stage 2: Estimate the tensor factor The main optimization can be given by min where are respectively the mode-n unfolding matrices of Y τ and W τ .Here, , and where . Here, R U (.), R U (.), and R G (.) are regularization terms on the coefficient u respectively.These penalties can be nonnegativity, smoothness, or sparsity depending on the specific application.
The former ATD algorithm was proposed by Thanh et al. in [64] which is capable of tracking the low multilinear-rank approximation of streaming tensors from highly incomplete observations.In stage 1, ATD particularly recasts (43) into a standard LS optimization and then applies a randomized LS technique to minimize it.In stage 2, ATD introduces a recursive LS solver to optimize (44) in an efficient way.Instead of solving (45) directly, ATD applies the stochastic gradient descent to obtain its solution.
The two latter RI-NTD and BK-NTD algorithms were proposed by Zdunek et al. in [99] for factorizing nonnegative tensors from streaming data.Both algorithms perform nonnegative leastsquare (NNLS) solvers to incrementally update the tensor factors and the core tensor.Particularly, RI-NTD utilizes a recursive strategy involving the nonnegatively constrained Gauss-Seidel method while BK-NTD adopts the block Kaczmarz method.Similar to ATD, both RI-NTD and BK-NTD estimate the core tensor using only the new coming data via a stochastic optimization.

Multi-aspect Streaming Tucker Decomposition
Besides single-aspect streaming Tucker decomposition methods, few online techniques are capable of tracking multi-aspect streaming tensors under the Tucker format over time, such as SITTA in [90] and eOTD in [91].
SIITA in [90] offers an online inductive framework for tracking the low-rank tensor approximation of multi-aspect streaming tensors as well as completing their missing data with side information.On the arrival of new data, SIITA particularly minimizes the following optimization where {S (n) t } N n=1 with S t ∈ R Mn×In is the set of side information matrices and ρ G , {ρ i } N i=1 are regularization parameters.Here, SIITA incorporates the side information into the data model by using {S (n) t } N n=1 as multiplicative terms.Accordingly, SIITA can accelerate the tracking process because the product S (n) t U (n)  transforms the dimensionality of variables from I n to M n , and typically with M n ≪ I n .As every term of ( 46) are convex, SITTA adopts the gradient descent to minimize it.Besides, a simple variant of SIITA namely NN-SITTA was also obtained for nonnegative tensor decomposition.NN-SITTA is specifically derived  Fig. 6.Single-aspect streaming tensor-train decomposition.
from projecting the estimates of SIITA into their nonnegative orthant at each time t.
In [91], Xiao et al. proposed the so-called eOTD algorithm for the multi-aspect tensor tracking problem.Unlike SIITA, eOTD adopts the divide and conquer paradigm to deal with multi-aspect streaming tensors.In particular, it divides the underlying tensor X t into 2 N sub-tensors X (i1,...,i N ) t with i n ∈ {0, 1}, 1 ≤ n ≤ N , and X (0,...,0) t = X t−1 , see Fig. 5 for an illustration.These sub-tensors are grouped into N classes {D n } N n=1 based on the sum of sub-indices.For example, for a third-order tensor, we have , and The factor U where the modified Gram-Schmidt process was applied to compute the orth(.)operation.Finally, the tensor core G t of fixed size is estimated by An appealing feature of eOTD is that throughout the tracking process, eOTD only uses cheap tensor-matrix multiplications and pseudo-inverse operations instead of computing the expensive SVDs on big matrices.This makes eOTD easy for applying to large-scale applications.

OTHER STREAMING TENSOR DECOMPOSITIONS
Apart from the two most popular streaming CP and Tucker decompositions, some methods are capable of tracking tensors under other multiway models.This section focuses on tracking algorithms that exploit TT, BTD, and t-SVD formats to construct the low-rank tensor approximation in the streaming model.

Streaming Tensor-Train Decomposition
Despite success in the batch setting, TT decomposition has not gained in popularity as CP and Tucker for tensor tracking.In the literature, there exist few tracking algorithms developed for the problem of single-aspect tensor tracking under TT see Fig. 6 for an illustration.
In [110]- [112], Thanh et al. proposed three adaptive TT algorithms called TT-FOA, ATT, and ROBOT for factorizing tensors in an online fashion.Particularly, TT-FOA in [110] is, to the best of our knowledge, the very first of its kind in the literature.However, its practical use is limited due to the lack of robustness to data corruption.To overcome the drawback, ATT in [111] and ROBOT in [112] were developed to deal with missing data and sparse outliers, respectively.
All three algorithms share the same optimization framework where BCD and RLS methods are utilized to minimize the cost function.In particular, a general formulation of the optimization problems can be written as min where β ∈ (0, 1] is a forgetting factor to reduce the impact of old observations; n=1 are two regularization terms.Specifically, TT-FOA does not impose the two penalties; ATT adopts to control the smoothness of TT-cores over time; and ROBOT applies the Thanks to the BCD framework, ( 49) can be effectively decomposed into two main stages: (i) estimate the temporal TTcore G (N ) t and outlier O t , and (ii) update non-temporal TT-cores In stage 1, TT-FOA and ATT apply the regularized least-squares method to estimate G (N ) t under the assumption that Y t is outlier-free.Meanwhile ROBOT adopts an effective ADMM solver to account for the sparse outlier O t .In stage 2, an effective RLS solver was introduced to estimate {G and O t (if any) are given in stage 1.In parallel, Liu et al. in [113] proposed an incremental TT method called iTTD to factorize tensors having one temporal mode.Specifically, iTTD considers coming data streams as individual tensors and factorizes them into TT-cores.The results are appended to old estimates derived from past observations.In [114], Wang et al. also developed an incremental TT method called AITT to decompose tensors from industrial IoT data streams.By exploiting a relationship between the directly reshaped matrix and integration of tensor unfolding matrices, AITT can estimate effectively the underlying TT-cores.However, the two frameworks of iTTD and AITT are not really online streaming learning ones but incremental batch learning.Therefore, they are not useful for data streams from dynamical observations in time-varying environments.

Streaming Block-Term Decomposition
The block-term decomposition (BTD) unifies the two well-known CP and Tucker decompositions, and thus, the tracking algorithms under the CP and Tucker formats principally belong to the class of the streaming BTD with one block.When the number of blocks is greater than 2, there are only two BTD methods able to deal with streaming tensors, including OnlineBTD [115] and O-BTD-RLS [116].
The former method was proposed by Gujral et al. in [115] for tensor tracking under the generalized BTD format of L blocks and a multilinear rank-(r 1 , r 2 , . . ., r N ).On the arrival of the temporal slice Y t , OnlineBTD performs the following minimization: where n=1 are supposed to remain unchanged with time except the last tensor factor U (N ) .Prior information of r and rank-(r 1 , r 2 , . . ., r N ) are known in advance.Old estimates of the core tensors and tensor factors of 7. Tracking the rank-(r, r, 1) BTD of the 3-rd order streaming Xt.
X t−1 are used as a "warm start" for OnlineBTD at each time t.To speed up the tracking process, OnlineBTD utilizes (i) an accelerated matricized tensor times Kronecker product, (ii) the pseudo-inverse operator using LU decomposition, and (iii) a dynamic programming strategy introduced by Zhou et al. in [49] to avoid the re-computation of duplicated Kronecker products.
The second method was introduced by Rontogiannis et al. in [116].Specifically, O-BTD-RLS is designed for tracking the low rank-(r, r, 1) terms of three-order streaming tensors (i.e., r 1 = r 2 = r and r 3 = 1), see Fig. 7 for an illustration.In particular, the tensor factors of the underlying tensor are incrementally updated by minimizing the following objective function: Here, ∈ R In×r is the n-th tensor factor of interest and u l , n = 1, 2; u τ and u l are the τ -th row and l-th column of the temporal factor U (3) ∈ R t×L , respectively; W τ = diag(u τ ) ⊗ I r and Ξ = diag(β t−1 , . . ., β, 1); ρ 1 and ρ 2 are two regularization parameters; and η 2 is a small positive number to promote smoothness at zero.Specifically, the former term of (51) has the form of weighted least-squares while the two latter terms are regularizations.Accordingly, an efficient recursive least-squares solver was introduced to minimize (51) effectively.An appealing feature of O-BTD-RLS is that it has the abiltity to reveal the BTD ranks (i.e., r and L) over time by specifying the number of columns of the tensor factors which are non-negligible in magnitude at each time t.

Streaming t-SVD Decomposition
Similar to TT and BTD, streaming t-SVD is still in its early stage.In the literature, there exists only two works of Zhang et al. in [117] and Gilman et al. in [118] addressing the problem of tensor tracking under the t-SVD format.
In [117], Zhang et al. introduced an online tensor PCA for sequential 2D data based on the t-SVD structure.When Y t arrives, the proposed algorithm updates: Step 1: The coefficient matrix W t and the sparse outlier O t from solving the following minimization Step 2: The low tubal-rank tensor U t (a.k.a.basis dictionary) from taking iFFT of the tensor Ût along the third dimension where Ût is specifically derived from Here, t , and the solution Û is a matricization of Ût .
As the online tensor PCA above is not designed for handling missing data, Gilman et al. in [118] proposed another algorithm called TOUCAN which is capable of tracking tensors from missing observations.Specifically, the authors proposed to solve the constrained minimization where is the subsampled inverse Fourier transform, F n ∈ C n×n denotes the Discrete Fourier Transform matrix, the mixing matrix U ∈ R I1I3×rI3 is defined as U = F I3 ⊗ I I1 bcirc(U ) F −1 I3 .Motivated by the so-called GROUSE algorithm for subspace tracking in [119], TOUCAN applies the incremental gradient descent on the tensor Grassman manifold to track U t with time.It is worth noting that the objective function ( 54) is very common in subspace tracking problems.Therefore, we can apply any subspace tracking algorithms which are capable of dealing with missing data to minimize (54) effectively.

APPLICATIONS
Tensor tracking or dynamic tensor analysis has already been found several online applications and this section provides some typical examples in different research fields, from computer vision and neuroscience to anomaly detection.

Computer Vision
We begin this section with one of the earliest and most popular applications of tensor tracking: visual tracking which is an important task in computer vision [120].Naturally, video datasets can be represented as 4-th order streaming tensors of dimensionality, width × height × channel × time.Accordingly, there are several studies devoted to developing tensor-based visual trackers for better modeling the appearance of target objects, such as [80], [121]- [123], to name a few.For example, Hu et al. in [80] proposed the so-called IRTSA tracker using incremental tensor subspace learning to capture the appearance of objects.Zhang et al. in [121] introduced another visual tracker called DTAMU which stands for dynamic tensor analysis with mean update.Weiming et al. in [122] developed a semi-supervised tensor-based visual tracker using graph embedding.Khan et al. in [123] built an online spatio-temporal tensor learning model for visual tracking using Bayesian inference.It is worth noting that most of the existing tensor-based visual trackers correspond to the streaming Tucker decomposition and its variants.
Another notable application of tensor tracking in computer vision is video background and foreground separation which is quite related to visual tracking, but with a different aim of modeling the scene background and detecting the information of changes in the scene.Similar to visual tracking, many tensorbased separators were proposed, such as [65], [105], [112], [124], [125].Particularly in [65], Thanh et al. proposed a robust adaptive CP method called RACP which is capable of modeling video background and detecting moving objects.Li et al. in [105] introduced an online robust low-rank tensor modeling (ORLTM) method and found its success in video background subtraction.Andrews et al. in [124] developed an online stochastic tensor decomposition for background subtraction in multispectral video sequences.A robust streaming tensor-train algorithm was developed in [112] which also has the potential to detect foreground in video.Salut et al. in [125] proposed an online tensor robust principal component analysis and validated its effectiveness with the problem of background and foreground separation.

Neuroscience
The brain can be viewed as a complex system with various interacting regions that can produce large multivariate data over time [130].Many types of brain data can be represented by tensors, such as electroencephalography (EEG), magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI), and near-infrared spectroscopy (NIRS) [131].Apart from three intrinsic modes (i.e., frequency, channel, and time), brain data can have higher-order modes, such as, subjects, conditions, and trials [131].Together with the fact that brain activities can change over time, dynamic tensor analysis has become an useful tool to study the structure and function of brain from such data.
In what follows, we list some appealing brain-computer interface applications to demonstrate the use of dynamic tensor analysis in neuroscience.First, for the problem of detecting dynamic functional connectivity networks (DFCNs), Ozdemir et al. in [87] introduced a recursive tensor-based framework capable of tracking DFCNs over time.The proposed framework was then applied for studying error-related negativity -a brain potential response when patients make errors during cognitive tasks [132].Mahyari et al. in [133] developed a two-step approach using incremental tensor subspace analysis for detecting DFCNs.Particularly, they first detect change points at which the functional connectivity across subjects presents abrupt changes and then summarize DFCNs between successive change points.Recently, Acar et al. in [134] proposed to use the Parafac2 model for tracking the evolution of connectivity networks and compared its performance with ICA and IVA.For the problem of localizing dynamic brain sources over time, Ardeshir et al. in [135] utilized the boundary element method (BEM) [136] and the adaptive PARAFAC-RLST tracker [46] with two operational windowing schemes.A variant using augmented complex statistics in [137] also has the ability to track moving EEG sources with time.
For the problem of online EEG completion, Trung et al. in [138] proposed an adaptive CP algorithm called NL-PETRELS capable of tracking and imputing incomplete EEG data.Thanh et al. in [64], [65] also demonstrated the use of ACP and RACP with real data by applying them for online EEG completion.Other neuroscience applications of tensor analysis were reviewed in [9], [33], [35].

Anomaly Detection
Anomaly detection, which corresponds to identifying patterns and data points that do not conform to normal behavior, plays an essential role in many applications, such as cyber security, statistics, and finance, to name a few [139].Here, we provide some notable tensor-based anomaly detectors which are customized to specific online applications.
Shi et al. in [140] developed the so-called STenSr algorithm for anomaly detection and pattern discovery in spatio-temporal tensor streams from sensor networks.STenSr utilizes an incremental HOSVD and a metric based on Euclidean distance to detect abrupt changes when new data comes.Kasai et al. in [141] introduced an online time-structured traffic tensor tracking framework to detect network-level anomalies from link indirect measurements over time.In particular, it is based on a robust adaptive CP decomposition that uses RLS for tensor tracking and ADMM for detecting abnormal flows.Cao et al. in [142] designed an interactive system called Voila for detecting and monitoring visual anomalies.Voila is a tensor-based anomaly detector with an interaction design that can ranks anomalous patterns based on user input.Lin et al. in [143] proposed a novel method called TBAD to localize anomalous events.TBAD employs a spatial-feature-temporal tensor model and analyses latent patterns through unsupervised learning.Xu et al. in [144] introduced a tensor-based framework, namely SWTF, capable of detecting multiple types of anomalies in road networks.We refer the readers to [34] for a broader interdisciplinary survey of tensors for anomaly detection.

RESEARCH CHALLENGES, OPEN PROBLEMS, AND FUTURE DIRECTIONS
In this section, we first discuss some advantages and disadvantages of the state-of-the-art tensor tracking models.Then, we present several research challenges and open problems that should be considered for the development of tensor tracking in the future.They are data imperfection and corruption; rank revealing and tracking; efficient and scalable tensor tracking; and other aspects such as theoretical analysis, symbolic data, and tracking under some less common tensor formats.Possible solutions for these challenges are also discussed.

Discussions on State-of-the-art Tensor Tracking Models
As reviewed in previous sections, most of the state-of-the-art tensor tracking methods were developed for streaming CP and Tucker decompositions.Indeed, the CP tracking model can be seen as a special case of streaming Tucker decomposition when the core tensor is constrained to be an identity tensor.Therefore, the complexity (w.r.t.computation and memory storage) of streaming CP decomposition is generally much lower than that of Tucker.In other words, CP trackers are often faster than Tucker ones in practice.Another interesting feature of the CP tracking model over Tucker's online variants is the uniqueness in which tensor factors of CP are unique up to a permutation and scale under certain conditions, similar to batch CP decomposition.This feature is a useful property in several applications, e.g., for source separation or to recover exact components and individuals hidden in the underlying data.However, CP is not really a practical model because finding the true CP rank of streaming tensors is nontrivial even when we have enough resources to process multiple snapshots at a time.The problem of tensor rank tracking will be discussed later in subsection 8.3.
Streaming Tucker decomposition is, however, more flexible than CP.It stems from the fact that we here only need to track the column spaces of tensor factors over time, instead of a particular basis.In addition, the Tucker rank determination may be "easier" than that of CP as we can take advances of rank estimation in the problem of subspace tracking (e.g., [156], [157]).However, when we deal with large-scale and high-dimensional streaming tensors, the size of the core tensor can be large which makes Tucker trackers less efficient for stream processing.The BTD tracking model lies between CP and Tucker in the sense that the core tensor is block diagonal.Thus, it has the advantages of both CP and Tucker tracking models.Despite having that, tracking the underlying BTD approximation of streaming tensors may be harder than CP and Tucker as it takes several types of rank into account, i.e., the number of blocks and their size.The tensor-train tracking model shares the same drawback with BTD as it must estimate several TT-cores and their rank.An appealing advantage of streaming TT decomposition is that it can deal with tensors of a very high order, thanks to its memory-saving representation which is linear to the order of tensor [5].The t-SVD tracking model has its own advantage and disadvantage.As its algebraic framework is in the Fourier domain, t-SVD can utilize fast operations (e.g., FFT and its inverse) to speed up the tracking process.Generalizing the t-SVD to tensors of order greater than three can be done via recursion [158] or hot-HOSVD [159] which are quite expensive.It becomes a bottleneck that can make tracking higher-order streaming tensors under the t-SVD model inefficient.

Data Imperfection and Corruption
Dealing with data imperfection and corruption has been a critical issue in many applications and tracking problems in particular [160].We here present two main types of imperfect data that either remain unsolved or are still challenging for tensor tracking: (i) non-Gaussian and colored noises; (ii) outliers and missing data.
a) Non-Gaussian and Colored Noises.Most of the existing tensor tracking algorithms were proposed under the additive white Gaussian noise assumption.This assumption however does not always hold in practice.For example, impulsive noises (e.g., burst, alpha-stable, and spherically invariant random variable noise), which are introduced by human activities and natural sources, are one of the most common non-Gaussian noises that often appear in tracking applications such as direction of arrivals [161], OFDM systems [162] and adaptive system identification [163].This type of noise can significantly impact the tracking ability of estimators and it requires specific treatments [164].In parallel, colored noises that indicate types of noise that are correlated in space and/or time may reduce the performance of tracking algorithms [165].Accordingly, standard tracking algorithms may be less effective in estimation accuracy in the presence of these noises.They need to be readapted or redesigned for more robustness.
To the best of our knowledge, we are not aware of any tensor tracking algorithm capable of handling such noises in the literature.Some potential approaches have been successfully demonstrated in subspace tracking problems (i.e., tracking tensors of order 2), see [164] for a brief survey.In particular, adaptive Kalman filtering and weighted RLS approaches can be adopted for dealing with impulsive noises.Oblique projection and instrumental variable-based techniques can handle colored noises.Therefore, it is desirable to extend these approaches from subspace tracking to tensor tracking.
b) Outliers and Missing Data.They are now becoming more and more ubiquitous in modern datasets.Outliers are data points that appear to be inconsistent with or exhibit abnormal behaviour different from others.Missing observations are often encountered during the data acquisition and collection.Both outliers and missing data can cause several issues (e.g., they introduce bias in estimation) for knowledge discovery from data in general and data streams in particular [166].Accordingly, dealing with them is an essential task in the analysis of corrupted datasets which has been still a hot topic in data mining for decades.In general, handling such corruptions involves removing/ignoring them after detection or replacing them with alternative values.
There exist few tensor tracking algorithms robust to sparse outliers in the literature.Under the CP format, SOFIA [61] applies the robust Holt-Winters forecasting model using a pre-cleaning mechanism to identify and down-weight outliers.RACP [65] introduces a ℓ 1 -norm penalty to promote the sparsity on outliers and then uses an ADMM solver to estimate them.Under the Tucker format, ORLTM [105], OLRTR [106], and D-L1-Tucker [96] are able to deal with sparse outliers.Both ORLTM and OLRTR propose to regularize the main objective function with a ℓ 1 -norm regularization.Meanwhile, D-L1-Tucker adopts a threshold-based method to detect outliers.Except for RACP, most of the mentioned algorithms above are not designed for dealing with missing data.In parallel, most of the existing online tensor completion and tracking are sensitive to outliers, such as TeCPSGD [48], OLSTEC [56], and ACP [64].Accordingly, there are plenty of opportunities to develop robust tensor tracking from incomplete observations as it is still in its early stage.

Rank Revealing and Tracking
Most of the state-of-the-art tensor tracking algorithms suppose that the tensor rank (e.g., CP, Tucker, BTD, TT, or tubal rank) is given as prior information.In practice, it is however a difficult assumption due to the facts that: (i) the tensor rank may change over time and (ii) a good rank determination at the initialization stage is not always guaranteed when the number of training samples is limited and (iii) the exact rank determination may be intractable (e.g., CP rank is NP-hard [27]).Therefore, it is essential to develop tracking algorithms that are capable of revealing the rank over time.
In the literature, there have been many heuristic methods developed for the problem of tensor rank estimation.Most of them adopt the Bayesian approach to infer the tensor rank from data, such as [167]- [169].Theoretically, Bayesian inference offers a good recipe for the tensor rank estimation as we can integrate the low-rank promoting prior as well as the tensor rank into the learning framework.Another possible approach to determine the tensor rank is to use neural networks (NNs), such as [170]- [172].Since the rank can be considered as one type of data feature, NNs which can extract hidden features within data can be used to solve the tensor rank determination.Although these methods often require the tensor data to be fully observed, it is possible to readapt or modify them such that their variant are able to handle tensors in an online fashion.For example, we can adopt online Bayesian inference or online learning algorithms for training NNs.Finally, robustness against rank overestimation errors is an issue that, to the best of our knowledge, has not been considered yet in practice.

Efficient and Scalable Tensor Tracking
Tabs. 3 and 5 indicate that most of the existing tensor tracking algorithms are of high complexity.When we deal with large-scale and high-multidimensional streams, they may become less efficient.Thus, it is necessary to develop efficient and scalable tracking techniques of low cost w.r.t.both computational complexity and memory storage.In what follows, we present three potential approaches which are theoretically capable of accelerating the tracking process, namely (a) randomized sketching, (b) parallel and distributed computing, and (c) neural networks-based methods.
a) Randomized Sketching.It is very well-known that randomized methods can reduce the computational cost of their counterparts while still achieving reasonable estimation [173].
Accordingly, many attempts have been made to take their advantages in computation for tensor decomposition in the literature, we refer the readers to [23] for a good overview.Among them, there are a few online algorithms utilizing successfully randomized techniques to speed up the tracking process, such as [57], [63], [64], [174].Particularly, these algorithms involve solving several overdetermined least-squares (LS) problems.Thanks to the CP and Tucker structures, they use random sampling to build the sampled Khatri-Rao and Kronecker products, and then, recast the original LS problems into randomized ones.Solving the new LS problems can save a lot of computational complexity.Other randomized techniques (e.g., random projections and count sketch) with other tensor formats have not yet been investigated for tensor tracking and they deserve next investigations in the future.
b) Parallel and Distributed Computing.The second approach is to develop parallel and distributed computing frameworks for streaming tensor decomposition.It stems from the fact that we can leverage several computational resources to facilitate the tracking process.Moreover, computing systems in a parallel and distributed environment can offer more reliability than their counterparts in a central one as they can avoid the single point of failure which is a fundamental mistake from flaws in the implementation or design of a system.Besides, another appealing advantage of this computing is the scaling up-andout process in which we can add and/or replace computational resources to the system.We refer the readers to [175] for a good reference.
In the tensor literature, there are several parallel and distributed systems for processing large-scale tensors.We can list here some efficient tools for: (a) distributed CP decomposition (e.g., DFacTo [176], SPLATT [177]), (b) distributed Tucker decomposition (e.g., DHOSVD [88], SGD-Tucker [178]), and (c) distributed TT decomposition (e.g., ADTT [114], ATTAC [179]), etc.These tools mainly distribute the unfolding matrices or sub-tensors among several clusters and integrate their low-rank tensor approximations to find the overall low-rank approximation of the underlying tensor.However, most of the existing distributed tensor decompositions are not suitable for handling streaming data.Therefore, it is of great interest to develop practical distributed systems for tracking tensors from data streams.c) Neural Networks-based Methods.Another potential approach is to incorporate neural networks (NNs) into tensor factorization to benefit from their significant advances in computational power.On the one hand, the connection between TDs and NNs has been established in some studies, such as [180], [181].For example, Cohen et al. in [180] showed that the convolutional NNs with ReLU activation and max/average pooling can be represented by tensor decomposition models.Wang et al. in [181] introduced two NN models for finding the low-tubal-rank approximation of three-order tensors.Accordingly, NN tools can be used to model and learn high-order interactions for tensors, and hence, for tensor factorization and tracking.On the other hand, NNs can directly map data streams (temporal slices) as input to the approximation result as output by applying some online learning techniques.In the literature of machine learning, there exist several kinds of learning capable of dealing with data streams, such as incremental learning, lifelong learning, and online continual learning, to name a few.They can be specifically adapted for tensor tracking.

Others
Next, we present some other issues and problems which also deserve future investigations.a) Provable Tensor Tracking.Although the existing tensor tracking methods can provide competitive performance w.r.t.estimation accuracy and/or convergence rate in practice, most of them lack performance guarantees.The gap between practical uses/implementations and theoretical results in tensor tracking may be caused by the fact that most tensor problems are NPhard [27], e.g., the best rank-1 tensor approximation is NPhard even when all observations (temporal slices) are fully observed.Despite several difficulties, there are still attempts to bridge the gap in the literature.Under certain conditions (e.g., the underlying low-rank model remains unchanged over time), some studies established successfully theoretical results to analyse the convergence behavior of their methods, such as [56], [58], [64], [65], [93].These initial results encourage us to investigate deeper theoretical aspects in tensor tracking, such as time variation, asymptotic convergence, and non-asymptotic convergence in low-sample-size settings.
b) Symbolic Tensor Tracking.In some applications, data may no longer be represented by single (certain) values, but need to be formatted or grouped within sets, intervals, histograms, etc.It leads to the so-called symbolic data analysis (SDA) paradigm in data mining and statistics to deal with such data [182].In SDA, several new variables types and processing tools have been introduced to represent and analyse symbolic data, such as interval-valued, histogram-valued, and categorical modal variables, to name a few.The readers are referred to [182] for a good survey on SDA.In the tensor literature, Mauro et al. in [183] proposed for the first time a symbolic tensor decomposition for factorizing interval-valued tensors under the tensor-train format.Specifically, the authors extended a set of tools aiming to handle interval-valued matrices for high-order tensors and introduced efficient decomposition and reconstruction strategies.As the symbolic tensor decomposition is in its very early stage of development in both batch and online settings, there are a lot of aspects that need to be investigated in the future.c) Tensor tracking under BTD, t-SVD, tensor network formats, and other variants.Despite great success in the batch setting, BTD, t-SVD, and tensor networks (e.g., tensor-train, tensor chain, and tensor ring) have not attracted much attention in real-time stream processing until recently.Thus, developing online methods for tracking tensors under these tensor formats and their variants is essential to take advantage from their advantage in representing large-scale tensors as well as fulfil the gap between the two most common formats and others.

CONCLUSIONS
Tensor tracking has recently gained increasing attention as a powerful tool for multidimensional data stream analysis.In this survey, we have provided a technical overview of online techniques for tracking streaming tensors over time.We highlighted the two most popular streaming CP and Tucker decompositions.Specifically, four main groups of streaming CP decomposition algorithms were emphasized, which are subspace-based, block-coordinate descent, Bayesian inference, and multi-aspect streaming decompositions.We categorized the current streaming Tucker decomposition methods into three major classes based on their model architecture.They are online tensor dictionary learning, tensor subspace tracking, and multi-aspect streaming decompositions.Recent years have also witnessed significant advances in other types of tensor decomposition such as tensortrain, BTD, and t-SVD.A brief survey on the existing methods which are capable of tracking tensors under these formats was presented.Finally, we discussed several research challenges, open problems, and future directions for tensor tracking.
} and then performs the following steps on each vector y (n) m,t : (i) projects it onto the subspace U (n) t−1 , (ii) evaluates the corresponding residual error and the energy for each entry of y (n) m,t , and (iii) updates the matrix U (n)

Structure of the Survey
69]to recover U t (:, i)) ⊤ ).Thus, the right and left

TABLE 3 Main
Features of the State-of-the-art Single-aspect Streaming CP Decomposition Algorithms.

TABLE 4 Main
Features of Multi-aspect Streaming CP Decomposition Algorithms.
, [79], Sun et al. proposed a streaming tensor analysis (STA) algorithm for tracking U

TABLE 5 Main
Features of the State-of-the-art Streaming Tucker Decomposition Algorithms.