Blind Hyperspectral and Multispectral Image Fusion Using Coupled Non-Negative Tucker Tensor Decomposition

Fusing a low spatial resolution hyperspectral image (HSI) with a high spatial resolution multispectral image (MSI) to produce a fused high spatio-spectal resolution one, referred to as HSI super-resolution, has recently attracted increasing research interests. In this paper, a new method based on coupled non-negative tensor decomposition (CNTD) is proposed. The proposed method uses tucker tensor factorization for low resolution hyperspectral image (LR-HSI) and high resolution multispectral image (HR-MSI) under the constraint of non-negative tensor ecomposition (NTD). The conventional non-negative matrix factorization (NMF) method essentially loses spatio-spectral joint structure information when stacking a 3D data into a matrix form. On the contrary, in NMF-based methods, the spectral, spatial, or their joint structures must be imposed from outside as a constraint to well pose the NMF problem, The proposed CNTD method blindly brings the advantage of preserving the spatio-spectral joint structure of HSIs. In this paper, the NTD is imposed on the coupled tensor of HIS and MSI straightly. Hence the intrinsic spatio-spectral joint structure of HSI can be losslessly expressed and interdependently exploited. Furthermore, multilinear interactions of diﬀerent modes of the HSIs can be exactly modeled by means of the core tensor of the Tucker tensor decomposition. The proposed method is completely straight forward and easy to implement. Unlike the other state-of-the-art methods, the complexity of the proposed CNTD method is quite linear with the size of the HSI cube. Compared with the state-of-the-art methods experiments on two well-known datasets, give promising results with lower complexity order.



Abstract-Fusing a low spatial resolution hyperspectral image (HSI) with a high spatial resolution multispectral image (MSI) to produce a fused high spatio-spectal resolution one, referred to as HSI super-resolution, has recently attracted increasing research interests.In this paper, a new method based on coupled nonnegative tensor decomposition (CNTD) is proposed.The proposed method uses tucker tensor factorization for low resolution hyperspectral image (LR-HSI) and high resolution multispectral image (HR-MSI) under the constraint of non-negative tensor decomposition (NTD).The conventional non-negative matrix factorization (NMF) method essentially loses spatio-spectral joint structure information when stacking a 3D data into a matrix form.On the contrary, in NMF-based methods, the spectral, spatial, or their joint structures must be imposed from outside as a constraint to well pose the NMF problem, The proposed CNTD method blindly brings the advantage of preserving the spatio-spectral joint structure of HSIs.In this paper, the NTD is imposed on the coupled tensor of HSI and MSI straightly.Hence the intrinsic spatio-spectral joint structure of HSI can be losslessly expressed and interdependently exploited.Furthermore, multilinear interactions of different modes of the HSIs can be exactly modeled by means of the core tensor of the Tucker tensor decomposition.The proposed method is completely straightforward and easy to implement.Unlike the other state-of-the-art methods, the complexity of the proposed CNTD method is quite linear with the size of the HSI cube.Compared with the state-of-the-art methods experiments on two well-known datasets, give promising results with lower complexity order.

I. INTRODUCTION
YPERSPECTRAL imagery is a technique that utilizes a wide range of electromagnetic spectrum in image acquisition in order to yield more information on what is imaged and identify materials or detect processes.As each band of hyperspectral image (HSI) is the spectral response to a narrow band of the electromagnetic spectrum, it is necessary to collect reflections from a wider area on the scene, decreasing the spatial resolution The authors are with the Department of Electrical and Electronics Engineering, Shiraz University of Technology, Shiraz 715555-313, Iran(e-mail: ma.zare@sutech.ac.ir; ms_helfroush@sutech.ac.ir; kazemi@sutech.ac.ir).
of HSIs.There are several attempts aiming to increase the spatial resolution of HSIs in recent years [1][2][3][4][5][6][7][8][9][10][11][12][13].Basically, all super-resolution HSI approaches can be described in three categories; fusing low resolution hyperspectral image (LR-HSI) and high resolution multispectral image (HR-MSI) as a Bayesian framework [3,11,[14][15][16][17][18][19], non-negative matrix factorization (NMF) based methods [4,[20][21][22][23][24][25] and tensor factorization based methods [1,2,8,[26][27][28][29]. Bayesian frameworks build the posterior distribution based on observed LR-HSI and HR-MSI and some prior information or regularization term and utilize the alternating direction method of multipliers (ADMM) [11] optimization method to estimate super-resolution HSI.In [11] a Bayesian framework based on a sparse representation is introduced to solve the fusion problem, thus, the abundance fractions are estimated using ADMM.In our previous work [30], a smooth graph signal modeling in a Bayesian framework is employed to impose the consistency of subspace projection fractions corresponding to the nearby nodes by exploiting the graph Laplacian matrix.The method proposed in [31] exploits non-local self-similarity based on a spatial and spectral sparse representation to estimate both matrices of abundance fractions and endmembers, which the product of these two matrices leads to the super-resolution HSI.The method proposed in [5] uses spectral unmixing and Bayesian sparse representation to enhance the spatial resolution of hyperspectral images.The most important deficiency of the Bayesian frameworks is the necessity to the regularization terms.They should be able to comprehensively represent spatial information of HSIs, which is lost in the HSI matricizing of HSIs stage.
As an alternative approach, non-negative matrix factorization (NMF) has been widely proposed to fuse pairs of LR-HSIs and HR-MSIs.This methods brings the advantages of clear physical, statistical, and geometric inference, flexible modeling, and less requirement on prior information [32].As HSIs are essentially non-negative data, NMF frameworks are quite compatible with the observed data.These approaches first unfold the HSI into the matrix form and factorize it into two non-negative matrices, called as the basis and the coefficient matrices.In order to avoid trapping in the local minima due to its non-convex objective function, several constraint NMF methods [4,22,23,33] are introduced.Instead of the original L1 regularizer, the work in [22]

H
abundance fractions to enforce the sparse estimation.In [23] a multilayer NMF spectral unmixing is introduced, where the products of the sub-layer output factors lead to the superresolution image.In [33], using sparsity-constrained deep NMF with the total variation regularization, to estimate HR-HSI.Noteworthy, stacking a 3D data into matrix form in NMFbased approaches loses the neighborhood structures, smoothness, and continuity characteristics.In this regard, tensors or multiway arrays have been frequently used in multidimensional data analysis [26,28,[34][35][36].Additionally, exploiting the power of multilinear algebra of the tensor representation shows more flexibility in the choice of constraints that match data properties.It also can extract more general latent components in the data compared with the matrix-based approaches.More recently tensor-based representation of HSI is widely being used [1,8,26,28,32].Representation of an HSI as a third order tensor is a structural and natural without any information loss model.In most of these methods, a low rank tensor representation is exploited to estimate the original HR-HSI.It offers the benefit of extremely noise and memory usage reduction, and extracting discriminative features [32].Accordingly, two well-known tensor decomposition, Tucker decomposition and canonical polyadic decomposition (CPD) are widely used to approximate HR-HSIs [8,26,28,29,37,38].Learning a low tensor-train rank representation is incorporated in [26] for hyperspectral image super-resolution.In [28] a nonlocal coupled canonical polyadic (CP) tensor decomposition framework is used to fuse hyperspectral and multispectral images.The spatio-spectral sparsity prior of HSIs is the constraint that is imposed on the core tensor in the CSTF method [8].Also in [27], nonlocal sparse tensor factorization (NLSTF) of HSIs is proposed to model non-local self-similarity of HSIs.
As a strong prior knowledge, HSIs are naturally non-negative data.Thus, applying non-negative constraints to the tensor factorization method is expected to further improve the performance of the fusion method.In this paper, we extend NMF to a tensor framework which is called non-negative tensor decomposition (NTD).Contrary to the conventional NMFbased methods, where, the spectral, spatial, or their joint structures must be additionally imposed as a constraint, in this paper, a multiway representation of HSIs preserves the spatiospectral joint structures of HSIs with no prior knowledge resulting a blind framework.We take spectral advantages of LR-HSI and spatial advantages of HR-MSI, via straightly applying coupled Tucker decomposition to both of the above images.Therefore, the proposed method is called CNTD.The proposed method comprehensively models multilinear modes interactions of HSIs, using Tucker tensor representation, where the core tensor precisely expresses relations among different modes.The proposed algorithm is straightforward and easy to implement, where, the complexity is quite linear with the size of the hyperspectral data cube.The main contributions of this paper are highlighted below.
 Extending NMF to a tensor framework, which is called NTD.
 Applying coupled Tucker decomposition to the LR-HSI and HR-MSI, to estimate spectral and spatial information of HR-HSI, respectively. Preserving spatio-spectral joint structures of HSIs with no information loss and no prior knowledge requirement, using a multiway representation of HSIs. Giving Lower complexity order comparing with the other state-of-the-art methods.The remainder of this paper is organized as follows.Some preliminaries on tensors are presented in Section II.Section III formulates the HSI-MSI fusion.The proposed coupled nonnegative tensor decomposition (CNTD) method for blind superresolution HSI is introduced in Section IV.The complexity order of the proposed method is mentioned in Section V.The experimental results on two well-known datasets Pavia University and Indian Pines, are illustrated in Section VI.Finally, conclusion and our future work are given in Section VII.

II. PRELIMINARIES ON TENSORS
An N-dimensional tensor  ∈ ℝ  1 × 2 ×…×  has N indices  1 ,  2 , … ,   and its elements are denoted by   1  2 …   where 1 ≤   ≤   .Tensor matricization unfolds an N-dimensional tensor into a matrix.The mode- matricization of  reorders/unfolds the elements of  to form the matrix  () ∈ ℝ   × +1  +2 …   1  2 … −1 .Noteworthy that, matricization of a tensor is quite analogous to matrix vectorization.The mode-n product of the tensor  ∈ ℝ  1 × 2 ×…×  by the matrix  ∈ ℝ   ×  , is defined by  ×  , is an N-dimensional tensor ℳ ∈ ℝ  1 × 2 ×…×  ×…×  , which entries are calculated by The mode-n product  ×   can also be calculated in matrix form as  () =  () .For multiple mode-n product the order is irrelevant, which is and for multiple mode-n products with the same modes, the order is relevant, which is The scalar product of two tensors ,  indicated as . The Frobenius norm of a tensor  is indicated as ‖‖  = √< ,  > .
The Kronecker product of two matrices  ∈ ℝ × and  ∈ ℝ × is a matrix denoted as ⨂ ∈ ℝ × is defined as The other properties of Kronecker product and vectorization operation ((•)) that are used in this paper are as below All basic notations are represented in Table.I.

III. PROBLEM FORMULATION
As HSIs are naturally a 3D data, the tensor is a more efficient representation than matrix form, and we can benefit from its ability to exploit intrinsic structures of HSI and multilinear interactions of its different modes.Hence in this paper, the target HR-HSI is denoted as a three-dimensional tensor  ∈ ℝ ×× , where ,  and  are the dimensions of the width, height and spectral (depth) modes, respectively.It is formally expressed as which is called Tucker representation, where  ∈ ℝ ×  ,  ∈ ℝ × ℎ and  ∈ ℝ ×  are width, height and spectral dictionary matrices, respectively.  ,  ℎ and   are the number of each mode dictionaries atoms, and  ∈ ℝ   × ℎ ×  is the core tensor that shows the interactions among different modes.Mode-n ( = 1,2,3) matricization of  are as follows where (⋅)  denotes the transposition of the matrix.Both LR-HSI and HR-MSI are also denoted as threedimensional tensors  ℎ ∈ ℝ ×ℎ× and   ∈ ℝ ×× , respectively., ℎ and  are the indexes of the width, height and spectral modes, respectively.LR-HSI and HR-MSI are expressed as below where   ∈ ℝ ×  ,   ∈ ℝ ×  and   ∈ ℝ ×  are width, height, and spectral dictionary matrices, respectively, and   ∈ ℝ ×ℎ× and   ∈ ℝ ×× are the independent and identically distributed (i.i.d.) noise of  ℎ and   , respectively.Conventionally, LR-HSI and HR-HSI are considered to be the spatial and spectral down-sampled version of HR-HSI, respectively.Therefore, the LR-HSI acquisition process can be formulated as ) where  1 ∈ ℝ × and  2 ∈ ℝ × are spatial separable downsampling operators of width and height modes, respectively.They are point spread functions (PSF) of imaging sensor and assumed to be known.Similarly, HR-MSI can be expressed as where  3 ∈ ℝ × is spectral down-sampling operator, which is spectral response function (SRF) of the imaging sensor and assumed to be known.

IV. PROPOSED CNTD APPROACH FORMULATION
The goal of fusing LR-HSI and HR-MSI is to estimate a high spatio-spectral target image of HSI.Since  ≪  , ℎ ≪  and  ≪ , the super-resolution problem is severely ill-posed, some prior information is needed to regularize the fusion problem.Orthogonality and statistical independency of basis vectors in the Tucker representation, sparsity, smoothness, and nonnegativity of HSIs are some constraints that help to find a unique solution for the super-resolution problem of HSIs [39][40][41].
Accordingly, in this paper, we propose a new method based on coupled non-negative tensor decomposition (CNTD).The proposed method performs Tucker tensor factorization for LR-HSI and HR-MSI subject to non-negative tensor decomposition (NTD).The original NMF method inherently loses spatiospectral joint structure information when unfolding a 3D data into the matrix form.Therefore in this paper, we impose NTD to the both tensor of HSI and MSI straightly.The CNTD method effectively combines multiple data tensors, where the intrinsic spatio-spectral joint structures of HSI can be losslessly represented and interdependently exploited.The proposed CNTD method is illustrated in Fig. 1.
Considering ( 11), (13), and ( 14 where ‖⋅‖  denotes the Frobenius norm.Hyperspectral and multispectral data fusion based on non-negative Tucker decomposition are achieved by the estimation of the corresponding dictionaries and the core tensor.NTD is straightforward to formulate and easy to implement likewise the conventional NMF.Non-negative Tucker tensor decomposition attempts to decompose a non-negative data tensor into the multilinear products of a non-negative core tensor and nonnegative mode dictionary matrices [42].We execute the multiplicative update rule (MUR) to minimize the predefined optimization problems (20) and (21).We perform the multiplicative updating algorithms for NTD, which we directly derive from NMF multiplicative algorithms.The convergence to local optima under the non-negativity constraints has been proven in [20,43], which can be applied to our case as well.

A. Updating mode dictionary matrices
Updating algorithms for each mode dictionary matrices can be easily derived by matricizing the Tucker model into associated modes.We use extended MUR for the NTDs of  ℎ , hence the first mode matricization of  ℎ is as below where   (1) and  (1) are the first mode matricization of LR-HSI ( ℎ ) and the core tensor (), respectively.Equation ( 22) can be treated as the conventional NMF, where each fraction is updated using MUR.Considering (22), the first mode dictionary is updated as where the fraction line is used here to denote the element-wise division.
Analogously the second mode dictionary matrix can be updated using the second mode matricization of  ℎ , which is as and the second mode dictionary matrix is updated as below Equivalently the third mode matricization of  ℎ is   (2) ≈  (3) (  ⨂  )  .Therefore, the spectral mode dictionary () is updated as which can be treated as the conventional NMF as well.
Incorporating MUR to calculate the core tensor (), which is as considering (10) and ( 22) the numerator of ( 28) is as and its denominator considering (10) As a result the core tensor  is updated as We also perform MUR for   in a similar way, which is detailed for  ℎ .Accordingly updating relations for   factors are given as follows The CNTD proposed algorithm starts from NTD for the LR-HSI owing to its spectral information.As the initialization phase, we use the method used in [8]. ℎ ,  ℎ ,  and  via ( 16), ( 17), (26), and (31), respectively while the other variables are fixed to inherit the reliable spatial information obtained from multispectral data.Then,   ,   ,  and  are alternately updated via ( 23), ( 25), (26), and (31), respectively until convergence of the cost function in (20).
The next step of the proposed algorithm is applying NTD to the HR-MSI.As the initialization phase,   is set by (19) and ,  and  are updated using (32), (33), and (35), respectively benefiting the spectral information of LR-HSI.As the optimization phase, , ,   and  are alternately updated by ( 32)- (35) while the other variables are fixed, until convergence of the cost function in (21).The superresolution HSI is calculated using the estimated core tensor and mode dictionary matrices.Input: LR-HSI ( ℎ ), HR-MSI (  ).
V. COMPUTATIONAL COMPLEXITY Finally, we analyze the computational complexity of the proposed method.According to Algorithm 1, the proposed method includes two sub-optimization problems, which engage MURs to estimate  ℎ and   factors.Each sub-optimization problem mainly contains four updating steps.Therefore, the overall computational complexity of the proposed algorithm is briefly expressed as (  ) + (  ℎ   ) + (   ℎ   ) + (   ℎ   ) + ( ℎ   ) + (  2 ) + ( ℎ 2 ) + (  2 ) + (   ℎ 2   2 ) + (  2  ℎ   ) Noteworthy, the complexity of the proposed method outperforms the other state-of-the-art methods [3,8].Unlike the CSTF method [8] and SSSR method [3], As in (36), It is observed that the complexity of the proposed algorithm is quite linear with the size of HSI cube (, , ), which is the same as that of conventional NMF algorithm.Owing to the fact that each update step of the proposed CNTD method can be considered as an NMF problem.

A. Data sets
The proposed CNTD-based method is performed on two wellknown data sets.The first data set is the Pavia University image [45], which is acquired by the reflective optics system imaging spectrometer (ROSIS) optical sensor upon the urban area of the University of Pavia, Italy.The reference HR-HSI of size 120 × 120 × 93 with spatial resolution 1.3 m per pixel and water absorption bands (with a spectral range from 0.43 to 0.86 m) removal is illustrated in Fig. 2. The LR-HSI is produced by applying a Gaussian blurring filter and down-sampling it by a factor of 4. Therefore, the LR-HSI size is 30 × 30 × 93, which is shown in Fig. 2. The HR-MSI with the size of 120 × 120 × 4 is constructed using the IKONOS like reflectance spectral response function depicted in Fig. 3. (for more details about spectral response and spatial blurring functions see [44]).
The second data set is the Indian Pines image, which was captured by NASA Airborne Visible and Infrared Imaging Spectrometer (AVIRIS) [46].The reference image is of size 120 × 120 × 224 across the spectral range from 0.4 to 2.

B. Evaluation criteria
The performance of the proposed method is validated using the five indexes given below.The first index is the spectral angle mapper (SAM) which measures the spectral distortion between estimated HR-HSI ( ̂) and reference image (), where  ̂ and   are pixel spectral responses of  ̂ and , respectively.It is defined as while the ideal SAM value is zero degree.The second index is the root mean square error (RMSE) to evaluate the quality of estimated HR-HSI ( ̂) compared with the reference image ().It is calculated as The third evaluation index is error relative globaldimensional synthesis (ERGAS), which also measures the spectral distortion between the estimated HR-HSI ( ̂) and the reference image (), which is defined as  (39) where  ̂,: and  ,: are the i th bands of  ̂ and , respectively.  ,: is the mean of  ,: .In the case of perfect reconstruction, the ideal value of ERGAS is zero.
The fourth index is the degree of the distortion (DD), defined as where ‖⋅‖ 1 is ℓ 1 norm, () and ( ̂) are vectorization of tensors  and  ̂, respectively.Note that the smaller DD, the better spectral quality.The fifth index is the universal image quality index (UIQI) [47] The UIQI ideal value is one.All the experiments with this subscene are implemented using MATLAB R2016a version run by an Intel Core I5 at 3.4 GHz and 32-GB random access memory computer.

C. Parameters discussions
In order to evaluate the sensitivity of the proposed CNTDbased method w. r. t. its essential parameters including the number of mode (width, height, and depth) dictionary atoms   ,  ℎ and   .We run the proposed method for the different number of mode dictionary atoms.Fig. 4 (a), Fig. 4 (b), and Fig. 4 (c) show the RMSE of the estimated Pavia University and Indian Pines data sets as functions of the number of mode dictionary atoms   ,  ℎ and   , respectively.As can be seen from Fig. 4 (a) and (b), the RMSE for both data sets has a steep fall when   and  ℎ vary from 5 to 200.However, when they grow higher, the RMSE does not change obviously.Therefore, the   and  ℎ are set as 167 for both data sets.As Fig. 4 (c) shows, the RMSE for Pavia University decreases as   varies form 5 to 40.For Indian Pines, the RMSE curve decreases as   varies form 5 to 100, and then they do not change obviously as   increases further.Hence, we set   as 100 for both Pavia University and Indian Pines data sets, although it gives acceptable results for much lower values of   .Consequently, the proposed CNTD method needs the larger width and height mode dictionaries, and a smaller spectral mode dictionary.The reason is that the spectral signatures of HSIs generally live in low dimensional subspaces.

D. Comparison with the other fusion methods
The proposed fusion method is compared with state-of-the-art methods, which are shown in Table III and IV for Pavia University and Indian Pines data sets, respectively.The comparisons include RMSE, SAM, DD, ERGAS, UIQI, for all approaches.It can be seen that the proposed method outperforms the other competing ones in terms of RMSE DD, and UIQI indexes and promising results for the other indexes.Noteworthy, as can be seen from ( 36), the proposed method achieves lower complexity order than some of the state-of-theart methods such as [8] and [3] which have non-linear complexity order with the size of HSIs cube.
In order to compare the performance of the competing methods in preserving spatial structures, the error images which reflect the differences between the estimated HR-HSI and reference image for the 30 th band of both data sets are shown in Fig. 5.It shows the error images of the LR-HSI, CSTF [8], NLSTF [27] and proposed CNTD method.It can be seen that the proposed method can estimate spatial details of the HR-HSI with much lower error than the others for both data sets.For more visual comparison the 30 th band of LR-HSI and estimated HR-HSI of the CSTF [8], NLSTF [27] and proposed CNTD method are compared with the reference HR-HSI in Fig. 6.It shows that the proposed CNTD method can correctly estimate most of the spatial details of the HR-HSI, though there are a few distortions in the fusion results.

VII. CONCLUSION
The main object of this paper is to extend NMF to a tensor frame and perform it on LR-HSI and HR-MSI to estimate the super-resolution image.The introduced basic NTD model should be treated as/emphasized like /the conventional NMFbased model, which is quite general in nature with no spatiospectral constraints.According to this, the proposed method is called blind hyperspectral and multispectral Images Fusion using CNTD.The proposed CNTD-based method is compared with the state-of-the-art methods, which gives promising results with quite linear complexity order with respect to the size of the HSIs cube.
As our future work, we can incorporate some prior information, such as spectral self-similarity, sparsity, smoothness, and local consistence besides the non-negative tensor decomposition, they may help to find better unique basis vectors in a Tucker representation.

Fig. 2 .Fig. 3 .
Fig. 2. The first and second rows show the composite color images of Pavia University and Indian Pines data sets, respectively; (a) HR-MSI, (b) LR-HSI (c) reference image.

Fig. 4 .
The RMSE as functions of the number of atoms   ,  ℎ and   for the proposed CNTD method; (a)   , (b)  ℎ , (c)   .

Fig. 5 .
Fig. 5.The error images of 30 th band of Pavia University and Indian Pines data sets; The first row shows the error image of LR-HSI for (a) Pavia University data set, (b) Indian Pines data sets, (c) reference image.The second and third rows show the error images of the estimated HR-HSI for Pavia University and Indian Pines data sets, respectively; (d) and (g) CSTF method[8], (e) and (h) the NLSTF method[27], (f) and (i) proposed CNTD method.

Fig. 6 .
Fig. 6.The first and second rows contain the error images of 30 th bands of Pavia University and Indian Pines data sets, respectively; (a) LR-HSI, (b) the CSTF method [8], (c) the NLSTF method [27], (d) the proposed CNTD method, (e) reference image.
. It is calculated for windows of size 32 × 32 and averaged over all windows.The UIQI between the i th band of  ̂ and  is given by is the number of windows,  ̂ and    are the j th window of i th band of the reference image and reconstructed HR-HSI, respectively,      ̂  is the sample covariance between    and  ̂ ,

TABLE III QUANTITATIVE
[45]ICS OF THE DIFFERENT FUSION METHODS ON THE PAVIA UNIVERSITY DATA SET[45]