considering pilot contamination and spatially correlated channels

In this Letter, the authors present a study on linear channel estimators and their respective mean square error expressions, accounting for spatially correlated channels and pilot contamination. They also investigate the impact of imperfect channel covariance matrix knowledge.

Introduction: In real propagation environments, channels are spatially correlated [1], which means that the elements of the channel vector are, to some extent, correlated. Spatial correlation shows up in the channel covariance matrices as unequal diagonal elements and non-zero off-diagonal elements. This Letter studies linear channel estimators for multi-cell multi-user massive MIMO systems with spatially correlated channels and pilot contamination.
System model: We consider a multi-cell multi-user system with $L$ cells, where each cell has a base station (BS) at its centre with $M$ co-located antennas and $K$ single-antenna users. We consider correlated Rayleigh fading channels; therefore, the $M \times 1$ channel vector from the $k$th user in the $l$th cell to the $M$ antennas at the $i$th BS is defined by $\mathbf{g}_{ilk} = [g_{ilk1}, g_{ilk2}, \ldots, g_{ilkM}]^{T} \sim \mathcal{CN}(\mathbf{0}_M, \mathbf{R}_{ilk})$, where $\mathbf{R}_{ilk} \in \mathbb{C}^{M \times M}$ is the positive semi-definite channel covariance matrix. It is important to notice that, in our case, $\mathbf{R}_{ilk}$ is not a scaled identity matrix, but describes the spatial propagation environment and array geometry (i.e. it describes macroscopic effects, which include the average path-loss in different spatial directions and the spatial channel correlation).
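As an illustration of the channel model above, the following minimal sketch draws realisations of $\mathbf{g} \sim \mathcal{CN}(\mathbf{0}_M, \mathbf{R})$ and checks that their empirical covariance approaches $\mathbf{R}$. The helper names `exp_corr_covariance` and `draw_channels`, the real-valued covariance $\beta r^{|m-n|}$ and all numerical values are assumptions chosen for illustration only; they are not the exact model used in this Letter.

```python
import numpy as np

def exp_corr_covariance(M, r, beta=1.0):
    """Hypothetical helper: covariance with entries beta * r^{|m-n|}, 0 <= r < 1."""
    idx = np.arange(M)
    return beta * r ** np.abs(idx[:, None] - idx[None, :])

def draw_channels(R, n_samples, rng):
    """Draw n_samples vectors g ~ CN(0, R), returned as columns of an M x n_samples array."""
    # Hermitian square root via eigendecomposition; valid for any positive semi-definite R
    eigval, eigvec = np.linalg.eigh(R)
    sqrt_R = (eigvec * np.sqrt(np.clip(eigval, 0.0, None))) @ eigvec.conj().T
    M = R.shape[0]
    # Unit-variance circularly symmetric complex Gaussian samples
    w = (rng.standard_normal((M, n_samples))
         + 1j * rng.standard_normal((M, n_samples))) / np.sqrt(2)
    return sqrt_R @ w

rng = np.random.default_rng(0)
R = exp_corr_covariance(M=4, r=0.5)
G = draw_channels(R, 200_000, rng)
R_emp = G @ G.conj().T / G.shape[1]   # empirical covariance of the drawn channels
```

The eigendecomposition is used instead of a Cholesky factorisation so that the sketch also works when the covariance matrix is rank deficient, which the positive semi-definite assumption allows.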
We assume that users in different cells transmit on the same time-frequency resources (a typical scenario in massive MIMO) and that the pilot reuse factor is one, which is the worst-case scenario. Each user transmits an $N$-length pilot sequence, and the received uplink training sequences at the $i$th BS can be represented as an $M \times N$ matrix, $\mathbf{Y}_i$, whose definition involves the pilot power (or average pilot signal-to-noise ratio), $p$, and an $M \times N$ noise matrix, $\mathbf{N}_i$, with independent and identically distributed elements following $\mathcal{CN}(0, 1)$.
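For concreteness, a received-signal model consistent with the definitions above can be sketched as follows. The pilot symbol $\boldsymbol{\phi}_k$, its unit norm and the mutual orthogonality of the $K$ pilots are assumptions introduced here for illustration; they should not be read as the exact expression used in this Letter.

```latex
\mathbf{Y}_i = \sqrt{p}\,\sum_{l=1}^{L}\sum_{k=1}^{K} \mathbf{g}_{ilk}\,\boldsymbol{\phi}_k^{H} + \mathbf{N}_i,
\qquad \boldsymbol{\phi}_k \in \mathbb{C}^{N},
\qquad \boldsymbol{\phi}_k^{H}\boldsymbol{\phi}_{k'} = \delta_{kk'} .
```

Because a pilot reuse factor of one means that every cell uses the same $K$ pilots, correlating $\mathbf{Y}_i$ with $\boldsymbol{\phi}_k$ removes the other users of each cell but not the $k$th users of the other cells, which is the pilot contamination considered here.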

LS channel estimation:
A sufficient statistic for estimating the channel vector, $\mathbf{g}_{iik}$, at the $i$th BS is the de-spread received vector, $\mathbf{z}_{ik}$, given in (2). The LS estimate, $\hat{\mathbf{g}}^{\mathrm{LS}}_{iik}$, follows directly from $\mathbf{z}_{ik}$, and the mean square error (MSE) per antenna of the LS estimator is obtained by applying the trace operator, $\mathrm{Tr}[\cdot]$, to the covariance matrix of the estimation error vector. Note that the LS estimator does not rely on any prior information on the channel statistics, such as the large-scale fading coefficients. Additionally, this estimator is known to perform worse than the MMSE estimator [2].
Remark 3: From (2) we see that $\hat{\mathbf{g}}^{\mathrm{LS}}_{ilk} = \hat{\mathbf{g}}^{\mathrm{LS}}_{iik}$, $\forall l$, which means that the channel estimates are parallel vectors and, therefore, the BS is unable to separate the users that transmitted the same pilot sequence.
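The LS quantities can be written out explicitly under a simple assumption. The sketch below assumes that the users sharing pilot $k$ use a unit-norm pilot, so that de-spreading leaves the sum of the co-pilot channels plus noise of variance $1/p$; this matches the statement made later in this Letter that $E[\mathbf{z}_{ik}\mathbf{z}_{ik}^{H}] = \mathbf{Q}_{ik}$ is the sum of the covariance matrices plus the inverse of the pilot power. The noise symbol $\mathbf{w}_{ik}$ and the sign convention of the error vector are assumptions made here.

```latex
\begin{aligned}
\mathbf{z}_{ik} &= \sum_{l=1}^{L}\mathbf{g}_{ilk} + \mathbf{w}_{ik},
\qquad \mathbf{w}_{ik}\sim\mathcal{CN}\!\left(\mathbf{0}_M,\; p^{-1}\mathbf{I}_M\right),\\
\hat{\mathbf{g}}^{\mathrm{LS}}_{iik} &= \mathbf{z}_{ik},\\
\tilde{\mathbf{g}}^{\mathrm{LS}}_{iik} &= \hat{\mathbf{g}}^{\mathrm{LS}}_{iik}-\mathbf{g}_{iik}
 = \sum_{l\neq i}\mathbf{g}_{ilk} + \mathbf{w}_{ik},\\
\frac{1}{M}\,E\!\left[\big\|\tilde{\mathbf{g}}^{\mathrm{LS}}_{iik}\big\|^{2}\right]
 &= \frac{1}{M}\,\mathrm{Tr}\!\left[\sum_{l\neq i}\mathbf{R}_{ilk} + p^{-1}\mathbf{I}_M\right]
 = \frac{1}{M}\,\mathrm{Tr}\!\left[\mathbf{Q}_{ik}-\mathbf{R}_{iik}\right].
\end{aligned}
```

Under these assumptions the LS error contains the full power of the interfering co-pilot channels, and the right-hand side of the second line does not depend on $l$, which is the observation made in Remark 3.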

MMSE channel estimation:
The MMSE channel estimator of $\mathbf{g}_{iik}$, $\forall k$, based on the observation $\mathbf{Y}_i$ at the $i$th BS is defined by (4). Due to the MMSE properties under the Gaussian model, the estimation error, $\tilde{\mathbf{g}}^{\mathrm{MMSE}}_{iik}$, is uncorrelated with the de-spread received vector, $\mathbf{z}_{ik}$, and hence with the channel estimate, $\hat{\mathbf{g}}^{\mathrm{MMSE}}_{iik}$; it is consequently independent of them, as they are jointly complex Gaussian distributed. The MSE per antenna of the MMSE estimator is given by (5).
Remark 4: If the elements of $\mathbf{g}_{iik}$ are i.i.d. circularly-symmetric complex normal variables for all $i$, $l$ and $k$,

Remark 5: Due to pilot contamination
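Remark 6: If $\mathbf{R}_{iik}$ is invertible, then from (4) we see that $\hat{\mathbf{g}}^{\mathrm{MMSE}}_{ilk} = \mathbf{R}_{ilk}\mathbf{R}_{iik}^{-1}\hat{\mathbf{g}}^{\mathrm{MMSE}}_{iik}$, $\forall l$. In [3], the authors show that if the matrices $\mathbf{R}_{ilk}$, $\forall l$, are mutually asymptotically linearly independent, then the channel estimates are not parallel vectors and, consequently, the BS is able to separate users transmitting the same pilot sequence.
Remark 7: If $\mathbf{g}_{iik}$ is an i.i.d. complex Gaussian vector, then, again from (4), $\hat{\mathbf{g}}^{\mathrm{MMSE}}_{ilk} = (\beta_{ilk}/\beta_{iik})\,\hat{\mathbf{g}}^{\mathrm{MMSE}}_{iik}$, $\forall l$, meaning that the channel estimates are parallel vectors that only differ by the scaling factor, $\beta_{ilk}/\beta_{iik}$, and, therefore, the BS is also unable to separate users transmitting the same pilot sequence.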
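The behaviour described in Remarks 6 and 7 can be checked numerically. The sketch below assumes the standard linear MMSE form $\hat{\mathbf{g}}_{ilk} = \mathbf{R}_{ilk}\mathbf{Q}_{ik}^{-1}\mathbf{z}_{ik}$ with $\mathbf{Q}_{ik} = \sum_{l}\mathbf{R}_{ilk} + p^{-1}\mathbf{I}_M$; this form, the helper `exp_corr`, and all numerical values (gains, correlation factors, pilot power) are assumptions made for illustration rather than quantities taken from this Letter.

```python
import numpy as np

def exp_corr(M, r, beta):
    """Illustrative covariance beta * r^{|m-n|} (an assumption, not the Letter's exact model)."""
    idx = np.arange(M)
    return beta * r ** np.abs(idx[:, None] - idx[None, :])

M, p = 8, 10.0                      # illustrative array size and pilot power
betas = [1.0, 0.6, 0.4, 0.2]        # hypothetical large-scale gains, desired cell first
rs = [0.5, 0.3, 0.7, 0.1]           # hypothetical correlation factors, one per cell

R = [exp_corr(M, r, b) for r, b in zip(rs, betas)]
Q = sum(R) + np.eye(M) / p          # assumed Q_ik = sum_l R_ilk + (1/p) I_M
Q_inv = np.linalg.inv(Q)

# Closed-form MSE per antenna of the LS (g_hat = z) and MMSE (g_hat = R Q^{-1} z) estimators
mse_ls = np.real(np.trace(Q - R[0])) / M
mse_mmse = np.real(np.trace(R[0] - R[0] @ Q_inv @ R[0])) / M

rng = np.random.default_rng(0)
z = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # an arbitrary de-spread vector

# Remark 6: the estimate of an interfering channel is a linear map of the desired estimate
g_own = R[0] @ Q_inv @ z
g_int = R[1] @ Q_inv @ z
remark6_gap = np.linalg.norm(g_int - R[1] @ np.linalg.inv(R[0]) @ g_own)

# Remark 7: with i.i.d. channels (R = beta * I) the estimates differ only by beta_l / beta_i
R_iid = [b * np.eye(M) for b in betas]
Q_iid_inv = np.linalg.inv(sum(R_iid) + np.eye(M) / p)
g_own_iid = R_iid[0] @ Q_iid_inv @ z
g_int_iid = R_iid[1] @ Q_iid_inv @ z
```

In the correlated case the map $\mathbf{R}_{ilk}\mathbf{R}_{iik}^{-1}$ is not a scaled identity, so the two estimates generally point in different directions, whereas in the i.i.d. case they are exactly parallel.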
Approximate MMSE estimation: In general, the acquisition of the covariance matrices, $\mathbf{R}_{ilk}$, $\forall i, l, k$, is a daunting task as it involves the estimation of $LK$ matrices. However, a simple yet effective solution to this problem comes from the observation of (4) and the finding that the sum of all covariance matrices plus the inverse of the pilot power, i.e. $\mathbf{Q}_{ik}$, can be estimated. Therefore, we estimate $\mathbf{Q}_{ik}$ and replace it back into (4). The classical approach to this estimation problem is to approximate the covariance matrix with the sample covariance matrix. The estimation of $\mathbf{Q}_{ik}$ is based on the fact that $E[\mathbf{z}_{ik}\mathbf{z}_{ik}^{H}] = \mathbf{Q}_{ik}$, which can be approximated by the sample covariance matrix
$$\hat{\mathbf{Q}}_{ik} = \frac{1}{N_Q}\sum_{n=1}^{N_Q}\mathbf{z}_{ik}(n)\,\mathbf{z}_{ik}^{H}(n), \qquad (6)$$
where $\mathbf{z}_{ik}(n)$, $n = 1, \ldots, N_Q$, are the $N_Q$ different observations of (2).
Additionally, we see that $E[\hat{\mathbf{Q}}_{ik}] = \mathbf{Q}_{ik}$. Note that this estimate is obtained from several observations of the de-spread pilot signals, $\mathbf{z}_{ik}$, used for channel estimation and, thus, no extra pilots are necessary. The sample covariance matrix almost surely converges to the true covariance matrix as $N_Q \to \infty$, which follows directly from the law of large numbers and the fact that the channels are assumed ergodic. Considering $[\hat{\mathbf{Q}}_{ik}]_j$ and $[\mathbf{Q}_{ik}]_j$ as the $j$th columns of the $\hat{\mathbf{Q}}_{ik}$ and $\mathbf{Q}_{ik}$ matrices, respectively, then
Remark 8: If $\mathbf{g}_{ilk}$ is an i.i.d. complex Gaussian vector, then
The errors in all the $M^2$ elements of $\hat{\mathbf{Q}}_{ik}$ harm its eigenstructure, making its eigenvalues and eigenvectors misaligned with those of $\mathbf{Q}_{ik}$ [4]. This has a great impact on the system performance, as the MMSE channel estimator takes advantage of the eigenstructure of $\mathbf{Q}_{ik}$ to acquire better channel estimates. Therefore, in order to overcome this issue, we estimate the covariance matrix through the convex combination suggested in [4],
$$\hat{\mathbf{Q}}_{ik}(\eta) = \eta\,\hat{\mathbf{Q}}_{ik} + (1-\eta)\,\hat{\mathbf{Q}}^{\mathrm{diag}}_{ik},$$
between (6) and its diagonalised version, $\hat{\mathbf{Q}}^{\mathrm{diag}}_{ik}$. This kind of regularisation turns $\hat{\mathbf{Q}}_{ik}(\eta)$ into a full-rank matrix for any value of $\eta < 1$, even for the case where $N_Q < M$, and it underestimates the values of the unreliable off-diagonal elements.
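The effect of the regularisation on the rank can be illustrated with a short sketch. It forms the sample covariance from $N_Q < M$ observations and then the convex combination with its diagonalised version, with the factor $\eta$ weighting the sample covariance (the reading consistent with the full-rank property for $\eta < 1$). The function names, the white Gaussian test data and the values of $M$, $N_Q$ and $\eta$ are assumptions for illustration.

```python
import numpy as np

def sample_covariance(Z):
    """Z is an M x N_Q array whose columns are the observations z_ik(n)."""
    return Z @ Z.conj().T / Z.shape[1]

def regularised_covariance(Z, eta):
    """Convex combination of the sample covariance and its diagonalised version."""
    Q_hat = sample_covariance(Z)
    Q_diag = np.diag(np.diag(Q_hat))      # keep only the diagonal elements
    return eta * Q_hat + (1.0 - eta) * Q_diag

rng = np.random.default_rng(1)
M, N_Q = 8, 4                             # fewer observations than antennas
Z = (rng.standard_normal((M, N_Q))
     + 1j * rng.standard_normal((M, N_Q))) / np.sqrt(2)

rank_sample = np.linalg.matrix_rank(sample_covariance(Z))
rank_reg = np.linalg.matrix_rank(regularised_covariance(Z, eta=0.5))
```

With $N_Q < M$ the sample covariance has rank at most $N_Q$ and cannot be inverted, while the regularised estimate is the sum of a positive semi-definite matrix and a diagonal matrix with positive entries, so it is invertible for any $\eta < 1$.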
In this work, we focus on the performance assessment when we estimate $\mathbf{Q}_{ik}$ and do not consider the estimation of the individual $\mathbf{R}_{ilk}$, $\forall i, l, k$; a specific training phase for estimating $\mathbf{R}_{ilk}$ is proposed in [5]. By substituting the regularised estimate, $\hat{\mathbf{Q}}_{ik}(\eta)$, for $\mathbf{Q}_{ik}$ in (4) and treating it as the true covariance matrix, we obtain an approximation of the MMSE estimate of $\mathbf{g}_{iik}$. Assuming that $\mathbf{z}_{ik}$ is independent of $\hat{\mathbf{Q}}_{ik}(\eta)$, i.e. $\mathbf{z}_{ik}$ is not used to estimate the covariance matrix, and that $N_Q$ is large enough to produce good estimates of $\mathbf{Q}_{ik}$, the MSE per antenna of this estimator tends to that of the actual MMSE estimator as $N_Q \to \infty$. The regularisation factor, $\eta$, can be selected so that the MSE per antenna is minimised.
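The MSE of such an estimator can be evaluated by direct calculation once the filter is treated as a fixed matrix. The sketch below assumes that $E[\mathbf{z}_{ik}\mathbf{z}_{ik}^{H}] = \mathbf{Q}_{ik}$, that $E[\mathbf{g}_{iik}\mathbf{z}_{ik}^{H}] = \mathbf{R}_{iik}$ (channels of different cells independent), and that the approximate filter is $\mathbf{R}_{iik}\hat{\mathbf{Q}}_{ik}^{-1}(\eta)$; these forms, the helper names and every numerical value are assumptions for illustration.

```python
import numpy as np

def linear_mse_per_antenna(A, R_own, Q):
    """MSE per antenna of g_hat = A z for a fixed A, with E[z z^H] = Q and E[g z^H] = R_own."""
    M = R_own.shape[0]
    err_cov = R_own - A @ R_own - R_own @ A.conj().T + A @ Q @ A.conj().T
    return np.real(np.trace(err_cov)) / M

def exp_corr(M, r, beta):
    """Illustrative covariance beta * r^{|m-n|} (an assumption)."""
    idx = np.arange(M)
    return beta * r ** np.abs(idx[:, None] - idx[None, :])

rng = np.random.default_rng(2)
M, p, eta = 16, 10.0, 0.5                                       # illustrative values only
R_list = [exp_corr(M, 0.5, b) for b in (1.0, 0.5, 0.3, 0.2)]    # hypothetical gains
R_own = R_list[0]
Q = sum(R_list) + np.eye(M) / p

def approx_mmse_mse(N_Q):
    # Draw N_Q independent observations z ~ CN(0, Q), used only to estimate Q
    C = np.linalg.cholesky(Q)
    W = (rng.standard_normal((M, N_Q))
         + 1j * rng.standard_normal((M, N_Q))) / np.sqrt(2)
    Z = C @ W
    Q_hat = Z @ Z.conj().T / N_Q
    Q_reg = eta * Q_hat + (1.0 - eta) * np.diag(np.diag(Q_hat))
    A = R_own @ np.linalg.inv(Q_reg)                            # approximate MMSE filter
    return linear_mse_per_antenna(A, R_own, Q)

mse_mmse = linear_mse_per_antenna(R_own @ np.linalg.inv(Q), R_own, Q)
mse_small, mse_large = approx_mmse_mse(20), approx_mmse_mse(2000)
```

Since the MMSE filter minimises this expression over all matrices, any approximate filter yields an MSE that is at least as large, and the gap reflects how far the regularised estimate is from the true $\mathbf{Q}_{ik}$.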
Based on the MSE expressions derived above, we can state the following Lemma.
Lemma 1: Considering the channel estimator $\hat{\mathbf{g}}_{iik} = \mathbf{A}_{ik}\mathbf{z}_{ik}$, where $\mathbf{A}_{ik}$ is a deterministic matrix, the MSE per antenna of the estimate is given by (12).
Proof: The proof of (12) is obtained through the direct calculation of the MSE, as done earlier. □
Simulation results: For our simulations, an adequate correlation model is one in which the channels are spatially correlated and all eigenvalues of the correlation matrix are non-zero. Thus, we adopt the exponential correlation model described in [3] with correlation factor $r = 0.5$ and large-scale fading variations along the array with $\sigma = 4$. We assume the same challenging symmetric setup of Fig. 3 in [3] with $K = 2$ UEs per cell, $L = 4$ cells, a coherence block, $\tau_c$, of 200 channel uses, and each device transmitting with a power of 100 mW. For all results, except the one in Fig. 1, we consider $M = 100$ antennas. Except for the results shown in Fig. 2, $\eta = 0.5$. As described in [3], the pilot contamination is very high in that setup. All figures presented next plot the normalised MSE (NMSE) per antenna.
Although the LS estimator exhibits the highest MSE value, it is important to highlight that it does not rely on knowledge of the large-scale fading coefficients. Like the LS estimator, the MMSE estimator presents a constant MSE over all considered $\eta$ values. However, unlike the LS estimator, it has the lowest MSE among the studied estimators. Both estimators, LS and MMSE, present constant MSE because they do not depend on $\eta$. On the other hand, the performance of the approximated MMSE estimator depends on the chosen $\eta$ value. For $\eta$ values ranging from 0 to 0.4, the approximated MMSE estimator has MSE values quite similar to those of the MMSE estimator. This is because the off-diagonal elements of the covariance matrix (i.e. the correlation between the channel elements) have smaller weights than the diagonal elements.
For $\eta$ ranging from 0.5 to 0.9, the performance is worse than that of the MMSE estimator but still better than that of the LS estimator. For $\eta$ values greater than 0.9, the approximated MMSE estimator performs worse than both other estimators. This is because, as $\eta$ approaches 1, the estimated covariance matrix approaches the unregularised sample covariance matrix, which is no longer guaranteed to be full rank. Another important conclusion that can be drawn from the figure is that, for the correlation model adopted here, estimating only the diagonal elements of the covariance matrix would suffice to obtain a good channel estimator.
Fig. 3 depicts the performance of the channel estimators in terms of the NMSE per antenna versus the number of channel observations, $N_Q$. As expected, both the MMSE and the approximated MMSE estimators perform better than the LS estimator for all considered $N_Q$ values. Note that the LS estimator does not need any prior channel information, whereas the MMSE estimator assumes perfect knowledge of the channel statistics. As a result, the MSE of both is constant over all $N_Q$ values. On the other hand, the approximated MMSE estimator depends on the number of observations used to estimate the covariance matrix and, consequently, its performance depends on $N_Q$. As can be noticed, the performance of the approximated MMSE estimator asymptotically tends to that of the MMSE estimator as $N_Q$ increases. The NMSE of all three estimators decreases as the number of antennas increases. It is also essential to highlight that, as expected, as $N_Q$ increases, the performance of the approximated MMSE estimator tends to that of the MMSE estimator.
The performance of the approximated MMSE estimator improves as $N_Q$ increases, since its covariance matrix estimates become as consistent as the covariance matrices used by the MMSE estimator. As $p$ increases, the MMSE estimator performs better, although the same behaviour is not observed for the approximated MMSE estimator if $N_Q$ is insufficient.
Fig. 5 depicts the NMSE for the LS, MMSE and approximated MMSE estimators as a function of the correlation factor. The covariance matrix is estimated by averaging $N_Q = 1000$ channel observations. For the LS estimator, the NMSE is constant over the whole range of correlation factors. For the MMSE and approximated MMSE estimators, the NMSE decreases as the correlation factor, $r$, increases. Note that, since $N_Q$ is large, the covariance estimate is good and, consequently, the approximated MMSE estimator performs close to the MMSE estimator. However, when the correlation between the channels becomes higher, the MMSE estimator is better than the approximated MMSE estimator since it has knowledge of the channel statistics. Moreover, if we analyse (5), it is possible to conclude that both terms become similar as the correlation factor increases, which consequently causes the NMSE to decrease. The same analysis is applicable to (11).