Channel Parameter Estimation in the Presence of Phase Noise Based on Maximum Correntropy Criterion

Abstract-Oscillator outputs generally exhibit phase noise, which spreads the output power spectral density (PSD) around the ideal Dirac delta function. In this paper, an AWGN channel is considered in which the transmitted signal, corrupted by phase noise, is added to the channel Gaussian noise before reaching the receiver. Conventional channel estimation algorithms based on the mean square error (MSE) criterion, such as least mean squares (LMS), are not well suited to this channel. We (i) analyze phase noise channel estimation under an information theoretic learning (ITL) criterion, namely the maximum correntropy criterion (MCC), which makes the channel estimator's steady-state behavior robust; and (ii) improve the convergence rate by combining MSE and MCC in a novel mixed-LMS algorithm.
Index Terms-Information theoretic learning, phase noise, least mean squares, maximum correntropy criterion.

I. INTRODUCTION
INFORMATION theory has opened a wide vision for researchers in signal processing. This field of study was first introduced to machine learning by Principe and Erdogmus [1]. They introduced information theoretic learning (ITL) criteria such as minimum error entropy (MEE) and the maximum correntropy criterion (MCC) to improve machine learning performance in the presence of outliers or heavy-tailed noise distributions. Subsequent works showed the superiority of these ITL criteria over the conventional mean square error (MSE) criterion. [2] compared the MEE and MSE criteria using a multilayer perceptron (MLP) as an equalizer in both high and low SNR regimes for additive white Gaussian noise (AWGN). Later, [3] made the same comparison for non-Gaussian noise distributions such as exponential and Cauchy, and in [4] a mathematical expression was derived for this superiority in the sense of KL divergence. [5] and [6] devised a conjugate gradient (CG) and a Levenberg-Marquardt (LM) algorithm based on the correntropy criterion, respectively. Until recently, these two criteria had not been compared theoretically; [6] proposed an information theoretic approach that formulates the difference as the Euclidean distance between the noise distribution and a Gaussian distribution. Impulsive noise is present in environments such as underwater communications [7,8], powerline communications (PLC) [9], digital subscriber lines [10], OFDM [11], etc.

Manuscript was submitted on November 11, 2020. This work was supported by the Ferdowsi University of Mashhad. (Corresponding author: Ghosheh Abed Hodtani)
Phase noise (PN), as another case of non-Gaussian noise [12], needs to be mitigated at the receiver. Low-cost oscillators at the receiver cannot produce an ideal Dirac delta power spectral density (PSD). Thus, denoising techniques have to be utilized. [13]-[15] investigated the effect of phase noise on an OFDM symbol. Inner and outer bounds for channel capacity have been calculated in the presence of Wiener phase noise for point-to-point [16] and broadcast channels [17].
Pilot-based channel estimation involves the detection and extraction of channel state information (CSI) from the symbols observed at the receiver, while undesirable effects, including AWGN, phase noise, Doppler shift, multipath fading, and intersymbol interference, distort the data symbols. Various methods and algorithms have been incorporated in equalizers depending on the assumptions made about the channel, such as linearity, Gaussianity, sparsity, infinite impulse response, and modulation technique.
Least squares (LS) and minimum mean square error (MMSE) estimators have been used in OFDM receivers. However, with increasing computational power, adaptive equalizers have gained traction. Reference [18] adopted MCC in sparse channel estimation, [19] used a recursive least squares (RLS) algorithm that adopted a generalized form of MCC, and [20] added MCC to the cost function of an adaptive filtering problem in two ways: first, MCC is the main term in the performance function, and second, a correntropy induced metric (CIM) mimics the $\ell_0$-norm as the penalization term.
In this paper, we adopt a complex LMS method, as in [21], to estimate the complex channel coefficients of a multipath fading channel corrupted by Tikhonov phase noise plus AWGN. The main difference from the mentioned reference is that we use MCC instead of MSE as the learning criterion to combat a non-Gaussian noise distribution, namely, Tikhonov phase noise.
Throughout this paper, uppercase bold letters and lowercase

The rest of the paper is organized as follows. In Section II, the basic concepts behind correntropy and its key properties are reviewed. In Section III, the complex multipath fading channel model is illustrated and our proposed algorithm is explained based on this model. In Section IV, simulation results validate the theoretical findings. Finally, Section V concludes the paper.

A. Correntropy
In this section a brief review of MCC is presented and its main properties are discussed. Correntropy, as a similarity measure between two random variables $X$ and $Y$, is defined as [1]:

$$V_\sigma(X, Y) = E[k_\sigma(X - Y)] = \iint k_\sigma(x - y)\, f_{X,Y}(x, y)\, dx\, dy \tag{1}$$

where $E[\cdot]$ is the expectation operator, $f_{X,Y}(x, y)$ is the joint probability density function (PDF), and $k_\sigma$ is a Mercer kernel function with a free parameter $\sigma$ known as the kernel bandwidth (KBW). $k_\sigma$ is assumed Gaussian in most ITL applications and is defined as

$$k_\sigma(x - y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - y)^2}{2\sigma^2}\right) \tag{2}$$

Since correntropy is symmetric, positive, bounded, and reaches its maximum if and only if $x = y$, it can be used as a similarity measure [22]. Expanding the exponential function in $k_\sigma(\cdot)$ by means of a Taylor series, it can be deduced that correntropy involves all the even moments of the error random variable $e = X - Y$:

$$V_\sigma(X, Y) = \frac{1}{\sqrt{2\pi}\,\sigma} \sum_{n=0}^{\infty} \frac{(-1)^n}{2^n \sigma^{2n}\, n!}\, E\left[e^{2n}\right] \tag{3}$$

This can be viewed as a generalized MSE because the term with $n = 1$ is proportional to the MSE. As a result, more information is extracted from the data samples in MCC [23].
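Since only a limited number of error samples can be observed at the receiver, the above expectation is estimated using the sample mean estimator:

$$\hat{V}_{N,\sigma}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} k_\sigma(e_i) = \frac{1}{N\sqrt{2\pi}\,\sigma} \sum_{i=1}^{N} \exp\left(-\frac{e_i^2}{2\sigma^2}\right) \tag{4}$$

where $N$ is the number of error samples. Note that $k_\sigma$ is replaced by the special case of a Gaussian kernel.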
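The sample-mean estimator can be illustrated with a short sketch. It assumes real-valued samples and the Gaussian kernel with the usual $1/(\sqrt{2\pi}\sigma)$ normalization; the function names and the toy data (a ramp with one gross outlier) are illustrative choices, not taken from the paper.

```python
import math

def gaussian_kernel(e, sigma):
    """Gaussian kernel k_sigma(e) for a real error value e."""
    return math.exp(-e * e / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def sample_correntropy(x, y, sigma):
    """Sample-mean estimate of the correntropy between two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    return sum(gaussian_kernel(a - b, sigma) for a, b in zip(x, y)) / len(x)

# Toy data (assumed): identical sequences except for one gross outlier.
clean = [0.1 * i for i in range(20)]
noisy = list(clean)
noisy[7] += 50.0

v_same = sample_correntropy(clean, clean, sigma=1.0)
v_outlier = sample_correntropy(clean, noisy, sigma=1.0)
mse_outlier = sum((a - b) ** 2 for a, b in zip(clean, noisy)) / len(clean)

print("correntropy, identical sequences:", v_same)
print("correntropy, one outlier:        ", v_outlier)
print("MSE, one outlier:                ", mse_outlier)
```

In this toy case the single outlier removes only its own kernel term from the correntropy estimate, whereas it dominates the MSE, which is the locality property exploited later in the paper.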
An interesting property of MCC is the mapping of data samples to a Reproducing Kernel Hilbert Space (RKHS) [23]. Thus, unlike MSE, which considers the Euclidean distance between two random variables, MCC maps data to the surface of a hyper-sphere in a high dimensional feature space defined by $\varphi(x) = k_\sigma(\cdot - x)$. In this space, the inner product between two elements can easily be obtained from the kernel function without even knowing the non-linear mapping function $\varphi(\cdot)$:

$$\langle \varphi(x), \varphi(y) \rangle = k_\sigma(x - y)$$

Consequently, a nonlinear version of MSE is achieved [23].

A. System Model
Consider a point-to-point multipath channel of length $L$ impaired by AWGN and phase noise, which is described as:

$$y(n) = \mathbf{h}^T(n)\,\mathbf{x}(n)\,e^{j\Phi(n)} + w(n) \tag{5}$$

where $\mathbf{x}(n) = [x(n), \ldots, x(n - L + 1)]^T$, $\mathbf{h}(n) = [h_{R,1}(n), \ldots, h_{R,L}(n)]^T + j[h_{I,1}(n), \ldots, h_{I,L}(n)]^T$, $\Phi$, $w$, and $y$ are the QAM modulated symbols, complex channel vector, phase noise, zero-mean circularly-symmetric complex Gaussian noise, and received data, respectively. If a phase-locked loop eliminates some phase perturbations, $\Phi(n)$ is small, i.e. $|\Phi(n)| \ll 1$. Using the linear approximation $e^{j\Phi(n)} \approx 1 + j\Phi(n)$, (5) is rewritten as [24]:

$$y(n) \approx \mathbf{h}^T(n)\,\mathbf{x}(n) + j\Phi(n)\,\mathbf{h}^T(n)\,\mathbf{x}(n) + w(n) \tag{6}$$

The second and third terms on the right-hand side are complex noise terms. The real part is Gaussian while the imaginary part is non-Gaussian if $\Phi(n)$ is assumed non-Gaussian. For a von Mises distributed phase noise, the PDF of the imaginary part can be estimated through kernel density estimation (KDE), as shown in Fig. 1. In this figure, the imaginary noise term displays a more heavy-tailed behavior than the standard Gaussian PDF. The KBW is chosen according to Silverman's rule [25] and the concentration parameter of the von Mises distribution is set to $\kappa = 8$. As a result, the heavy-tailed assumption for phase noise is valid.
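As a rough illustration of the model and its linearization, the sketch below draws von Mises phase noise with $\kappa = 8$ and compares the exact received sample with its first-order approximation. It assumes the phase noise multiplies the noiseless channel output as $e^{j\Phi(n)}$ and that AWGN is added afterwards; the stand-in value `signal` for the noiseless channel output, the AWGN level, and the sample count are hypothetical choices made only for this example.

```python
import cmath
import math
import random

rng = random.Random(0)            # fixed seed so the sketch is repeatable
kappa = 8.0                       # von Mises concentration, as in the text
noise_std = 0.01                  # AWGN standard deviation per component (assumed)
signal = (1 + 1j) / math.sqrt(2)  # stand-in for the noiseless channel output (assumed)

def wrapped_von_mises(rng, kappa):
    """Draw a zero-mean von Mises phase and wrap it to (-pi, pi]."""
    phi = rng.vonmisesvariate(0.0, kappa)  # the stdlib returns an angle in [0, 2*pi)
    return phi - 2.0 * math.pi if phi > math.pi else phi

phases, exact, linear = [], [], []
for _ in range(20000):
    phi = wrapped_von_mises(rng, kappa)
    w = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
    phases.append(phi)
    exact.append(signal * cmath.exp(1j * phi) + w)   # multiplicative phase noise plus AWGN
    linear.append(signal + 1j * phi * signal + w)    # first-order approximation

mean_square_phase = sum(p * p for p in phases) / len(phases)
worst_gap = max(abs(a - b) for a, b in zip(exact, linear))

print("mean square phase:", mean_square_phase)
print("largest exact-vs-linear gap:", worst_gap)
```

The gap between the two forms is bounded by $|\Phi|^2/2$ times the signal magnitude, so the approximation is tight only for the small phase values that a phase-locked loop is assumed to leave behind.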

B. Proposed Algorithm
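We aim at obtaining an estimate of $\mathbf{h}$, namely $\hat{\mathbf{h}}$, via a training algorithm which minimizes the error:

$$e(n) = y(n) - \hat{y}(n) = y(n) - \left(\hat{\mathbf{h}}_R(n) + j\hat{\mathbf{h}}_I(n)\right)^T \mathbf{x}(n) \tag{7}$$

where $\hat{y}(n)$, $\hat{\mathbf{h}}_R(n)$, and $\hat{\mathbf{h}}_I(n)$ are the estimated received symbol, the real part of the estimated channel coefficient vector, and the imaginary part of the estimated channel coefficient vector at time instant $n$. The conjugate of the error symbol is expressed as:

$$e^*(n) = y^*(n) - \left(\hat{\mathbf{h}}_R(n) - j\hat{\mathbf{h}}_I(n)\right)^T \mathbf{x}^*(n) \tag{8}$$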
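The performance function of the algorithm is described by:

$$J(n) = k_\sigma(e(n)) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{e(n)\,e^*(n)}{2\sigma^2}\right) \tag{9}$$

The gradient of $e(n)e^*(n)$ with respect to $\hat{\mathbf{h}}_R$ is expressed as:

$$\nabla_{\hat{\mathbf{h}}_R}\left(e(n)e^*(n)\right) = e(n)\left[-\mathbf{x}^*(n)\right] + e^*(n)\left[-\mathbf{x}(n)\right] \tag{10}$$

One can obtain the gradient of $e(n)e^*(n)$ with respect to $\hat{\mathbf{h}}_I$ in a similar fashion:

$$\nabla_{\hat{\mathbf{h}}_I}\left(e(n)e^*(n)\right) = e(n)\left[j\mathbf{x}^*(n)\right] + e^*(n)\left[-j\mathbf{x}(n)\right] \tag{11}$$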
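Using the conventional gradient descent (GD) algorithm on $-J(n)$, i.e., ascending the correntropy, the real and imaginary parts of the estimated channel coefficients are updated by the following formulas:

$$\hat{\mathbf{h}}_R(n+1) = \hat{\mathbf{h}}_R(n) + \mu \nabla_{\hat{\mathbf{h}}_R} J(n) \tag{12}$$

$$\hat{\mathbf{h}}_I(n+1) = \hat{\mathbf{h}}_I(n) + \mu \nabla_{\hat{\mathbf{h}}_I} J(n) \tag{13}$$

Substituting (9) in (12) and (13), and utilizing the gradients derived in (10) and (11), we obtain:

$$\hat{\mathbf{h}}_R(n+1) = \hat{\mathbf{h}}_R(n) + \frac{\mu}{2\sigma^2}\, k_\sigma(e(n)) \left[e(n)\mathbf{x}^*(n) + e^*(n)\mathbf{x}(n)\right] \tag{14}$$

$$\hat{\mathbf{h}}_I(n+1) = \hat{\mathbf{h}}_I(n) + \frac{\mu}{2\sigma^2}\, k_\sigma(e(n)) \left[-j e(n)\mathbf{x}^*(n) + j e^*(n)\mathbf{x}(n)\right] \tag{15}$$

Combining (14) and (15) using $\hat{\mathbf{h}}(n) = \hat{\mathbf{h}}_R(n) + j\hat{\mathbf{h}}_I(n)$ results in:

$$\hat{\mathbf{h}}(n+1) = \hat{\mathbf{h}}(n) + \frac{\mu}{\sigma^2}\, k_\sigma(e(n))\, e(n)\, \mathbf{x}^*(n) \tag{16}$$

We name it complex MCC-based LMS (MCC-LMS) to distinguish it from the standard complex LMS (MSE-LMS) algorithm, which is characterized by:

$$\hat{\mathbf{h}}(n+1) = \hat{\mathbf{h}}(n) + 2\mu\, e(n)\, \mathbf{x}^*(n) \tag{17}$$

Comparing (16) and (17), MCC-LMS replaces the fixed step size $2\mu$ with an adaptive step size $\mu(e) = \frac{\mu}{\sigma^2} k_\sigma(e)$, which improves its performance in the presence of heavy-tailed distributions. As [22] puts it, correntropy is a localized similarity measure, in contrast to the global MSE. Locality means that MCC puts more emphasis on the points near the line $x = y$, suppressing the outliers that lie far from this bisector in the joint space. This emphasis is controlled by the free parameter $\sigma$ (KBW) in the Gaussian kernel function $k_\sigma(e)$. MSE, on the other hand, sees all the points in the joint space through the same lens: each error sample contributes to the performance function in proportion to its squared distance from the line $x = y$, however large that distance is.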
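The two recursions can be sketched side by side. This is a minimal illustration, assuming the combined updates take the forms $\hat{\mathbf{h}} \leftarrow \hat{\mathbf{h}} + \frac{\mu}{\sigma^2} k_\sigma(e)\, e\, \mathbf{x}^*$ for MCC-LMS and $\hat{\mathbf{h}} \leftarrow \hat{\mathbf{h}} + 2\mu\, e\, \mathbf{x}^*$ for MSE-LMS; the three-tap channel, QPSK symbols, step sizes, and noiseless setting are hypothetical and chosen only so that convergence is easy to check.

```python
import math
import random

def kernel(e, sigma):
    """Gaussian kernel evaluated on the magnitude of a complex error."""
    return math.exp(-abs(e) ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def predict(h, x):
    """Estimated received symbol h^T x for complex tap and input lists."""
    return sum(hk * xk for hk, xk in zip(h, x))

def mse_lms_step(h, x, y, mu):
    """One MSE-LMS update with fixed step size 2*mu."""
    e = y - predict(h, x)
    return [hk + 2.0 * mu * e * xk.conjugate() for hk, xk in zip(h, x)], e

def mcc_lms_step(h, x, y, mu, sigma):
    """One MCC-LMS update; the kernel value scales the step by the current error."""
    e = y - predict(h, x)
    gain = mu * kernel(e, sigma) / sigma ** 2
    return [hk + gain * e * xk.conjugate() for hk, xk in zip(h, x)], e

def misalignment(h_est, h_ref):
    """Squared Euclidean distance between estimated and reference taps."""
    return sum(abs(a - b) ** 2 for a, b in zip(h_est, h_ref))

# Hypothetical noiseless test channel and unit-power QPSK training symbols.
rng = random.Random(1)
taps = 3
h_true = [0.8 + 0.3j, -0.4 + 0.5j, 0.2 - 0.1j]
qpsk = [(a + 1j * b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]
symbols = [rng.choice(qpsk) for _ in range(2000 + taps)]

h_mse = [0j] * taps
h_mcc = [0j] * taps
for n in range(taps - 1, len(symbols)):
    x = [symbols[n - k] for k in range(taps)]   # x(n) = [x(n), ..., x(n-L+1)]
    y = predict(h_true, x)
    h_mse, _ = mse_lms_step(h_mse, x, y, mu=0.05)
    h_mcc, _ = mcc_lms_step(h_mcc, x, y, mu=1.0, sigma=2.0)

print("MSE-LMS misalignment:", misalignment(h_mse, h_true))
print("MCC-LMS misalignment:", misalignment(h_mcc, h_true))
```

The only difference between the two step functions is the `gain` term, which makes the error-dependent step size explicit.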

IV. SIMULATION RESULTS
In this section, simulation results are presented for a multipath fading channel ($L = 5$ paths) with complex coefficients, where a constant factor is used to normalize the channel vector power. The KBW is set to $\sigma = 2$ through trial and error, and all simulations are carried out over 50 Monte Carlo iterations. As can be seen, in the absence of PN, applying the MCC criterion provides no significant improvement: it hinders the convergence rate badly, and the improvement in the steady-state MSE is only 4 dB.
It is worth mentioning that the slow convergence of MCC stems from the exponential factor in (16): large errors at the beginning produce a small effective step size. Therefore, the coefficients are updated only slightly, but in the correct direction.
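The size of this effect can be checked numerically. The sketch below evaluates only the exponential factor $\exp(-|e|^2 / (2\sigma^2))$ for $\sigma = 2$, the KBW used in the simulations; the error magnitudes are arbitrary illustrative values.

```python
import math

def relative_step(error_magnitude, sigma):
    """Factor exp(-|e|^2 / (2 sigma^2)) by which the kernel shrinks the update."""
    return math.exp(-error_magnitude ** 2 / (2.0 * sigma ** 2))

sigma = 2.0
for magnitude in (0.0, 1.0, 2.0, 5.0, 10.0):
    factor = relative_step(magnitude, sigma)
    print(f"|e| = {magnitude:5.1f} -> relative step {factor:.3e}")
```

Errors that are several kernel bandwidths large are nearly ignored by the update, which is what slows the initial transient and also what suppresses outliers in the steady state.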
Adding PN turns the tables in favor of MCC. Fig. 3 shows the algorithms' behavior in the presence of moderate PN, where $\kappa = 4$ and SNR = 60. As expected, the steady states of both algorithms get worse. Moreover, the difference between the two rises to 8 dB, while it was about 4 dB in the absence of PN.
Next, we present some results on the impact of the KBW on MCC-LMS. A large KBW worsens the convergence rate while it improves MCC-LMS's robustness against outliers. This trade-off is a design choice that depends on the application.

In order to have both a good convergence rate and a desirable steady state, we combine both algorithms and introduce mixed-LMS. More precisely, for the few initial iterations after a change in the channel coefficients (i.e., a rise in the MSE), MSE-LMS is utilized, and for the following iterations MCC-LMS takes its place. The switch can be triggered by a condition such as the number of elapsed iterations or the MSE value falling beneath a predefined threshold, say −40 dB. In Fig. 5, we used the latter.
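The switching rule can be sketched as follows. This is one possible reading, assuming the MSE is tracked by an exponentially smoothed squared error and compared with the −40 dB threshold at every iteration; the forgetting factor, step sizes, test channel, and noise level are hypothetical and not taken from the paper.

```python
import math
import random

def kernel(e, sigma):
    """Gaussian kernel evaluated on the magnitude of a complex error."""
    return math.exp(-abs(e) ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def mixed_lms(inputs, outputs, taps, mu_mse, mu_mcc, sigma,
              threshold_db=-40.0, forgetting=0.95):
    """Run MSE-LMS while the smoothed error is above the threshold, MCC-LMS below it."""
    h = [0j] * taps
    smoothed = 1.0   # running estimate of |e|^2, started high so MSE-LMS runs first
    modes = []
    for x, y in zip(inputs, outputs):
        e = y - sum(hk * xk for hk, xk in zip(h, x))
        smoothed = forgetting * smoothed + (1.0 - forgetting) * abs(e) ** 2
        mse_db = 10.0 * math.log10(max(smoothed, 1e-300))
        if mse_db >= threshold_db:
            gain = 2.0 * mu_mse                            # fixed-step MSE-LMS
            modes.append("mse")
        else:
            gain = mu_mcc * kernel(e, sigma) / sigma ** 2  # error-dependent MCC-LMS step
            modes.append("mcc")
        h = [hk + gain * e * xk.conjugate() for hk, xk in zip(h, x)]
    return h, modes

# Hypothetical test channel with weak AWGN and unit-power QPSK training symbols.
rng = random.Random(2)
taps = 3
h_true = [0.8 + 0.3j, -0.4 + 0.5j, 0.2 - 0.1j]
qpsk = [(a + 1j * b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]
symbols = [rng.choice(qpsk) for _ in range(3000 + taps)]
inputs = [[symbols[n - k] for k in range(taps)] for n in range(taps - 1, len(symbols))]
outputs = [sum(hk * xk for hk, xk in zip(h_true, x))
           + complex(rng.gauss(0.0, 1e-3), rng.gauss(0.0, 1e-3)) for x in inputs]

h_est, modes = mixed_lms(inputs, outputs, taps, mu_mse=0.05, mu_mcc=1.0, sigma=2.0)
switch_index = modes.index("mcc")
final_misalignment = sum(abs(a - b) ** 2 for a, b in zip(h_est, h_true))

print("first MCC-LMS iteration:", switch_index)
print("final misalignment:", final_misalignment)
```

Because the mode is re-evaluated at every iteration, a later jump in the smoothed error (for example after a channel change) would send the sketch back to MSE-LMS, matching the behavior described above.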

V. CONCLUSION
Exploiting higher order statistics (HOS), we presented a complex correntropy-based learning algorithm, named MCC-LMS, that achieves a lower MSE in channel coefficient estimation. This algorithm is robust against non-Gaussian distributions. Therefore, the parameters of a channel corrupted by phase noise can be well estimated by our proposed algorithm. Although this algorithm showed good performance in the steady state, its convergence rate was quite low. To address this issue, we proposed a mixed-LMS algorithm that exploits both the fast convergence of standard LMS and the good steady state of MCC-LMS.