Automatic Modulation Classification for MIMO Systems via Deep Learning and Zero-Forcing Equalization

Automatic modulation classification (AMC) is one of the most critical technologies for non-cooperative communication systems. Recently, deep learning (DL)-based AMC (DL-AMC) methods have attracted significant attention due to their favorable performance. However, most studies of DL-AMC methods concentrate on single-input single-output (SISO) systems, and only a few works address DL-based AMC in multiple-input multiple-output (MIMO) systems. Therefore, in this work we propose a convolutional neural network (CNN)-based zero-forcing (ZF) equalization AMC (CNN/ZF-AMC) method for MIMO systems. Simulation results demonstrate that, under perfect channel state information (CSI), the CNN/ZF-AMC method achieves better performance than the AMC method based on an artificial neural network (ANN) with high-order cumulants (HOC). Moreover, we explore the impact of imperfect CSI on the performance of the CNN/ZF-AMC method. Simulation results demonstrate that the classification performance is influenced not only by the imperfect CSI but also by the numbers of transmit and receive antennas.

based on efficient classifier designs [3]. Specifically, features are extracted from the received signal, and then a support vector machine (SVM) or a traditional artificial neural network (ANN) is applied to classify the modulation types [4]. These features are chosen because they can distinguish different modulation types; commonly used features include high-order cumulants (HOC), instantaneous frequency features, wavelet transform (WT) features, and so on. The most common combination in traditional AMC methods is ANN with HOC [4], [5], which has been applied to both single-input single-output (SISO) systems and multiple-input multiple-output (MIMO) systems.
Recently, deep learning (DL) has emerged as one of the most powerful tools for classification [6]–[11]. Thus, DL has been applied to various communication technologies [12]–[16], e.g., beam management [17], resource allocation [18]–[20], non-orthogonal multiple access (NOMA) [21], [22], and traffic control [23], [24], to enhance physical-layer and network-layer communication [25]. DL-based AMC methods can be divided into two categories. The first category is based on the in-phase and quadrature (IQ) components of signals. T. J. O'Shea et al. first proposed a convolutional neural network (CNN)-based AMC method, trained on a large number of IQ samples, which achieved outstanding performance [26]. Subsequently, various neural networks, such as the long short-term memory (LSTM) network [28] and the convolutional long short-term deep neural network (CLDNN) [27], were proposed for AMC under various noise conditions. The second category comprises constellation diagram-based AMC methods, for which a trimmed CNN-based supervised AMC method [29] and a generative adversarial network (GAN)-based semi-supervised AMC method [30] have been proposed. Moreover, DL-based AMC in MIMO systems has been explored. Y. Wang et al. [31] proposed a CNN-based multiple-antenna cooperative AMC (Co-AMC) method for an uncorrelated MIMO channel. M. H. Shah and X. Dang [32] proposed two AMC methods, based on a sparse auto-encoder (SAE)-based deep neural network (DNN) and a radial basis function network (RBFN), respectively, for space-time block code (STBC)-MIMO systems.
In this paper, we propose a CNN-based zero-forcing (ZF) equalization AMC (CNN/ZF-AMC) method for MIMO systems. ZF equalization is adopted to enhance the classification performance under both perfect and imperfect CSI, because it can increase the SNR of the received signal under perfect CSI, and also under imperfect CSI as long as the channel estimation errors are limited. In the perfect CSI case, we compare the CNN/ZF-AMC method with the traditional method, and the results reveal a significant advantage of the CNN/ZF-AMC method. In the imperfect CSI case, the imperfect CSI is generated by a channel error model rather than by actual channel estimation, in order to study the factors that affect the classification performance.

II. SYSTEM MODEL
Assuming that the MIMO channel is time-invariant and complex-valued, the received signal at the n-th sampling time can be given as

$$\mathbf{R}(n) = \mathbf{H}\,\mathbf{T}(n) + \mathbf{G}(n), \tag{1}$$

where $\mathbf{H}$ is the MIMO channel matrix of size $N_r \times N_t$ ($N_r \ge N_t$), whose elements obey the circularly symmetric complex normal distribution with zero mean and unit variance; $\mathbf{R}(n) = [R_1(n), R_2(n), \ldots, R_{N_r}(n)]^T$ is the $N_r$-dimensional received signal vector, obtained by ideal Nyquist sampling without phase or frequency offset; $\mathbf{T}(n) = [T_1(n), T_2(n), \ldots, T_{N_t}(n)]^T$ is the $N_t$-dimensional transmitted signal vector; $(\cdot)^T$ and $(\cdot)^H$ denote the transpose and conjugate transpose operations, respectively; and $\mathbf{G}(n)$ is the additive white Gaussian noise (AWGN) vector, whose elements obey the circularly symmetric complex normal distribution with zero mean and variance $E_G$. Equalization is applied to resolve the ambiguity of the received signal sequence [4], and ZF equalization is considered here. The received signal after ZF equalization can be written as

$$\hat{\mathbf{R}}(n) = \mathrm{ZF}(\hat{\mathbf{H}})\,\mathbf{R}(n) = \hat{\mathbf{H}}^{\dagger}\mathbf{R}(n), \tag{2}$$

where $\mathrm{ZF}(\hat{\mathbf{H}}) = \hat{\mathbf{H}}^{\dagger} = (\hat{\mathbf{H}}^H\hat{\mathbf{H}})^{-1}\hat{\mathbf{H}}^H$ is the equalization matrix, i.e., the pseudo-inverse of $\hat{\mathbf{H}}$, and $\hat{\mathbf{H}}$ is the estimated channel matrix. In this paper, we consider both the perfect CSI case (i.e., $\hat{\mathbf{H}} = \mathbf{H}$) and the imperfect CSI case (i.e., $\hat{\mathbf{H}} \neq \mathbf{H}$).

0018-9545 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
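As a minimal numerical sketch of (1) and (2), assuming Python with NumPy (the paper does not specify any implementation), the ZF matrix can be obtained by solving the normal equations $(\hat{\mathbf H}^H\hat{\mathbf H})\mathbf W = \hat{\mathbf H}^H$ and checked against NumPy's pseudo-inverse. In the noiseless, perfect-CSI case it recovers $\mathbf T(n)$ exactly. The helper names `crandn` and `zf_matrix` and the antenna and sample counts are illustrative assumptions.

```python
import numpy as np


def crandn(rng, *shape):
    """i.i.d. circularly symmetric complex normal samples, zero mean, unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def zf_matrix(H_hat):
    """ZF equalization matrix (H_hat^H H_hat)^{-1} H_hat^H for a full-column-rank H_hat."""
    H_h = H_hat.conj().T
    # Solve (H^H H) W = H^H instead of forming the inverse explicitly.
    return np.linalg.solve(H_h @ H_hat, H_h)


rng = np.random.default_rng(0)
Nr, Nt, L = 4, 2, 64                 # illustrative sizes; L sampling instants
H = crandn(rng, Nr, Nt)
T = crandn(rng, Nt, L)               # placeholder transmitted samples; column n is T(n)
R = H @ T                            # (1) with G(n) = 0
W = zf_matrix(H)                     # perfect CSI: H_hat = H
R_eq = W @ R                         # (2)
```

For a full-column-rank channel, `np.linalg.pinv(H)` returns the same matrix; the linear solve is simply a common, numerically cheaper way to form it.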
Under perfect CSI, the proposed CNN/ZF-AMC method is compared with other AMC methods; for imperfect CSI, we adopt a channel error model to generate the estimated channel matrix. The channel error model is written as

$$\hat{\mathbf{H}} = \sqrt{1-\sigma_e}\,\mathbf{H} + \sqrt{\sigma_e}\,\mathbf{E}, \tag{3}$$

where $\sigma_e$ is the channel error coefficient and $\mathbf{E}$ is the error matrix, which is independent of $\mathbf{H}$ and whose elements obey the zero-mean, unit-variance circularly symmetric complex normal distribution. We choose this model because the elements of $\hat{\mathbf{H}}$ have the same mean and variance as those of $\mathbf{H}$. However, it is difficult to relate the channel estimation error directly to $\sigma_e$ in (3). Hence, we use the normalized mean square error (NMSE) between $\hat{\mathbf{H}}$ and $\mathbf{H}$, which is widely used to measure channel estimation error [33]:

$$\mathrm{NMSE} = \frac{\mathbb{E}\left[\sum_{i=1}^{N_r}\sum_{j=1}^{N_t}|h_{ij}-\hat{h}_{ij}|^2\right]}{\mathbb{E}\left[\sum_{i=1}^{N_r}\sum_{j=1}^{N_t}|h_{ij}|^2\right]}, \tag{4}$$

where $h_{ij}$ and $\hat{h}_{ij}$ are the $(i,j)$-th elements of $\mathbf{H}$ and $\hat{\mathbf{H}}$, respectively. Substituting (3) into (4) relates NMSE to $\sigma_e$:

$$\mathrm{NMSE} = 2 - 2\sqrt{1-\sigma_e} \approx \sigma_e + \frac{\sigma_e^2}{4}, \tag{5}$$

where the approximation follows from a Taylor expansion of $\sqrt{1-\sigma_e}$. Hence, when $\sigma_e \ll 1$, NMSE is approximately equal to $\sigma_e$.
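The relation between $\sigma_e$ and NMSE can be checked by Monte Carlo simulation. The sketch below, assuming NumPy, draws $\mathbf H$ and $\mathbf E$ as i.i.d. unit-variance complex Gaussian matrices, builds $\hat{\mathbf H}$ with the error model (3), and compares the empirical NMSE from (4) with the closed form $2 - 2\sqrt{1-\sigma_e}$. Function names and trial counts are illustrative assumptions.

```python
import numpy as np


def crandn(rng, *shape):
    """i.i.d. circularly symmetric complex normal samples, zero mean, unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def imperfect_csi(H, sigma_e, rng):
    """Channel error model (3): H_hat = sqrt(1 - sigma_e) * H + sqrt(sigma_e) * E."""
    E = crandn(rng, *H.shape)
    return np.sqrt(1 - sigma_e) * H + np.sqrt(sigma_e) * E


def empirical_nmse(Nr, Nt, sigma_e, trials, rng):
    """Monte Carlo estimate of (4), pooling `trials` independent channel draws."""
    H = crandn(rng, trials, Nr, Nt)
    H_hat = imperfect_csi(H, sigma_e, rng)
    return np.sum(np.abs(H - H_hat) ** 2) / np.sum(np.abs(H) ** 2)


def nmse_closed_form(sigma_e):
    """Exact value in (5), before the Taylor approximation."""
    return 2 - 2 * np.sqrt(1 - sigma_e)


rng = np.random.default_rng(1)
for sigma_e in (0.01, 0.05, 0.2):
    print(sigma_e, empirical_nmse(4, 4, sigma_e, 5000, rng), nmse_closed_form(sigma_e))
```

At $\sigma_e = 0.2$ the exact value is about 0.211, so the approximation NMSE $\approx \sigma_e$ is already within roughly 5%.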

III. THE PROPOSED CNN/ZF-AMC METHOD
In this section, we introduce the CNN/ZF-AMC method, whose structure is shown in Fig. 1(a). The CNN/ZF-AMC method consists of three main parts: channel estimation, the ZF equalizer, and a CNN that identifies the modulation type. For ease of understanding, we present the method in three parts: dataset generation, the CNN for the ZF-AMC method, and the ANN with HOC used in the traditional AMC method.

A. Dataset Generation
Here, a complex-baseband equivalent multi-antenna system model is considered, and the dataset generation process is shown in Fig. 2. Specifically, random data are modulated with different modulation types, including binary phase shift keying (BPSK), quadrature phase shift keying (QPSK), eight phase shift keying (8PSK), and sixteen quadrature amplitude modulation (16QAM). The modulated signal vector is denoted as $\mathbf{X}$, of size $1 \times N$, where $N$ is the number of symbols (here $N = 128$). For a fair comparison, $\mathbf{X}$ is normalized to unit power, i.e., $\|\mathbf{X}\|_2^2 = 1$. Then, $\mathbf{X}$ is reshaped into an $N_t \times N/N_t$ matrix, which can be represented as

$$\mathbf{T} = [\mathbf{T}_1^T, \mathbf{T}_2^T, \ldots, \mathbf{T}_{N_t}^T]^T,$$

where $\mathbf{T}_i = [T_i(1), T_i(2), \ldots, T_i(N/N_t)]$ is the transmitted signal vector at the i-th antenna.
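The transmit-side steps can be sketched as follows, assuming NumPy. The constellation orientations, the equiprobable random symbols, and the row-major reshape (so that row i holds consecutive symbols and plays the role of $\mathbf T_i$) are assumptions; the paper does not state the mapping or the reshape order. The function names are hypothetical.

```python
import numpy as np


def modulate(mod, N, rng):
    """N random symbols of the given class (constellation orientation is an assumption)."""
    if mod == "BPSK":
        return rng.choice([-1.0, 1.0], N).astype(complex)
    if mod == "QPSK":
        return np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, N)))
    if mod == "8PSK":
        return np.exp(1j * np.pi / 4 * rng.integers(0, 8, N))
    if mod == "16QAM":
        levels = np.array([-3.0, -1.0, 1.0, 3.0])
        # The average energy of the square grid {+-1, +-3}^2 is 10.
        return (rng.choice(levels, N) + 1j * rng.choice(levels, N)) / np.sqrt(10)
    raise ValueError(f"unknown modulation: {mod}")


def make_transmit_matrix(mod, N, Nt, rng):
    """Modulate, normalize so that ||X||_2^2 = 1, and reshape X into T (Nt x N/Nt)."""
    if N % Nt:
        raise ValueError("N must be divisible by Nt")
    X = modulate(mod, N, rng)
    X = X / np.linalg.norm(X)
    # Row-major reshape: row i holds consecutive symbols and is T_i for antenna i.
    return X.reshape(Nt, N // Nt)


rng = np.random.default_rng(3)
T = make_transmit_matrix("16QAM", N=128, Nt=4, rng=rng)   # shape (4, 32)
```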
When the signal passes through the MIMO channel, the received signal vector at the j-th receive antenna is

$$\mathbf{R}_j = \sum_{i=1}^{N_t} h_{ji}\,\mathbf{T}_i + \mathbf{G}_j, \quad j = 1, 2, \ldots, N_r,$$

where $\mathbf{G}_j$ is the corresponding noise vector. Next, the received signal matrix is equalized by the ZF equalizer, yielding the equalized signal sequence $[\hat{\mathbf{R}}_1^T, \hat{\mathbf{R}}_2^T, \ldots, \hat{\mathbf{R}}_{N_t}^T]^T$ of size $N_t \times N/N_t$, which is vectorized into a $1 \times N$ vector $\hat{\mathbf{R}}$. The training and test samples are extracted from $\hat{\mathbf{R}}$: the real part $\Re(\hat{\mathbf{R}})$ and the imaginary part $\Im(\hat{\mathbf{R}})$ are separated and then combined into a $2 \times N$ matrix $[\Re(\hat{\mathbf{R}}); \Im(\hat{\mathbf{R}})]$, which forms one training or test sample. We prepare 20,000 training samples and 10,000 test samples for each SNR value.

B. The Proposed CNN/ZF-AMC Method

1) CNN Structure: In this correspondence, we adopt a simple CNN with a feature extraction module of two convolutional layers and a classification module of three fully-connected layers, whose structure is shown in Fig. 1(b). Rectified linear unit (ReLU), batch normalization (BN), and dropout follow every layer except the last fully-connected layer; ReLU serves as the activation function, while BN and dropout help prevent overfitting and slightly accelerate training. Softmax is chosen as the activation function of the last layer.
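The $2 \times N$ input that this CNN consumes can be built from the ZF output as in the sketch below, assuming NumPy. It passes $\mathbf T$ through (1), equalizes with the (estimated) channel, vectorizes row by row, and stacks the real and imaginary parts. The noise scaling uses $\gamma = E_T/E_G$ with $E_T$ taken as the average symbol energy of $\mathbf T$, and the vectorization order is an assumption; `to_sample` and `received_sample` are hypothetical names.

```python
import numpy as np


def crandn(rng, *shape):
    """i.i.d. circularly symmetric complex normal samples, zero mean, unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def to_sample(R_eq):
    """Vectorize the Nt x N/Nt ZF output into 1 x N, then stack [Re; Im] into a 2 x N sample."""
    r = R_eq.reshape(-1)                     # row-major vectorization (assumed order)
    return np.stack([r.real, r.imag])        # shape (2, N)


def received_sample(T, H, snr_db, rng, H_hat=None):
    """Pass T through (1), ZF-equalize with H_hat (H if None), and return the 2 x N sample."""
    L = T.shape[1]
    E_T = np.mean(np.abs(T) ** 2)            # average symbol energy per antenna
    E_G = E_T / 10 ** (snr_db / 10)          # gamma = E_T / E_G
    R = H @ T + np.sqrt(E_G) * crandn(rng, H.shape[0], L)
    W = np.linalg.pinv(H if H_hat is None else H_hat)
    return to_sample(W @ R)


rng = np.random.default_rng(4)
T = crandn(rng, 4, 32) / np.sqrt(128)        # stand-in for a normalized T with N = 128, Nt = 4
H = crandn(rng, 8, 4)                        # Nr = 8 >= Nt = 4
x = received_sample(T, H, snr_db=10, rng=rng)   # x.shape == (2, 128)
```

Passing an `H_hat` generated by the error model (3) instead of `None` produces the imperfect-CSI samples.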
2) Training and Test Phase: Before training, the training dataset is divided into a training part and a validation part for cross-validation; the former is used to update the trainable parameters of the CNN, and the latter to select the best trained model. We choose the adaptive learning rate optimizer ADAM and use the categorical cross-entropy as the objective function. The maximum number of epochs, the early-stopping patience (in epochs), and the batch size are set to 100, 20, and 500, respectively. After training, the test samples are fed into the trained CNN to obtain the predicted labels.
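The objective and the stopping rule can be sketched independently of any deep learning framework (the paper does not name one). The sketch below, assuming NumPy, computes the mean categorical cross-entropy of softmax outputs and applies early stopping with the stated maximum of 100 epochs and a patience of 20 epochs. Interpreting "early-stopping epoch" as a patience on the validation loss is an assumption, and both function names are hypothetical.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean categorical cross-entropy of softmax(logits) w.r.t. integer class labels."""
    z = logits - logits.max(axis=1, keepdims=True)          # numerical stabilization
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def early_stopping(val_losses, max_epochs=100, patience=20):
    """Return (best_epoch, last_epoch) given the validation loss observed after each epoch.

    Training stops once `patience` epochs pass without a new best validation loss,
    or after `max_epochs` epochs, whichever comes first.
    """
    best_loss, best_epoch, epoch = float("inf"), -1, -1
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch              # keep this model as the best one
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, epoch
```

For example, a validation loss that improves for 10 epochs and then plateaus gives `(9, 29)`: the model from epoch 9 is kept and training halts 20 epochs later.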

C. Review of Traditional AMC Method
Here, the ANN and HOC-based traditional AMC method, a classical combination of classifier and features [4], serves as a baseline to highlight the performance of the CNN/ZF-AMC method. The structure of the traditional method is similar to that of the CNN/ZF-AMC method in Fig. 1(a), except that the "CNN" block is replaced with "ANN+HOC". Specifically, the fourth-order HOC features, denoted as $C_4$ and listed in Tab. I, are extracted from the same dataset used for the CNN in the CNN/ZF-AMC method to form the feature vector. In addition, the ANN has the same structure as the classification module in Fig. 1(b).
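Tab. I is not reproduced in this text, so the exact composition of $C_4$ is not known here. As an illustration only, the sketch below, assuming NumPy, estimates the standard fourth-order cumulants $C_{40} = M_{40} - 3M_{20}^2$, $C_{41} = M_{41} - 3M_{20}M_{21}$, and $C_{42} = M_{42} - |M_{20}|^2 - 2M_{21}^2$, with $M_{pq} = \mathbb E[x^{p-q}(x^*)^q]$, and uses their magnitudes as a plausible ANN input. The feature set and function names are assumptions.

```python
import numpy as np


def moment(x, p, q):
    """Mixed moment M_pq = E[x^(p-q) * conj(x)^q]."""
    return np.mean(x ** (p - q) * np.conj(x) ** q)


def fourth_order_cumulants(x):
    """Sample C40, C41, C42 of a zero-mean complex sequence, after normalizing to unit power."""
    x = x / np.sqrt(np.mean(np.abs(x) ** 2))
    M20, M21 = moment(x, 2, 0), moment(x, 2, 1)
    M40, M41, M42 = moment(x, 4, 0), moment(x, 4, 1), moment(x, 4, 2)
    C40 = M40 - 3 * M20 ** 2
    C41 = M41 - 3 * M20 * M21
    C42 = M42 - np.abs(M20) ** 2 - 2 * M21 ** 2
    return np.array([C40, C41, C42])


def feature_vector(x):
    """Magnitudes of the cumulants, which removes the dependence on constellation rotation."""
    return np.abs(fourth_order_cumulants(x))
```

For noiseless unit-power symbols, these magnitudes are about (2, 2, 2) for BPSK, (1, 0, 1) for QPSK, and (0, 0, 1) for 8PSK, which is what makes them usable as classification features.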

IV. RESULTS AND DISCUSSIONS
In this section, we present two sets of simulation results, for the perfect CSI and imperfect CSI cases, respectively. In the former case, the CNN/ZF-AMC method is compared with the ANN and HOC-based traditional AMC method, while in the latter case the impact of the channel estimation error on the classification performance is investigated. The correct classification probability is adopted as the evaluation metric: $P_{cc} = \frac{S_c}{S} \times 100\%$, where $S_c$ is the number of correctly classified samples and $S$ is the total number of samples for a given SNR.
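The metric is straightforward to compute per SNR value. A minimal sketch, assuming NumPy and integer class labels, with hypothetical function names:

```python
import numpy as np


def pcc(y_true, y_pred):
    """Correct classification probability P_cc = S_c / S * 100 (%)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.count_nonzero(y_true == y_pred) / y_true.size


def pcc_per_snr(snr, y_true, y_pred):
    """P_cc evaluated separately over the test samples belonging to each SNR value."""
    snr, y_true, y_pred = map(np.asarray, (snr, y_true, y_pred))
    return {float(s): pcc(y_true[snr == s], y_pred[snr == s]) for s in np.unique(snr)}
```

For example, `pcc([0, 1, 2, 3], [0, 1, 2, 0])` gives 75.0, since three of the four samples are classified correctly.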

A. Performance Comparison in the Perfect CSI Case
The classification performance is shown in Fig. 3, where "CNN ($N_r$, $N_t$)" denotes the CNN/ZF-AMC method and "ANN with $C_4$ ($N_r$, $N_t$)" denotes the ANN and HOC-based traditional AMC method. The CNN/ZF-AMC method clearly has a large advantage over the traditional method. In addition, when the number of receive antennas is fixed, fewer transmit antennas yield better performance. To explain this result, note that the received signal after ZF equalization with perfect CSI can be written as

$$\hat{\mathbf{R}}(n) = \mathbf{H}^{\dagger}\mathbf{R}(n) = \mathbf{T}(n) + \mathbf{H}^{\dagger}\mathbf{G}(n), \tag{6}$$

and the post-processing noise can be represented as $\hat{\mathbf{G}}(n) = \mathbf{H}^{\dagger}\mathbf{G}(n)$. Thus, the post-processing SNR of the i-th stream [4] can be written as

$$\gamma_i = \frac{E_T}{E_G\,[(\mathbf{H}^H\mathbf{H})^{-1}]_{ii}} = \frac{\gamma}{[(\mathbf{H}^H\mathbf{H})^{-1}]_{ii}}, \tag{7}$$

where $[\cdot]_{ii}$ is the i-th diagonal element of a matrix and $\gamma = E_T/E_G$ is the actual SNR. In addition, $1/[(\mathbf{H}^H\mathbf{H})^{-1}]_{ii}$ is known to be a (scaled) chi-square distributed random variable with $2(N_r - N_t + 1)$ degrees of freedom [33]; for unit-variance channel entries its mean is $N_r - N_t + 1$, so that

$$\mathbb{E}[\gamma_i] = (N_r - N_t + 1)\,\gamma. \tag{8}$$

Hence, the SNR gain is determined by $\Delta = N_r - N_t$, the difference between the numbers of receive and transmit antennas: the larger $\Delta$, the larger the performance improvement. For example, with $N_r = 4$, the average SNR gain $10\log_{10}(N_r - N_t + 1)$ is about 6.0 dB for $N_t = 1$ but 0 dB for $N_t = 4$. However, the performance gap between different numbers of transmit antennas is smaller for the CNN/ZF-AMC method than for the traditional method.
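The average ZF gain in (8) can be verified by Monte Carlo simulation. The sketch below, assuming NumPy and i.i.d. $\mathcal{CN}(0,1)$ channel entries as in the system model, samples $1/[(\mathbf H^H\mathbf H)^{-1}]_{11}$ for $N_r = 4$ and several $N_t$ and compares its sample mean with $N_r - N_t + 1$. The function name and trial count are illustrative.

```python
import numpy as np


def zf_gain_samples(Nr, Nt, trials, rng):
    """Samples of 1/[(H^H H)^{-1}]_{11} for i.i.d. CN(0, 1) channels, so gamma_1 = gamma * gain."""
    shape = (trials, Nr, Nt)
    H = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    gram = np.conj(np.swapaxes(H, 1, 2)) @ H          # batched H^H H, shape (trials, Nt, Nt)
    inv_diag = np.real(np.linalg.inv(gram)[:, 0, 0])  # [(H^H H)^{-1}]_{11}
    return 1.0 / inv_diag


rng = np.random.default_rng(2)
Nr = 4
for Nt in range(1, Nr + 1):
    gain = zf_gain_samples(Nr, Nt, 20000, rng)
    mean_gain = gain.mean()
    print(f"Nt={Nt}: mean gain {mean_gain:.2f} (Nr-Nt+1 = {Nr - Nt + 1}), "
          f"{10 * np.log10(mean_gain):.1f} dB")
```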

B. Performance Comparison vs. Channel Error Coefficient
Perfect CSI is hardly obtainable in practical communication systems. Thus, we focus on the CNN/ZF-AMC method in the imperfect CSI case. As shown in Fig. 4, the classification performance gradually decreases as $\sigma_e$ increases. However, for the same $\sigma_e$, there are large differences in classification performance among different combinations of receive and transmit antennas. Specifically, when $\sigma_e = 0.2$ and SNR = 10 dB, the correct classification probability of the MIMO system with $N_r = 4$ and $N_t = 1$ reaches nearly 100%, whereas that of the MIMO system with $N_r = 4$ and $N_t = 4$ barely exceeds 50%, as shown in Fig. 4(a) and Fig. 4(c). The factors that lead to this performance difference are analyzed as follows.
Based on the channel error model (3), the ZF-equalized received signal can be given by

$$\hat{\mathbf{R}}(n) = \hat{\mathbf{H}}^{\dagger}\mathbf{R}(n) = \hat{\mathbf{H}}^{\dagger}\left(\mathbf{H}\,\mathbf{T}(n) + \mathbf{G}(n)\right), \tag{9}$$

where $\hat{\mathbf{H}}^{\dagger} = (\sqrt{1-\sigma_e}\,\mathbf{H} + \sqrt{\sigma_e}\,\mathbf{E})^{\dagger}$. Approximating $\hat{\mathbf{H}}^{\dagger}$ and substituting it into (9) yields the approximation (10), in which the post-processing transmitted signal is $\mathbf{T}(n)/\sqrt{1-\sigma_e}$ and $\hat{\mathbf{G}}(n)$ is the corresponding post-processing noise. From these terms, the post-processing SNR $\gamma_i$, $i \in [1, N_t]$, is obtained, in which the term involving $\mathrm{tr}((\mathbf{H}^H\mathbf{H})^{-1})$, with $\mathrm{tr}(\cdot)$ the matrix trace operation, can be ignored [33]. When $\gamma$ is very high (e.g., $\gamma \to \infty$), the expectation of $\gamma_i$ in (12) can be approximated further. It follows that the classification performance of the CNN/ZF-AMC method depends not only on $\sigma_e$ but also on $\Delta$ for a given $N_r$. Hence, when the numbers of transmit and receive antennas are unchanged, a higher $\sigma_e$ leads to worse classification performance, as demonstrated in Fig. 4. In addition, if $\sigma_e$ and $N_r$ are fixed, a larger $N_t$ results in a smaller $\Delta$ and hence worse classification performance, which is evident in Fig. 4, especially when $\sigma_e = 0.2$.
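The trend described above can also be observed empirically. The sketch below, assuming NumPy, draws $\mathbf H$, generates $\hat{\mathbf H}$ with (3), applies ZF with $\hat{\mathbf H}$ as in (9), removes the $1/\sqrt{1-\sigma_e}$ scaling of the useful term, and reports the median over channel draws of the resulting signal-to-distortion ratio. This is a simple empirical proxy chosen for illustration, not the closed-form expressions of the paper; unit-energy QPSK symbols and all names and trial counts are assumptions.

```python
import numpy as np


def crandn(rng, *shape):
    """i.i.d. circularly symmetric complex normal samples, zero mean, unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def median_sinr_db(Nr, Nt, sigma_e, snr_db, rng, trials=500, L=32):
    """Median over channel draws of an empirical post-ZF SINR (dB) under the error model (3)."""
    gamma = 10 ** (snr_db / 10)
    sinr_db = np.empty(trials)
    for k in range(trials):
        H = crandn(rng, Nr, Nt)
        H_hat = np.sqrt(1 - sigma_e) * H + np.sqrt(sigma_e) * crandn(rng, Nr, Nt)
        T = np.exp(1j * np.pi / 2 * rng.integers(0, 4, (Nt, L)))   # unit-energy QPSK, E_T = 1
        R = H @ T + crandn(rng, Nr, L) / np.sqrt(gamma)             # E_G = E_T / gamma
        R_eq = np.linalg.pinv(H_hat) @ R                            # ZF with imperfect CSI, (9)
        # The useful term is roughly T(n)/sqrt(1 - sigma_e); undo that scaling and treat
        # everything else (residual interference, CSI error, noise) as distortion.
        err = np.sqrt(1 - sigma_e) * R_eq - T
        sinr_db[k] = 10 * np.log10(np.mean(np.abs(T) ** 2) / np.mean(np.abs(err) ** 2))
    return np.median(sinr_db)


rng = np.random.default_rng(6)
for Nt in (1, 2, 4):
    for sigma_e in (0.01, 0.05, 0.2):
        print(Nt, sigma_e, round(median_sinr_db(4, Nt, sigma_e, 10, rng), 1))
```

With $N_r = 4$ and SNR = 10 dB, the proxy drops as $\sigma_e$ grows and is much lower for $N_t = 4$ than for $N_t = 1$, consistent with the dependence on $\sigma_e$ and $\Delta$ discussed above.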

V. CONCLUSION
In this correspondence, we proposed an effective CNN/ZF-AMC method for MIMO systems. Specifically, ZF equalization was applied, with the aid of CSI, to resolve the ambiguity of the received signal and thereby improve AMC performance. We considered both perfect and imperfect CSI. In the perfect CSI case, the CNN/ZF-AMC method achieves much better performance than the traditional ANN and HOC-based AMC method. We also explored the classification performance of the CNN/ZF-AMC method under imperfect CSI and demonstrated that it is influenced not only by the channel error coefficient but also by the numbers of transmit and receive antennas. For the same number of receive antennas and the same error coefficient, the CNN/ZF-AMC method performs worse with more transmit antennas, and vice versa.