Generalized Automatic Modulation Classification Under Non-Gaussian Noise with Varying SNR Conditions: A CNN Enable Method

Automatic modulation classification (AMC) is a critical step in identifying signal modulation types so as to enable more accurate demodulation in non-cooperative scenarios. Convolutional neural network (CNN)-based AMC is regarded as one of the most promising methods because of its high classification accuracy. However, conventional CNN-based methods lack generalization capability under time-varying signal-to-noise ratio (SNR) conditions, because they are trained on specific datasets and only work under the corresponding condition. In this paper, a novel CNN-based generalized AMC method is proposed, and a more realistic scenario is considered, including white non-Gaussian noise and synchronization error. Its generalization capability stems from training on mixed datasets covering varying noise scenarios, from which the CNN can extract common features. Simulation results show that the proposed architecture achieves higher robustness and generalization than the conventional ones.


I. INTRODUCTION
Automatic modulation classification (AMC) is an essential technology in non-cooperative communication systems for demodulation tasks of unknown signals [1]-[4]. It has various applications, such as intercepted enemy signal recovery, adaptive modulation [5], and spectrum sensing [6], in both military and civilian settings. In recent years, various methods have been proposed for AMC, and they fall into two categories: likelihood-based methods and feature-based methods [7].
In the likelihood-based methods, AMC can be formulated as a hypothesis testing problem [8]. It is necessary to design a correct likelihood function to evaluate the likelihood of each modulation type within the hypothesis pool. Then, the likelihoods of each modulation type are compared to make a final decision. However, the likelihood-based AMC methods depend excessively on channel state information (CSI) of wireless channels.
T. Ohtsuki is with the Department of Information and Computer Science, Keio University, Yokohama 223-8521, Japan (E-mail: ohtsuk-i@ics.keio.ac.jp).
F. Adachi is with the Research Organization of Electrical Communication (ROEC), Tohoku University, Sendai 980-8577 Japan (E-mail: adachi@ecei.tohoku.ac.jp).
In the feature-based methods, AMC is modeled as a pattern recognition problem and it consists of three steps: pre-processing, feature extraction, and classifier design [9]. Various AMC methods have been developed using instantaneous features (or signal spectral-based features), wavelet transform-based features, high-order statistics-based features, cyclic spectrum analysis-based features, and so on. To realize modulation type classification by extracted features, they usually adopt classifiers, such as support vector machine (SVM), decision tree (DT), k-nearest neighbor (KNN) and multilayer perceptron (MLP).
In recent years, deep learning (DL) has been regarded as a powerful tool because it excels at automatic feature extraction from huge amounts of data, replacing the complex and difficult design of manmade features [10], [11]. For this reason, DL has been successfully applied in wireless communications [12]-[16] and the Internet of Things [18]-[24].
In addition, DL has been applied in multiple-input multiple-output (MIMO) [17], non-orthogonal multiple access (NOMA), and cognitive radio (CR). For example, H. Huang, et al. [25], [26] proposed a fast beamforming technique for downlink MIMO based on unsupervised learning. G. Gui, et al. [27] applied a long short-term memory (LSTM) network to a typical NOMA system to enhance spectral efficiency. M. Liu, et al. [28], [29] introduced DL into resource allocation in CR.
Moreover, state-of-the-art DL-based AMC methods have been developed in the last two years. T. O'Shea and J. Hoydis [30] proposed a convolutional neural network (CNN)-based AMC, which is realized by training a CNN on the in-phase and quadrature (IQ) components of signals. B. Tang, et al. [31] transformed the modulated signals into constellation diagrams, and then a generative adversarial network (GAN) was applied to distinguish these constellation diagrams. Y. Tu, et al. [33] proposed a lightweight and fast CNN-based AMC method for edge devices, building on their previous works [32], [33]. Y. Wang, et al. [34] proposed a method combining an IQ sample-based CNN and a constellation diagram-based CNN to recognize different modulation types.
Although these CNN-based AMC methods have been shown to perform better than traditional methods, most of them are trained on a dataset with a single signal-to-noise ratio (SNR). This means that these CNNs achieve satisfactory performance only at the corresponding SNR rather than across all SNR scenarios. These independent CNNs are hard to generalize. If we adopt them in practical applications, we must train multiple CNNs on datasets collected under different SNR conditions and choose the correct CNN model according to the actual communication environment, which is inconvenient.
Fig. 1. The AMC-based receiver in a non-cooperative communication system. There is no agreement or authentication between receivers and transmitters in this system. Demodulation relies on the "AMC method" module to identify modulation types correctly and quickly, which then makes it possible to demodulate the received signals.
In this paper, a CNN-based robust generalized AMC method with higher generalization capability under varying noise conditions is proposed, considering a more practical scenario characterized by white non-Gaussian noise (WNGN) and non-ideal synchronization, i.e., frequency offset or phase offset. The proposed AMC method has more powerful recognition capability than the traditional feature-based AMC method, and it achieves higher robustness for actual applications at a slight performance loss. Compared with other CNN-based AMC methods, our proposed method has two obvious advantages, which are listed as follows.
No need for time-varying SNR estimation: Other CNN-based AMC methods rely on precise SNR estimation, because their CNNs are trained on samples with a single SNR. For multiple-CNN solutions [7], [9], [30], [32]-[34], precise SNR estimation is essential to help the system select the correct CNN model from the trained models. If the SNR cannot be estimated precisely, these methods may be ineffective. Unlike these conventional methods, the CNN in our proposed method is trained on a mixed dataset containing signals with SNR ∈ {-5 dB, 0 dB, 5 dB}. Unknown signals with SNR ranging from -5 dB to 5 dB can be recognized by the same CNN, so SNR estimation is not required.
Less device memory: When the SNR ranges from -5 dB to 5 dB at an interval of 1 dB, other CNN-based AMC methods must train far more than one CNN model to respond to the different SNR conditions. However, our proposed method only needs to train one CNN model in actual applications. With the same network structure, our method therefore requires less device memory than other CNN-based AMC methods.
The rest of this paper is organized as follows. Section II describes the system model, the signal model, and the dataset. In Section III, we propose a CNN-based generalized AMC (GAMC) method. In Section IV, various simulation results are provided to compare the performance of the AMC methods. Finally, we conclude this paper.

A. System Model
A typical non-cooperative communication system is considered, where transmitters send digitally modulated signals through a wireless channel, and the receiver has no a priori information about modulation types, symbol rates, and so on.
After receiving these signals, the system performs preprocessing, including down conversion, low-pass filtering, analog-to-digital conversion, and so on. After preprocessing, we obtain baseband signals, which are fed into an AMC module to identify modulation types. The AMC-based receiver is shown in Fig. 1.
In this paper, we focus on the feature-based AMC methods, which generally consist of three steps: pre-processing, feature extraction, and classification [9]. In traditional AMC methods, the most difficult part is the design of effective manmade features, and classifiers are usually based on machine learning or simple threshold detection, as shown in Fig. 2(a).
Unlike the traditional methods, DL methods, e.g., CNN or recurrent neural network (RNN), can achieve feature extraction and classification simultaneously. Moreover, the DL-based AMC method removes the need for complex and difficult manmade feature design. The framework of the DL-based AMC method is depicted in Fig. 2.

B. Signal Model
The received complex-valued baseband equivalent signal $r(n)$, $n = 1, \dots, N$, is assumed to be sampled at the Nyquist rate. In its expression, α, ∆θ, ∆f, and N represent the attenuation factor, the carrier phase offset (CPO) caused by wireless channels [7], the normalized carrier frequency offset (CFO), and the number of sampling points in an independent observation phase, respectively. In this paper, we consider a flat-fading, time-invariant channel, so α, ∆θ, and ∆f are constant in each observation phase.
In addition, $\{x(n)\}_{n=1}^{N}$ is the symbol sequence, and three modulation types are considered in this paper: frequency shift keying (FSK), phase shift keying (PSK), and quadrature amplitude modulation (QAM); $w(n)$ represents additive noise. In this paper, we mainly consider WNGN based on a Gaussian mixture model (GMM) with $K$ components [35]. GMM-based WNGN consists of $K$ independent noise components obeying complex Gaussian distributions, and its probability density function (PDF) is the correspondingly weighted mixture of their densities. $\lambda_k$ is the ratio of each component, with $\sum_{k=1}^{K} \lambda_k = 1$ and $0 < \lambda_k < 1$.
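A minimal sketch of this signal model follows, under two assumptions that the text does not confirm: the received signal takes the common form r(n) = α·exp(j(2π∆f·n + ∆θ))·x(n) + w(n), and each mixture component is zero-mean circular complex Gaussian with its own variance. The function names, the QPSK test symbols, and all numeric values are illustrative only.

```python
import cmath
import math
import random


def gmm_noise(n, weights, variances, rng):
    """Draw n complex noise samples from a K-component Gaussian mixture.

    Each sample picks component k with probability weights[k] and is then
    circular complex Gaussian with total variance variances[k] (assumption).
    """
    samples = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        sigma = math.sqrt(variances[k] / 2.0)  # std per real/imag dimension
        samples.append(complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)))
    return samples


def received_signal(symbols, alpha, delta_f, delta_theta, noise):
    """Apply attenuation, CFO and CPO to the symbols, then add noise.

    Assumed form: r(n) = alpha * exp(j(2*pi*delta_f*n + delta_theta)) * x(n) + w(n)
    """
    out = []
    for n, (x, w) in enumerate(zip(symbols, noise), start=1):
        rotation = cmath.exp(1j * (2.0 * math.pi * delta_f * n + delta_theta))
        out.append(alpha * rotation * x + w)
    return out


rng = random.Random(0)
# Illustrative unit-power QPSK symbols
qpsk = [cmath.exp(1j * (math.pi / 4 + math.pi / 2 * rng.randrange(4)))
        for _ in range(128)]
w = gmm_noise(len(qpsk), weights=[0.9, 0.1], variances=[0.1, 1.0], rng=rng)
r = received_signal(qpsk, alpha=1.0, delta_f=0.01,
                    delta_theta=math.pi / 16, noise=w)
```

Because α, ∆θ, and ∆f are held constant over the observation, the offsets only rotate and scale the symbols; the noise term is what makes the sequence non-Gaussian.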

III. AMC METHODS
In this section, a CNN-based generalized AMC (GAMC) method is proposed with better generalization performance under varying noise conditions. In contrast, the previously proposed CNN-based AMC methods [7], [9], [30], [32]-[34] are denoted as fixed AMC (FAMC) methods, which lack powerful generalization capability. In addition, a traditional AMC method, based on classical manmade features and a typical machine learning-based classifier, is first introduced for comparison.

A. Traditional AMC Method
To highlight the performance of the DL-based AMC method, we adopt a traditional AMC method for comparison. Its structure is shown in Fig. 2(a), where high-order cumulants (HOC) [37] serve as the classical manmade features, as described below.
In detail, the normalized fourth-order cumulants [38] $C_{40}$, $C_{41}$, and $C_{42}$ are applied, and they are computed from the normalized moments $M_{mk}$. Hence, $[C_{40}, C_{41}, C_{42}]$ works as a feature vector and SVM acts as a classifier.
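The feature computation can be sketched as follows. It uses the textbook definitions M_pq = E[y^(p-q) (y*)^q], C40 = M40 - 3·M20², C41 = M41 - 3·M20·M21, and C42 = M42 - |M20|² - 2·M21², with the sequence scaled to unit power first. Whether this matches the normalization used in [38] is an assumption, and the function names are hypothetical.

```python
def moment(y, p, q):
    """Sample mixed moment M_pq = E[y^(p-q) * conj(y)^q]."""
    return sum(v ** (p - q) * v.conjugate() ** q for v in y) / len(y)


def hoc_features(y):
    """Return (C40, C41, C42) of a complex sequence after unit-power scaling."""
    power = sum(abs(v) ** 2 for v in y) / len(y)
    y = [v / power ** 0.5 for v in y]
    m20 = moment(y, 2, 0)
    m21 = moment(y, 2, 1)  # equals 1 after unit-power scaling
    c40 = moment(y, 4, 0) - 3 * m20 ** 2
    c41 = moment(y, 4, 1) - 3 * m20 * m21
    c42 = moment(y, 4, 2) - abs(m20) ** 2 - 2 * m21 ** 2
    return c40, c41, c42


# Noise-free reference constellations (illustrative inputs)
bpsk = [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j]
qpsk = [1 + 0j, 1j, -1 + 0j, -1j]
bpsk_features = hoc_features(bpsk)
qpsk_features = hoc_features(qpsk)
```

The two constellations produce clearly different cumulant triples, which is what lets a classifier such as an SVM separate them; noise and offsets blur these values in practice.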

B. CNN-based GAMC Method
Here, the CNN-based GAMC method is described in terms of its dataset, CNN structure, classifier, loss function, and training and test strategies.
1) Dataset: The dataset applied in this paper contains multiple in-phase and quadrature (IQ) samples, which are transformed from the received complex signal sequence. The received sequence is defined as $R = \{r(n)\}_{n=1}^{N}$. To avoid the scaling problem, the power of $R$ is normalized, giving a normalized received sequence. Then, we separate the real part and the imaginary part of the normalized received sequence. Next, these two parts are combined into a matrix with dimension $2 \times N$, which is treated as one sample for training or testing.
In addition, the real part and the imaginary part are also the in-phase (I) component and the quadrature (Q) component of the signal, respectively. This training or test sample is therefore also called an IQ sample, which is shown in eq. (6).
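The construction of one IQ sample can be sketched as below. The normalization here scales the sequence to unit average power, which is an assumption about the exact form of the normalization step; the function name is hypothetical.

```python
def iq_sample(r):
    """Scale a complex sequence to unit average power and split it into I/Q rows.

    Returns a 2 x N nested list: row 0 holds the real (I) parts,
    row 1 holds the imaginary (Q) parts.
    """
    power = sum(abs(v) ** 2 for v in r) / len(r)
    scale = power ** 0.5
    normalized = [v / scale for v in r]
    return [[v.real for v in normalized], [v.imag for v in normalized]]


# Two illustrative received samples with average power 25
sample = iq_sample([3 + 4j, 3 - 4j])
```

After scaling, the sample carries no information about the absolute received power, so the network sees inputs on a comparable scale regardless of the attenuation factor.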
2) CNN structure: Firstly, "Input" is the dataset with IQ samples and their corresponding labels. Then, "Convolutional Layer" automatically extracts features and contains two "Conv2D" layers. In these "Conv2D" layers, the convolutional kernel sizes are 2 × 4 and 1 × 8, respectively, which are designed according to the input data, i.e., the IQ sample with dimension $2 \times N$.
Next, "Fully-connected Layer" is fundamentally a classifier with three "Dense" layers, and its output is a probability distribution that contains the probability of each modulation type. Finally, "Output" is the predicted modulation type, and it is given by the maximum a posteriori (MAP) classifier based on the probability distribution of the last "Dense" layer.
Besides, the rectified linear unit (ReLU) serves as the activation function in each layer except the last dense layer, where Softmax is applied. Assuming that $x_i$ is the output of the i-th neuron in a certain layer, ReLU and Softmax can be described as follows.
$$\mathrm{ReLU}(x_i) = \max(0, x_i), \qquad \mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} \tag{7}$$
In addition, batch normalization (BN) and dropout are applied after each activation function (except in the last fully-connected layer) to accelerate training, improve performance slightly, and avoid overfitting. They can be considered as two implicit regularization terms. BN normalizes the output of each layer in each batch and can be represented as
$$\mathrm{BN}(O_{\text{minibatch}}) = \gamma \, \frac{O_{\text{minibatch}} - \mathrm{Mean}(O_{\text{minibatch}})}{\sqrt{\mathrm{Var}(O_{\text{minibatch}}) + \epsilon}} + \beta$$
where $\mathrm{Mean}(O_{\text{minibatch}})$ and $\mathrm{Var}(O_{\text{minibatch}})$ are the mean and variance of the output of the mini-batch data, respectively; $\gamma$ and $\beta$ are trainable parameters [39], and $\epsilon$ is a small constant that prevents the denominator from being zero. In addition, dropout temporarily disables a portion of the neurons with a certain probability during training. It is noted that the same CNN structure is applied in both FAMC and GAMC.
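The activation and normalization steps can be sketched in plain Python as below. This is only an illustration of the standard definitions on short lists of numbers, not the Keras layers used in the experiments; the default values of `gamma`, `beta`, and `eps` are assumptions.

```python
import math


def relu(xs):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in xs]


def softmax(xs):
    """Map raw scores to a probability distribution."""
    shift = max(xs)  # subtracting the max avoids overflow; result is unchanged
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


activations = relu([-1.0, 2.0])
probabilities = softmax([1.0, 2.0, 3.0])
normalized = batch_norm([1.0, 2.0, 3.0])
```

In a trained network `gamma` and `beta` are learned, and at test time the batch statistics are usually replaced by running averages collected during training.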
3) MAP classifier: AMC identifies modulation types within a limited modulation type candidate pool. Assume that the candidate pool is $M = \{m_i\}_{i=1}^{N_{\text{type}}}$, where $m_i$ represents a certain modulation type and the pool contains $N_{\text{type}}$ different modulation types. The MAP criterion in CNN-based AMC [36] then selects, as the predicted modulation type, the candidate with the largest posterior probability. Here, $f_{\text{model}}$ represents the model structure, and $\Theta = \{\Theta_{\text{trainable}}, \Theta_{\text{untrainable}}\}$ denotes the model parameters, which contain massive trainable parameters and a few untrainable parameters; $p(\cdot)$ is the probability distribution given by the output of the Softmax function in the last fully-connected layer.
4) Loss function: In this paper, the categorical cross entropy (CCE) function is applied as the data loss function (or experience loss function), considering that AMC is essentially a multiclass classification task.
A labeled dataset of IQ samples is applied for the training of the CNN, where $s_i$, $l_i$, and $N_s$ represent the i-th IQ sample, its ground-truth label in one-hot encoding, and the number of training samples, respectively. The CCE loss function is computed over these training samples.
However, the final loss function contains not only the data loss function but also the structure loss function. The structure loss function $J(\cdot)$ (or regularization term) is added to avoid overfitting, and $\lambda_{\text{model}}$ is applied to balance these two loss functions. In this paper, BN and dropout, which have been introduced above, serve as the components of the structure loss function.
5) Training and test strategies: The training optimizer, training process, and test process are introduced in this part. The same training optimizer is adopted in both the GAMC and FAMC methods; the main difference between our proposed GAMC method and the FAMC method lies in the training and test processes.
Training optimizer: Stochastic gradient descent (SGD) is introduced as the optimizer to minimize the function (11) by iteratively updating the trainable parameters $\Theta_{\text{trainable}}$ in $\Theta$. In each iteration, the trainable parameters are moved against the gradient of the loss function, where $\eta$ is the learning rate that controls the scale of each parameter adjustment.
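The MAP decision, the CCE data loss, and one SGD update can be sketched as follows. The structure loss term is omitted, the function names are hypothetical, and the FSK/PSK/QAM pool simply follows the three modulation families named in the signal model; the numeric values are illustrative.

```python
import math


def map_decision(probs, pool):
    """Return the modulation type with the largest predicted probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return pool[best]


def cce_loss(pred_batch, onehot_batch):
    """Categorical cross entropy averaged over a batch of predictions."""
    total = 0.0
    for probs, onehot in zip(pred_batch, onehot_batch):
        # Only the true class (label 1) contributes to the sum
        total -= sum(l * math.log(p) for p, l in zip(probs, onehot) if l > 0)
    return total / len(pred_batch)


def sgd_step(params, grads, lr):
    """One gradient-descent update: theta <- theta - lr * gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


pool = ["FSK", "PSK", "QAM"]
predicted = map_decision([0.1, 0.7, 0.2], pool)
loss = cce_loss([[0.5, 0.25, 0.25]], [[1, 0, 0]])
updated = sgd_step([1.0], [0.5], lr=0.1)
```

Since Softmax already outputs a posterior-like distribution over the pool, the MAP decision reduces to an argmax over the last layer's output.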
Training process: The training processes of GAMC and FAMC are shown in Fig. 4. From Fig. 4(a) and Fig. 4(b), it can be observed that GAMC differs from FAMC in the IQ dataset used for training. The CNN in GAMC is trained on a mixed dataset, while the CNN in FAMC is trained on a single dataset.
When training the CNN in GAMC, three datasets with SNRs of -5 dB, 0 dB and 5 dB are proportionally mixed. Then, the mixed dataset is randomly divided into a training dataset and a validation dataset at a ratio of 7:3. The training dataset is fed into the CNN for training, and the validation dataset is used to measure the performance of the trained CNN after each epoch. The trained CNN can be used to recognize the modulation type of unknown signals with SNR ranging from -5 dB to 5 dB. In contrast, the CNN in FAMC is trained on IQ samples with a fixed SNR = i dB, and this CNN can only be used to identify received signals with SNR = i dB.
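The mixing and 7:3 split can be sketched as below. Equal-sized per-SNR datasets are assumed here as one reading of "proportionally mixed", and the `(snr, index)` tuples are placeholders standing in for labeled IQ samples; the function name and seed are hypothetical.

```python
import random


def mix_and_split(datasets, train_ratio=0.7, seed=0):
    """Pool the per-SNR datasets, shuffle, and split into training/validation."""
    mixed = [item for ds in datasets.values() for item in ds]
    random.Random(seed).shuffle(mixed)
    cut = int(round(train_ratio * len(mixed)))
    return mixed[:cut], mixed[cut:]


# Placeholder data: 100 samples at each of the three training SNRs
datasets = {snr: [(snr, i) for i in range(100)] for snr in (-5, 0, 5)}
train, val = mix_and_split(datasets)
```

Shuffling before the split matters: without it, the validation set would contain samples from only one SNR and would not reflect the mixed training condition.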
The powerful generalization performance of GAMC under varying SNR conditions originates from the mixed datasets. The CNN in GAMC can extract features from datasets with different SNRs simultaneously. The CNN is then able to filter out common or similar features to classify modulation types. Hence, these extracted features are universally suitable and more robust under varying noise conditions.
For training GAMC, we choose IQ datasets with different SNRs by sampling the range [-5, 5] dB at an equal interval, i.e., the datasets' SNRs are {-5 dB, 0 dB, 5 dB}, and this approach reduces the amount of data used for training.
Test process: In the test process, signal samples in the test dataset are fed into the trained CNN. The CNN then gives a predicted probability distribution, and the MAP classifier is applied to identify the modulation type. In the test phase, the CNN only performs forward propagation.
There are substantial differences between the test processes of GAMC and FAMC, which are shown in Fig. 5. In the FAMC-aided communication system, multiple CNN models are trained on different IQ samples with fixed SNRs. Hence, SNR estimation is required to choose the correct CNN model under varying SNR conditions.
From the test process, the differences between FAMC and GAMC can be observed from two aspects. On the one hand, the SNR estimation technique is removed in the GAMC-based system, because there is just one CNN model and no estimated SNR is needed to choose among models. In contrast, FAMC contains multiple CNN models to cope with the varying SNR conditions, and it relies on the estimated SNR to choose the corresponding model.
On the other hand, for the testing of IQ samples with SNRs in [-5, 5] dB, FAMC must prepare eleven CNN models, while GAMC needs just one CNN model. This means that GAMC requires only 1/11 of the device memory of FAMC: the size of the CNN model in GAMC is 23.3 MB, while that in FAMC exceeds 256 MB. Moreover, it is noted that FAMC and GAMC have the same computational complexity, because the same CNN structure is used in these two AMC methods. The training process of FAMC is simpler than that of GAMC; however, the opposite holds for their test processes. The test process of GAMC is extremely simple, and the test results are given by the CNN directly after the pre-processed samples are input, without other operations. In FAMC, however, pre-processing and SNR estimation must be carried out at the same time, and the latter is used to choose the correct model, such as "CNN (j dB)", from multiple CNN models. Then, the pre-processed IQ samples are fed into "CNN (j dB)" to give the predicted modulation types.
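The difference in the test process and the memory figures can be sketched as below. The nearest-SNR selection rule is an assumption about how the estimated SNR is mapped to a model, and the strings stand in for trained networks; the memory arithmetic assumes every FAMC model has the same 23.3 MB size as the GAMC model, as implied by the shared CNN structure.

```python
def famc_select(models, estimated_snr_db):
    """FAMC: choose the model trained at the SNR closest to the estimate."""
    nearest = min(models, key=lambda snr: abs(snr - estimated_snr_db))
    return models[nearest]


# Eleven FAMC models, one per integer SNR in [-5, 5] dB (placeholders)
famc_models = {snr: f"CNN ({snr} dB)" for snr in range(-5, 6)}
# GAMC keeps a single model and needs no SNR estimate
gamc_model = "CNN (mixed)"

chosen = famc_select(famc_models, estimated_snr_db=2.4)

model_size_mb = 23.3
famc_memory_mb = len(famc_models) * model_size_mb  # 11 x 23.3 MB
gamc_memory_mb = 1 * model_size_mb
```

Eleven models at 23.3 MB each give 256.3 MB, consistent with the statement that the FAMC storage exceeds 256 MB.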
IV. EXPERIMENTAL RESULTS
In this section, simulation results are given to evaluate the performance of various AMC methods. The detailed simulation parameters and their corresponding values are given in Table I. The simulation requires powerful computing resources, so it is conducted on a platform with one Intel i7-8750H CPU and one NVIDIA GTX 1080Ti GPU. The neural networks are implemented in Keras 2.2.2 with TensorFlow 1.10 as the backend, running on Python 3.6.5. SVM is implemented with the scikit-learn Python library. Moreover, Matlab R2018a is used to build our datasets.
Here, three metrics are applied to evaluate classification performance. The first two metrics are the correct classification probability (CCP) at SNR = i dB, $P_{cc}^{i}$, and the average correct classification probability (AveCCP), $P_{cc}$. In their definitions, $N_{cc}^{i}$, $N_{\text{test}}$, and $N_{\text{SNR}}$ represent the number of correctly recognized samples at SNR = i dB, the number of test samples for each type and SNR, and the number of sampled SNRs, respectively. $P_{cc}^{i}$ is reported in graphs and $P_{cc}$ is reported in tables.
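The two metrics can be sketched as below, under the assumption that CCP is the share of correctly recognized samples among all test samples at one SNR (the number of modulation types times the test samples per type), expressed in percent, and that AveCCP is the plain mean of CCP over the sampled SNRs. The counts used here are made up for illustration and are not results from the paper.

```python
def ccp(n_correct, n_test_per_type, n_types):
    """Correct classification probability at one SNR, in percent (assumed form)."""
    return 100.0 * n_correct / (n_test_per_type * n_types)


def ave_ccp(ccp_by_snr):
    """Average CCP over all sampled SNRs (assumed form)."""
    return sum(ccp_by_snr.values()) / len(ccp_by_snr)


# Illustrative numbers only: 3 types, 100 test samples per type and SNR
single = ccp(n_correct=270, n_test_per_type=100, n_types=3)
average = ave_ccp({-5: 80.0, 0: 90.0, 5: 100.0})
```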
To visualize the specific classification performance for each modulation type in the various AMC methods, the third metric applied in this paper is the confusion matrix with dimension 3 × 3.
Fig. 6. The classification performance of FAMC, GAMC, and traditional AMC with the condition of CFO and CPO. It can be observed that FAMC and GAMC have perfect and similar CCP at each SNR, and the classification accuracy of FAMC is slightly higher than that of GAMC, but traditional AMC has far weaker performance than these two CNN-based AMC methods.
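A 3 × 3 confusion matrix of the kind used as the third metric can be built as sketched below. The row/column convention (rows are true types, columns are predicted types), the function name, and the toy labels are assumptions for illustration.

```python
def confusion_matrix(true_labels, predicted_labels, pool):
    """Count predictions per (true type, predicted type) pair."""
    index = {m: i for i, m in enumerate(pool)}
    matrix = [[0] * len(pool) for _ in pool]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


pool = ["FSK", "PSK", "QAM"]
true_labels = ["FSK", "PSK", "QAM", "QAM"]
predicted_labels = ["FSK", "QAM", "QAM", "QAM"]
cm = confusion_matrix(true_labels, predicted_labels, pool)
```

Diagonal entries count correct decisions, so a strongly diagonal matrix means each modulation type is rarely confused with the others.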

A. Classification performance and computation complexity of FAMC, GAMC and traditional AMC
The classification performances of various AMC methods under the WNGN condition and without CFO and CPO are depicted in Fig. 6. From these experimental results, we can observe that the traditional AMC method, based on SVM and HOC, has unsatisfactory performance compared with the CNN-based AMC methods. $P_{cc}^{i}$ of GAMC is similar to that of FAMC, and their maximum gap in $P_{cc}^{i}$ only reaches 1.22%, at SNR = 2 dB or -5 dB. This means that there is little performance gap between GAMC and FAMC. In addition, this phenomenon also appears in the other metric, $P_{cc}$: their performance gap in $P_{cc}$ is less than 1%, as shown in Table III.
Fig. 8. The specific classification performance of FAMC and GAMC with the consideration of CFO (∆f = 0.9) and CPO (φ = π), respectively. FAMC and GAMC with the consideration of CPO have lower $P_{cc}^{i}$ than without the consideration of CPO or with the consideration of CFO. In addition, there is a limited performance gap between FAMC and its corresponding GAMC.
Fig. 9. The classification performance of GAMC considering different CFOs and CPOs. With the increase of the normalized CFO, the classification performance of GAMC is slightly affected. However, the increasing CPO leads to sharp performance degradation, while the performance decline is limited at high SNR, such as 5 dB.
[Table fragment: 82.14 | GAMC (∆f = 0, φ = π) | 81.13]
The confusion matrices of FAMC and GAMC at three SNRs are given in Fig. 7. Compared with the confusion matrices can be precisely distinguished from other modulation types in these two AMC methods, even at low SNR, such as -5 dB. Besides, computation cost, or computational complexity, is another metric for measuring AMC methods. In this paper, unit computation time is used to evaluate the three methods, and the results are shown in Table II. This time is the average computation time of a single IQ sample, measured over a large number of test samples on the platform described at the beginning of this section.
From Table II, it can be observed that FAMC and GAMC are far faster than traditional AMC on both GPU and CPU. These results demonstrate that the CNN-based AMC methods are more efficient than the traditional AMC, particularly in communication systems equipped with a GPU.

B. Classification performance of FAMC and GAMC considering CFO and CPO
In this section, the traditional AMC is not considered because of its weak classification performance and slow computation speed. The influence of CFO and CPO on the classification performance of the CNN-based AMC methods is shown in Fig. 8 and Table III. It is noted that the normalized [...] Moreover, the classification performance gap between FAMC and GAMC barely widens, because their $P_{cc}$ gap is within or slightly higher than 1%. In addition, their maximum $P_{cc}^{i}$ gap is 1.01% with the consideration of CFO, while it reaches 1.49% when considering CPO. The confusion matrices in Fig. 10 and Fig. 11 also illustrate that FAMC and GAMC have similar performance under all circumstances considered in this paper.
Then, the classification performance of GAMC under different CFOs and CPOs is considered, and the experimental results are shown in Fig. 9, Table IV and Table V. As mentioned before, $P_{cc}^{i}$ and $P_{cc}$ of GAMC are almost unaffected by the normalized CFO.
As shown in Fig. 9(b) and Table V, the influence of CPO is weak at φ = π/16 or at high SNR, such as 5 dB. However, the influence of CPO grows gradually stronger as φ increases. When φ is too large, phase offset correction algorithms should be considered to aid CNN-based AMC methods and improve classification performance.

C. Generalization capabilities under varying SNR conditions
In the former sections, the specific classification performances of FAMC and GAMC have been examined through three metrics, and it can be concluded that the performance gaps between FAMC and GAMC are very small. In this section, it is illustrated that GAMC has more powerful generalization capabilities than FAMC at the expense of a slight performance loss. We give two sets of simulation results for generalization capabilities. One set is tested within the range of training SNRs, i.e., SNRs in [-5, 5] dB, and the other set is outside this range, covering SNRs in [-15, -6] dB and [6, 15] dB.
1) Within the range of training SNRs: To compare the generalization capabilities of FAMC and GAMC under varying noise conditions, we depict three curves of CNN (j dB) in FAMC trained at different SNRs (i.e., j = -5, 0, 5) in Fig. 12, and we test them at SNRs ranging from -5 dB to 5 dB.
The CNN (j dB) in FAMC performs well when the testing SNR is close to j dB (an error of 1 dB can be tolerated), but its performance gets worse at other SNRs, which means that FAMC lacks robustness and generalization capability. On the contrary, Fig. 12 demonstrates that GAMC works well at all the testing SNRs, whether or not CFO and CPO are considered.
2) Out of the range of training SNRs: To further compare the generalization capabilities of FAMC and GAMC, we present the simulation results for SNRs outside the range of training SNRs in Fig. 13. In Fig. 13(a), the CCPs of GAMC at SNRs higher than 5 dB are far beyond those of FAMC. Similar simulation results also appear in Fig. 13(b), when the SNR is lower than -5 dB.

V. CONCLUSION
In this paper, we have proposed a CNN-based GAMC method with better robustness under varying noise conditions. The classification accuracy of GAMC is far higher than that of the traditional AMC method. Besides, GAMC is more robust than FAMC at the cost of negligible performance loss, because the CNN in GAMC is trained on a mixed IQ dataset containing received signals with SNRs of -5 dB, 0 dB and 5 dB, and it can be applied to recognize modulation types of signals with uncertain SNR from -5 dB to 5 dB. Moreover, our proposed GAMC method is more practical than the CNN-based FAMC methods, because we only have to train one CNN model in GAMC rather than many CNN models in FAMC, which means less device memory consumption. In addition, GAMC does not need precise SNR estimation to choose a suitable CNN model. Hence, our proposed CNN-based GAMC method is meaningful for practical applications.
Furthermore, the CNN that works well in FAMC can also be applied in GAMC with a more complex mixed dataset. The experimental results demonstrate that the CNN in GAMC is able to extract universal and robust features from a mixed dataset with different SNRs. This also indicates that the CNN, designed here by human experience, has massive redundancy, and its operation speed is also limited. Thus, our future work will focus on finding more effective and streamlined neural network models through techniques such as the network slimming algorithm [40] and neural architecture search (NAS) [41].