Deep-Learning-Based Non-Coherent DPSK Multiple-Symbol Differential Detection in Single-User Massive MIMO Systems

In view of reducing the complexity of signal detection in massive multiple-input multiple-output (MIMO) receivers, the use of non-coherent detection is favored over the usual coherent techniques that require complex channel estimation. In this work, non-coherent differential phase-shift keying (DPSK) detection for massive MIMO is considered in order to get rid of the complexity of channel estimation. However, most of the well-performing DPSK detection techniques require high computational complexity at the receiver. The use of deep-learning is proposed for detecting the transmitted DPSK symbols over a single-user massive MIMO system. Two deep-learning-based multiple-symbol differential detection receiver designs are proposed and compared with differential detection (DD), decision-feedback differential detection (DFDD), and multiple-symbol differential detection (MSDD) for the same system parameters, where multiple-symbol differential sphere detection (MSDSD) is used to implement MSDD. The results show that the proposed deep-learning-based classification neural networks outperform decision-feedback differential detection and achieve optimal performance compared to conventional multiple-symbol differential detection implemented by multiple-symbol differential sphere detection.

Wang et al. [24] used machine learning approaches for signal detection in MIMO systems with BPSK modulation and channel coding. Lin et al. [25] introduced a deep-learning signal detection approach for the MIMO non-orthogonal multiple access technique with M-ary phase modulation.
Samuel et al. [26] and Wang et al. [27] used deep learning for signal detection in MIMO systems with binary phase-shift keying modulation. In [28] and [29], the authors used deep learning for signal detection in MIMO systems with quadrature phase-shift keying modulation. The work in [30] studied quadrature amplitude modulation with deep learning for MIMO signal detection.
Regarding the use of deep-learning in non-coherent signal detection, Wang et al. [24] considered a machine learning approach for MIMO signal detection where the channel matrix is unknown.
Xue et al. [31] presented an unsupervised deep-learning approach for non-coherent receiver design in multi-user MIMO systems. In [32], a differential detection deep-learning-based scheme for single-user massive MIMO was proposed.
However, from the above discussion it is noted that none of the state-of-the-art works focused on deep-learning-based signal detection for DPSK massive MIMO systems in channels with large coherence time, which allows larger DPSK block sizes.
In this paper, two novel deep-learning-based models are deployed for non-coherent massive MIMO signal detection with DPSK modulation. To the best of our knowledge, this is the first paper that uses deep-learning in non-coherent massive MIMO to implement multiple-symbol differential detection. The paper has the following contributions:
• Deep-learning is proposed to provide an optimum multiple-symbol differential detection implementation that reduces the computational complexity of conventional multiple-symbol differential detection for massive MIMO systems.
• Two deep-learning-based multiple-symbol differential detection massive MIMO receiver designs are proposed.
• The proposed models are compared with differential detection, decision-feedback differential detection, and multiple-symbol differential sphere detection for performance evaluation using the same simulation parameters.
The paper is organized as follows. In Section II, the studied system model is illustrated. The definitions of the algorithms referred to for comparing and evaluating the achieved results are reviewed in Section III. In Section IV, the proposed deep-learning-based system architecture and receiver designs are described in detail, and the deployed deep-learning-based classification algorithms are introduced. In Section V, the proposed deep-learning-based detection systems are simulated, and the results of the proposed models are discussed; the performance of the proposed designs is compared to DD, DFDD, and MSDSD. Finally, a brief conclusion is given in Section VI.
II. SYSTEM MODEL

The considered system model is shown in Fig. 1. It consists of a single user with a single antenna communicating with a base station equipped with Nrx ≫ 1 receive antennas.

Fig. 1: Illustration of the considered simplified single-user massive MIMO system model [13].
A transmission scenario which presumes a burst transmission with an observation window of N time slots is considered in the proposed model. In each discrete time instant k, the transmitted differentially encoded symbol is given by

b_k = a_k · b_{k−1}, b_0 = 1, (1)

where a_k is the transmitted information symbol drawn from an M-ary constellation; each differentially encoded symbol b_k is calculated by multiplying the transmitted information symbol a_k with the previously transmitted encoded symbol, and hence with all previous consecutive information symbols in the block.
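As an illustration, the differential encoding rule (1) can be sketched in a few lines of NumPy (a minimal sketch; the alphabet size and the drawn symbols below are arbitrary example choices, and the function name is illustrative):

```python
import numpy as np

def dpsk_encode(a):
    """Differentially encode information symbols a_k as in eq. (1):
    b_0 = 1 (reference symbol), b_k = a_k * b_{k-1}."""
    b = np.empty(len(a) + 1, dtype=complex)
    b[0] = 1.0  # reference symbol starting the block
    for k, ak in enumerate(a, start=1):
        b[k] = ak * b[k - 1]
    return b

# Example: three information symbols from a 4-PSK alphabet exp(j*2*pi*m/M)
rng = np.random.default_rng(0)
M = 4
a = np.exp(2j * np.pi * rng.integers(0, M, size=3) / M)
b = dpsk_encode(a)
```

Reversing the recursion, a_k = b_k · b*_{k−1}, recovers the information symbols, which is the basis of all differential detectors discussed below.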
Assuming the transmission of a conventional digital pulse-amplitude complex baseband signal, the channel between the user and receive antenna number m can be represented by a complex channel fading coefficient h_m. The complex Gaussian noise at each antenna is denoted as n_{k,m}.
The noise has zero mean and variance σ_n² at each receive antenna. The channel is a flat-fading channel with large coherence time, which leads to a constant channel over a long burst of transmitted symbols. The received complex signal is denoted as r_k = [r_{k,1}, . . . , r_{k,Nrx}]^T, where

r_k = h b_k + n_k .

The channel coefficients from the user to the receive antennas are grouped in the channel vector h = [h_1, . . . , h_Nrx]^T, and n_k = [n_{k,1}, . . . , n_{k,Nrx}]^T denotes the noise vector. The signal-to-noise ratio (SNR) is defined as the ratio of the transmitted energy per DPSK symbol E_s to the noise power spectral density N_0. The average signal-to-noise ratio, denoted E_s/N_0, is equal to 1/σ_n² due to PSK signaling with unity average energy.
For vector notation of the transmission of a DPSK block with N symbols, the corresponding baseband received signal at the base station is the complex matrix R ∈ C^{Nrx×N}, which can be written as

R = h b^T + N ,

where b = [b_0, . . . , b_{N−1}]^T is the differentially encoded vector transmitted for this block, and N ∈ C^{Nrx×N} is the noise matrix, given by N = [n_0, . . . , n_{N−1}].
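A minimal simulation of the block model R = h b^T + N under the stated assumptions (unit-energy PSK, Rayleigh fading constant over the block, complex Gaussian noise with σ_n² = N_0/E_s); the function name is illustrative:

```python
import numpy as np

def simulate_block(b, n_rx, snr_db, rng):
    """One received DPSK block R = h b^T + N over a flat-fading channel
    that stays constant for the whole block (large coherence time)."""
    n = len(b)
    # Rayleigh fading: h_m ~ CN(0, 1)
    h = (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx)) / np.sqrt(2)
    sigma2_n = 10 ** (-snr_db / 10)  # Es/N0 = 1/sigma_n^2 for unit-energy PSK
    noise = np.sqrt(sigma2_n / 2) * (rng.standard_normal((n_rx, n))
                                     + 1j * rng.standard_normal((n_rx, n)))
    return np.outer(h, b) + noise, h

rng = np.random.default_rng(1)
b = np.ones(5, dtype=complex)          # example encoded block of N = 5 symbols
R, h = simulate_block(b, 8, 20, rng)   # Nrx = 8 antennas at 20 dB
```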
The user power-space-profile (PSP) represents the average receive power P_m = E{|h_m|²} at each receive antenna [14]. The knowledge of the PSP is assumed to be available at the base station, as noted in Fig. 1. The N × N correlation matrix Z for the received block R has the elements

z_{k,l} = Σ_{m=1}^{Nrx} ς_m² r_{k,m} r*_{l,m} ,

where Σ = diag(ς_1, . . . , ς_Nrx) is a real-valued diagonal weighting matrix that contains weights matched to the power-space-profile as described in [13], [14], where we set ς_m² ∼ P_m. The complex matrix Z is Hermitian, i.e., Z = Z^H with z_{k,l} = z*_{l,k}, and its elements form a set of sufficient statistics for non-coherent maximum-likelihood detection.
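The PSP-weighted correlation can be sketched as follows (a NumPy sketch assuming the element-wise rule above, with the weights ς_m² simply taken as the normalized powers P_m; the normalization choice is an assumption for illustration):

```python
import numpy as np

def correlation_matrix(R, P):
    """N x N Hermitian correlation matrix with elements
    z_{k,l} = sum_m varsigma_m^2 * r_{k,m} * conj(r_{l,m}),
    where the PSP-matched weights satisfy varsigma_m^2 ~ P_m."""
    varsigma2 = np.asarray(P, dtype=float) / np.sum(P)
    A = R.T * np.sqrt(varsigma2)   # rows indexed by time k, antenna-weighted
    return A @ A.conj().T

rng = np.random.default_rng(2)
R = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
Z = correlation_matrix(R, np.ones(4))  # uniform PSP example
```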

III. CONVENTIONAL NON-COHERENT DPSK SIGNAL DETECTION TECHNIQUES
The following non-coherent differential detection techniques are used for performance evaluation of the proposed deep-learning-based signal detection receivers for the considered massive MIMO system model.

A. Multiple-symbol differential detection
Multiple-symbol differential detection (MSDD) [33] is the optimum block-wise non-coherent detection scheme for estimating a multiple-symbol block of information according to

b̂ = argmax_{b̃} b̃^H Z b̃ , (9)

where the maximization is over all candidate blocks b̃ with b̃_0 = 1 and M-PSK entries. The main diagonal of Z is useless in detecting b, since |b_k| = 1 for all the symbols in the block.
MSDD is the optimum differential detection technique; however, it is computationally complex for larger block sizes when a brute-force search is used to obtain the optimum solution.
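For small block sizes the brute-force search over all M^{N−1} candidate blocks can be written down directly; the sketch below implements the MSDD metric in (9) (function names are illustrative):

```python
import itertools
import numpy as np

def msdd_brute_force(Z, M=4):
    """Brute-force MSDD: maximize b^H Z b over all M^(N-1) candidate
    differentially encoded blocks with fixed reference symbol b_0 = 1."""
    N = Z.shape[0]
    alphabet = np.exp(2j * np.pi * np.arange(M) / M)
    best_b, best_metric = None, -np.inf
    for info in itertools.product(alphabet, repeat=N - 1):
        # differentially encode the candidate information symbols
        b = np.concatenate(([1.0 + 0j], np.cumprod(info)))
        metric = (b.conj() @ Z @ b).real  # real for Hermitian Z
        if metric > best_metric:
            best_metric, best_b = metric, b
    return best_b
```

A quick noiseless sanity check: building Z from R = h b^T recovers the transmitted block exactly, since the metric reduces to ||h||² |b̃^H b|², maximized at b̃ = b.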
Efficient algorithms for performing MSDD exist. E.g., in [34] a fast algorithm for implementing MSDD with complexity N log2 N was proposed. In [16], for power-efficient transmission, multiple-symbol differential sphere decoding (MSDSD) was proposed as a low-complexity version of MSDD. The sphere decoder only examines the candidate symbols that lie inside a sphere of a certain radius [35]. MSDSD is used to implement MSDD when the transmission block size is large and the brute-force search of MSDD becomes infeasible.
Differential detection (DD) is the simplest non-coherent detection form, using an observation window of N = 2. Each two consecutive symbols are processed jointly in order to obtain a single information symbol, which can be obtained from (9) by restricting the block size to N = 2.
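The PSK quantizer and the resulting detectors can be sketched as follows. The first function implements conventional DD from the first off-diagonal of Z; the second is a deliberately simplified sketch of the decision-feedback principle described in the next subsection, using the natural decision order instead of the optimized sorting of [13], [14] (all names are illustrative):

```python
import numpy as np

def q_psk(x, M=4):
    """Q_PSK{x}: quantize a complex value to the closest M-PSK point."""
    step = 2 * np.pi / M
    return np.exp(1j * step * np.round(np.angle(x) / step))

def dd(Z, M=4):
    """Conventional DD: a_hat_k = Q_PSK{ z_{k,k-1} } (MSDD with N = 2)."""
    return np.array([q_psk(Z[k, k - 1], M) for k in range(1, Z.shape[0])])

def dfdd_natural_order(Z, M=4):
    """Simplified DFDD sketch: decide the encoded symbols successively,
    feeding back all previously decided symbols, then reverse eq. (1)."""
    N = Z.shape[0]
    b_hat = np.empty(N, dtype=complex)
    b_hat[0] = 1.0  # reference symbol
    for k in range(1, N):
        feedback = sum(Z[k, l] * b_hat[l] for l in range(k))
        b_hat[k] = q_psk(feedback, M)
    return b_hat[1:] * b_hat[:-1].conj()  # a_hat_k = b_hat_k * conj(b_hat_{k-1})
```

With noiseless correlations z_{k,l} ∝ b_k b*_l, both detectors return the transmitted information symbols; DFDD averages over k feedback terms per decision and is therefore more robust to noise than DD.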
B. Decision-feedback differential detection

Decision-feedback differential detection (DFDD) [13], [36] improves the performance of DD while still avoiding the high computational complexity of MSDD. It combines the principles of decision feedback and optimum decision order [13], [14], and leads to only a small loss in performance. Starting from the reference symbol b̂_0 = 1, the encoded symbols are decided successively: each decision is obtained by applying the PSK quantization operator Q_PSK{·}, which maps its argument to the closest PSK constellation point, to the decision-feedback sum of the correlations z_{k,l} with the already decided symbols. The decisions are taken in an optimized sorted order matched to the power-space-profile, with the optimum sorting indices and the phase quantization error ∆Q_PSK{·} defined as in [13], [14].

For all the detection techniques described above, the information symbol a_k is obtained by reversing (1) such that

â_k = b̂_k b̂*_{k−1} ,

since |b_k| = 1 for PSK signaling.

To provide a reference for comparison between the proposed detection schemes and conventional non-coherent DPSK detection, coherent maximum-ratio combining (MRC) is used. In MRC, knowledge of the channel vector is used to detect the transmitted information symbol by quantizing the combined received signal h^H r_k to the closest constellation point.

IV. PROPOSED DEEP-LEARNING-BASED RECEIVER DESIGNS

Two deep-learning-based receiver designs are proposed. The first implements deep-learning-based multiple-symbol differential detection. In the second, deep-learning-based multiple-symbol differential detection is implemented with a different architecture, which simplifies the deep-learning classification problem for larger block sizes.
A. Deep-learning-based multiple-symbol differential detection

The proposed deep-learning-based DPSK multiple-symbol differential detection transceiver presents a deep-learning-based counterpart implementation of MSDD that finds the optimum block solution maximizing (9). Fig. 2 depicts the model, which consists of the transmitter part, the channel part, and the proposed receiver design. The transmitter side performs differential phase-shift keying modulation of the generated information symbols and then transmits the modulated symbols. The massive MIMO differential phase-shift keying signal is transmitted over a flat-fading channel.
Finally, the proposed receiver is composed of the auto-correlation block and the data preparation block, followed by the classification deep neural network.
The auto-correlation block is responsible for calculating the correlation matrix coefficients in (10). Then, the correlation matrix is passed as an input to the data preparation block described in (18). The data is prepared by splitting the complex upper part of the correlation matrix into real (Re{·}) and imaginary (Im{·}) parts, which are then passed as the input to the neural network.
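The data preparation step can be sketched as follows, assuming the input vector simply stacks the real and imaginary parts of the strictly upper-triangular entries of Z (the exact ordering used in (18) may differ; the function name is illustrative):

```python
import numpy as np

def prepare_dnn_input(Z):
    """Stack Re/Im of the strictly upper-triangular part of the
    correlation matrix; the diagonal is discarded since |b_k| = 1
    makes it useless for detection."""
    upper = Z[np.triu_indices(Z.shape[0], k=1)]
    return np.concatenate([upper.real, upper.imag])
```

For a block of N symbols this yields N(N−1)/2 complex correlations, i.e. N(N−1) real-valued input nodes.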
The classification deep neural network (DNN) block has two operational modes that divide the detection process into two phases, as depicted in Fig. 3. The offline training phase is responsible for generating the labeled offline training data for the DNN block. The online testing block is used to simulate real-time massive MIMO DPSK-modulated signal transmission.
In the offline training mode, the offline training block is active while the testing block remains inactive. After the network has been trained, the testing mode is activated for real-time operation of the system. In this phase, the offline training block is suspended, and the online block accesses the DNN to predict the transmitted information symbols.
The classification deep neural network output weights are passed to the maximum operator block, which outputs the learned block of symbols corresponding to the maximum network output weight. In this receiver design, the network is trained on the whole block of N symbols jointly to predict the best candidate transmitted block.

B. Deep-learning-based multiple-symbol differential detection alternative architecture

In this section, an alternative, equivalent architecture for multiple-symbol differential detection is proposed to simplify the deep-learning classification problem of deep-learning-based multiple-symbol differential detection into several simpler classification problems, as justified in the Appendix. This design outputs each decoded symbol individually, similar to DFDD, unlike the previous architecture, which outputs the whole detected block, similar to MSDD. In the proposed alternative architecture receiver design, the neural network input data preparation block is the same as that used in deep-learning-based multiple-symbol differential detection, described in (18). After that, deep-learning classification is applied per symbol.

Setup 1: Deep-learning-based DPSK multiple-symbol differential detection neural network offline training and testing.
1: Initialize the DNN model;
2: Generate the offline training data for a number of m training realizations. For one realization, the neural network input data are denoted as x = [x[1], x[2], . . .], where x[i] is the input per node. The output label per realization is denoted as y.
3: Convert the output label y = y_i ∈ M^{N−1} to the one-hot vector [w[1], . . . , w[|M|^{N−1}]], where w[i] = 1 and all other entries are zero.

V. SIMULATION RESULTS

The simulation parameters are summarized in Table I. First, the neural network parameter optimization has been studied. Afterwards, the symbol-error-rate performance of the two proposed deep-learning-based differential detection architectures for different block sizes is evaluated. Finally, the achieved results are compared with conventional differential detection algorithms.
The software applications used to obtain the numerical results are MATLAB and Python 3.6.

Setup 2: Deep-learning-based multiple-symbol differential detection alternative architecture neural networks offline training and testing.
3: For each output label y_k ∈ y, y_k = y_i ∈ M, convert y_i to the one-hot vector [w_k[1], . . . , w_k[|M|]], where w_k[i] = 1 and all other entries are zero.

TABLE I: Simulation parameters.
Number of output layer neurons for the multiple-symbol differential detection alternative architecture: 4
batch_size: 32
Number of epochs (epochs): 10
The batch_size is the number of training samples passed as an input, in matrix form, for one iteration of the Adam optimization training algorithm [37] before the internal model fitting parameters are updated; the network training is thus done in m/batch_size training iterations per epoch. The larger the batch_size, the faster the training. The number of epochs represents how many times the neural network sees the whole labeled training set of size m, over and over again, to improve the precision of the network.
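The roles of batch_size and epochs can be made concrete with a small framework-agnostic sketch of the mini-batch schedule (one optimizer update per yielded batch; the actual training uses Adam [37], which is not reproduced here):

```python
import numpy as np

def minibatch_schedule(m, batch_size, epochs, rng):
    """Yield index batches: the m labeled samples are reshuffled each
    epoch and consumed in m // batch_size update iterations, so the
    whole training runs for epochs * (m // batch_size) optimizer steps."""
    for _ in range(epochs):
        order = rng.permutation(m)
        for start in range(0, m - batch_size + 1, batch_size):
            yield order[start:start + batch_size]

# Example matching Table I: batch_size = 32, epochs = 10
batches = list(minibatch_schedule(m=320, batch_size=32, epochs=10,
                                  rng=np.random.default_rng(0)))
```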

A. Network Parameters Optimizations
To study the effect of the offline training data size on the performance of the DNN model, Fig. 6 presents simulation results for single-user transmission with a block size of N = 3 for the deep-learning-based multiple-symbol differential detection alternative architecture.
It shows the SER versus SNR curves after one epoch of offline training for different generated training data sizes. After the offline training of the DNN, the network is tested over the target SNR range. An optimum brute-force implementation of the MSDD algorithm is used as a performance reference. It is clear that the more samples are used in the training, the better the symbol-error-rate performance; the performance can be improved further by tuning the network hyperparameters for a better fitting of the offline data [38]. Regarding the effect of the number of epochs used in the offline training on the performance of the DNN model, Fig. 7 presents the results of the proposed scenario simulations for single-user transmission with a block size of N = 3 for the deep-learning-based multiple-symbol differential detection alternative architecture. It shows SER versus SNR curves for a training data size of m = 100,000 used for the DNN offline training. After the offline training of the DNN, the network is tested over the target SNR range. The results show that the higher the number of epochs, the better the classification accuracy and, in turn, the SER performance, until 80 epochs are reached; after that, the performance saturates once the optimum MSDD results are reached.
The performance of the model for different training data sizes while fixing the number of epochs was studied in Fig. 6, as well as the case when the number of epochs changes for a fixed data size in Fig. 7. Now the trade-off between increasing the number of epochs and the training data size for a better performance is studied in Fig. 8. It shows the symbol-error-rate performance for a block size of N = 3 measured at E_s/N_0 = 15 dB, while fixing all the other network parameters, for different fixed total numbers m × epochs of used training samples. The measuring point SNR = 15 dB was chosen based on Fig. 6, since the largest error gap appears at that point. The optimum MSDD performance measured at the same SNR value is shown as the dotted line to provide a lower error bound for the numerical results comparison. It is noted that the minimum data size that achieves MSDD results is m = 250,000 when m × epochs = 10,000,000, which corresponds to passing the training data set through the network epochs = 40 times. However, the results approach MSDD as well when the generated data size m is increased while decreasing the number of epochs. The minimum data size that achieves MSDD performance with just one epoch is m = 10,000,000 samples.
It is worth noting that generating more training data samples to be used as the whole m training data will always improve the network performance. However, if generating the huge amount of needed training data is not possible, reusing the available generated training data several times will improve the performance as well.
To check the effect of changing the batch_size parameter on the performance, for a fair comparison, Fig. 9 depicts the simulation results for two cases with m × epochs = 1,000,000. It is clear that the effect of the batch_size on the symbol-error-rate performance is negligible. The smaller the batch_size, the slower the offline training in terms of simulation time, since the algorithm performs a higher number of iterations. However, it should be noted that the network training is done offline only once at the beginning; therefore, the training speed is not critical.

B. Deep-learning-based MSDD Symbol-Error-Rate Performance Analysis
To evaluate the performance of the proposed deep-learning-based multiple-symbol detection schemes, Fig. 10 shows the symbol-error-rate (SER) versus signal-to-noise ratio performance for transmission blocks of size 2, 3, 5, and 10 symbols, compared to coherent maximum-ratio combining detection. The larger the block size, the closer the performance gets to coherent detection.
However, for block sizes equal to 10 and higher, the number of output layer nodes increases, which leads to higher neural network classification complexity, and the classification accuracy becomes poor. To solve the high classification neural network complexity for larger block sizes, the deep-learning-based multiple-symbol differential detection alternative architecture was proposed. Fig. 11 depicts the symbol-error-rate versus signal-to-noise ratio performance of the alternative architecture, which outperforms deep-learning-based multiple-symbol differential detection for large block sizes N ≥ 5, compared to the results in Fig. 10.
It is worth noting that the power efficiency loss due to noisy channel estimates when acquiring the channel matrix for coherent MRC detection is not considered here (perfect channel estimation is assumed), which means that the margin between the achieved results and coherent detection is smaller in practical scenarios. It should also be noted that for deep-learning-based multiple-symbol differential detection, when the DPSK block size is increased (N ≥ 10), the coherent results could be approached in principle.

Fig. 9: Symbol error rate vs. 10 log10(E_s/N_0) (in dB) for block size N = 3. Non-coherent deep-learning-based multiple-symbol differential detection alternative architecture: comparison of increasing the batch_size for (m = 100,000, number of epochs = 10) and (m = 1,000,000, number of epochs = 1).

C. Comparison with conventional non-coherent detection
The symbol-error-rate versus signal-to-noise ratio performance of the proposed deep-learning-based non-coherent multiple-symbol detection receiver architectures is compared with the conventional non-coherent DPSK detection algorithms mentioned in Section III (conventional differential detection, sorted decision-feedback differential detection, and optimum multiple-symbol differential detection implemented via multiple-symbol differential sphere decoding). The results of the comparison are shown in Fig. 12 and Fig. 13, where the proposed designs achieve optimum multiple-symbol differential detection results that outperform sorted-DFDD.

(Figure caption) Comparison between the proposed deep-learning-based multiple-symbol differential detection for different block sizes. Reference: coherent MRC detection.
(Figure caption) Comparison between the proposed deep-learning-based multiple-symbol differential detection alternative architecture for different block sizes. Reference: coherent MRC detection.
The heavy computation needed to train the deep neural network for signal classification is done in the offline training phase; in the online execution of the system, detection is performed after training in real time with no complex computations, simply by using the trained mathematical model to directly predict the transmitted symbols.

VI. CONCLUSIONS
The use of non-coherent differential phase-shift keying signal detection for massive MIMO systems is advocated to get rid of the complexity and the power needed for the channel estimation process. However, most of the conventional non-coherent differential detection techniques still require complex computations at the receiver side to perform differential detection.
Conventional algorithms for performing non-coherent differential detection include multiple-symbol differential detection and decision-feedback differential detection. MSDD is the optimum, and most complex, algorithm, with O(M^{N−1}) computational complexity for the brute-force search.
DFDD performs differential detection with lower complexity, at a trade-off in performance.

Fig. 12: Symbol error rate vs. 10 log10(E_s/N_0) (in dB) for massive MIMO 4-DPSK with block size N = 3. Comparison between the proposed deep-learning-based non-coherent multiple-symbol differential detection and conventional non-coherent differential detection related work. Reference: coherent MRC detection.
In this paper, deep-learning is used to provide a low-complexity solution to the optimum multiple-symbol differential detection algorithm. Two deep-learning-based multiple-symbol non-coherent DPSK signal detection receiver designs have been proposed: deep-learning-based multiple-symbol differential detection and its alternative architecture. The power of deep-learning classification was exploited for low-complexity detection, since training the deep-learning model to learn the mathematical relation between the input and the output is done offline once, and the detection complexity is reduced to substitution into the trained model to predict the output. The performance of the designs was compared to sorted decision-feedback differential detection and multiple-symbol differential detection, where multiple-symbol differential sphere detection was used to implement MSDD.
The proposed deep-learning-based multiple-symbol differential detection alternative architecture outperformed sorted-DFDD and achieved optimum MSDD performance while avoiding the computational complexity and execution time of MSDD and sorted-DFDD in real-time operation. Both of the proposed deep-learning-based multiple-symbol differential detection architectures behaved the same for smaller block sizes, while the alternative architecture outperformed deep-learning-based multiple-symbol differential detection for larger block sizes, since the huge block classification problem was reduced to multiple smaller per-symbol classification problems.
Future directions extending the use of deep-learning-based classification for non-coherent signal detection are encouraged. The proposed single-user multiple-symbol differential detection receiver design could be extended to multi-user multiple-symbol classification for higher capacity and data rates. Due to the ability of deep-learning to provide mathematical models for complex algorithms with no online execution complexity, further encoding schemes could be considered as well.

APPENDIX

The first deep-learning-based multiple-symbol differential detection architecture requires solving a huge 4^{N−1}-label classification problem, which can be reduced to N − 1 4-label classification problems, as proposed in the deep-learning-based multiple-symbol differential detection alternative architecture. Generally, for data classification problems, increasing the number of output labels decreases the probability of satisfying the convexity property needed for correct label classification [39]. Moreover, the growth of the number of output layer classification labels leads to a hyper-linear growth of the neural network size, which increases the training data, training time, and memory usage significantly. A solution to the huge-number-of-labels classification problem is to convert it into smaller sub-problems instead of solving one huge problem.
Compared with binary classifiers, the reduced classification problems here are quaternary (4-class) instead of binary (2-class) problems. The number of code words is equivalent to the number of block combinations 4^{N−1}, and the number of decoded symbols per transmission block is N − 1. In summary, one quaternary classifier is learned for each transmitted symbol, as shown in (19),
where each f i is the "symbol-position function" of one quaternary classifier.
For the binary and M-ary classification to be valid, two properties should be satisfied for proper multi-class output classification: row separation and column separation, as discussed in [40].
Row separation: Each block of symbols should be well-separated in terms of Hamming distance from other block combinations.
Column separation: Each symbol-position function f i should be uncorrelated with the other learned functions of other symbols.
Those two properties are satisfied in the proposed deep-learning classification multiple-symbol differential detection and its alternative architecture. Row separation is achieved between the possible blocks, since each block combination is separated from the other combinations by the number of symbols in which they differ [41], which is defined as the Hamming distance. The minimum Hamming distance between the blocks is equal to 1 symbol, since the total number of blocks corresponds to all the possible 4^{N−1} combinations. Moreover, column separation is achieved in the proposed designs, since every transmitted symbol a_k is independent of the next transmitted symbol a_{k+1}.
This shows that solving the proposed classification problem with one huge 4^{N−1}-entry one-hot output vector is equivalent to solving it with N − 1 small 4-entry one-hot output vectors. In both cases, the classification network is able to learn the dependencies between the input data of the whole block and to detect each of the transmitted symbols in the block. In the case of one huge classification problem, the network learns the dependencies between the input symbols and outputs which one of the 4^{N−1} labels was transmitted. In the case of dividing the classification network into N − 1 smaller networks, each network targets the detection of one symbol in the block by learning the dependencies between the input symbols and outputs the targeted symbol out of 4 symbols.
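The equivalence of the two output representations can be illustrated by the bijection between one joint block label and N − 1 quaternary per-symbol labels (a base-M index map; the helper names are illustrative):

```python
def joint_label(symbol_labels, M=4):
    """Map N-1 per-symbol labels (each in 0..M-1) to the single joint
    label in 0..M^(N-1)-1 used by the huge one-hot output vector
    (mixed-radix / base-M interpretation)."""
    idx = 0
    for s in symbol_labels:
        idx = idx * M + s
    return idx

def per_symbol_labels(joint, n_symbols, M=4):
    """Inverse map: split one joint label into n_symbols quaternary
    labels, one per small one-hot output vector."""
    labels = []
    for _ in range(n_symbols):
        labels.append(joint % M)
        joint //= M
    return labels[::-1]
```

Because the map is a bijection, no information is lost by the decomposition; only the shape of the classifier output changes, from one 4^{N−1}-entry one-hot vector to N − 1 four-entry one-hot vectors.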