Adaptive Deep Learning Aided Digital Predistorter Considering Dynamic Envelope

Memory effects of radio frequency power amplifiers (PAs) can interact with dynamic transmitting signals, dynamic operations, and a dynamic environment, resulting in complicated nonlinear behavior of the PAs. Recently, deep learning-based schemes have been proposed to deal with the memory effects. Although these schemes are powerful in constructing complex nonlinear structures, they are still direct learning-based and relatively static. In this paper, we propose an adaptive deep learning aided digital predistortion (DL-DPD) model by optimizing a deep regression neural network. Thanks to the sequence structure of the proposed DL-DPD, we then make the linearization architecture more adaptive by using multiple sub-DPD modules and an ensemble predicting process. The results show the effectiveness of the proposed adaptive DL-DPD and reveal that the online system switches between the sub-DPD modules more frequently than expected.

throughput of secondary users under interference constraint [14], [15]. Thus, the modulated and modified signal with non-constant envelope can interact with the environment (e.g., self-heating and temperature), resulting in a challenge to the PA linearization techniques. Therefore, for future smart systems containing different types of connections and demands, it is essential for the DPD to keep pace with the changing operations and environment.
Considering the pros and cons of conventional direct learning [16]-[18] and indirect learning [10], [19], [20] architectures, H. Le Duc et al. [21] proposed a DPD architecture cascading an offline direct learning-based module and an adaptive indirect learning-based module. In the cascaded architecture, the static direct learning-based module deals with the PA's nonlinear memory effects, and the adaptive module is designed to handle the residual distortion introduced by the dynamics. Recently, machine learning [22] has provided an alternative option in the domain of wireless communications, for example, in 5G and beyond [23]-[32]. Some deep learning-based linearization methods [4], [33], [34] have been developed and have proven effective in modeling and linearizing PAs. However, the proposed long short-term memory (LSTM) [35] and bidirectional long short-term memory (BiLSTM) [36] based DPD architectures are all offline (static) and direct learning-based. It is therefore necessary to make the deep learning-based schemes more adaptive to match the varying factors.
The cascaded structure and the deep learning-based ideas motivate us to design a new framework that takes advantage of both the static and dynamic strategies. On one hand, the deep learning-based DPD (DL-DPD) can be more accurate than conventional direct learning architectures. On the other hand, the online adaptive strategy can meet the demands posed by new dynamic scenarios. The contributions of this paper are listed as follows:
• To deal with the PA's nonlinear behavior with memory effects, a direct learning-based DPD is proposed by optimizing a deep regression neural network [37].
• We propose an adaptive strategy to track the PA's interaction with the dynamic operations and environment.
• We conduct subsystem-level and system-level evaluations of the proposed architecture in comparison with other deep learning-based DPD models.

II. PROBLEM FORMULATION
This section formulates the behavioral modeling and linearization problem that connects deep learning with the adaptive strategy. A block diagram containing the offline learning and online implementation of the static DPD is shown in Fig. 1. In the offline case, the behavioral model structure is first chosen to represent the PA. For instance, the Volterra series-based and polynomial-based topologies [10], [38] are widely used for modeling PAs that exhibit memory effects. Next, the coefficients of the behavioral model are extracted from the gathered input and output samples of the directly connected PA. The direct learning scheme then computes an inverse of the behavioral model to obtain the DPD. As shown in Fig. 1, two kinds of evaluation can be conducted, where the substituted evaluation means that the evaluation is based on a virtual PA represented by its behavioral model.
If the PA's behavioral model is implemented by a deep neural network, the digital baseband input and output samples can be used to extract the model $M(\cdot)$. In the BiLSTM-based learning architecture from [4], $M(\cdot)$ reduces to a nonlinear transformation between the input and the estimated output, obtained by passing the input through a forward LSTM layer, a backward LSTM layer, and the subsequent layers $L_1, \ldots, L_p$ of the deep regression neural network. As an inversion of the behavioral model, the BiLSTM-based DPD is expressed in terms of its own subsequent layers $K_1, \ldots, K_p$, and $z_1, z_2, \ldots, z_k$ are the pre-distorted input signals. Furthermore, if we consider the dynamics of the input (e.g., a changing envelope), the optimal hyper-parameters of the layers become variable. The expected adaptive DPD is therefore indexed by $t$, the update time of the deep regression neural network. The following sections aim at solving the above formulation.
0018-9545 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

III. PROPOSED ADAPTIVE LINEARIZATION ARCHITECTURE
The overall architecture of the proposed adaptive DL-DPD solution is depicted in Fig. 2, where multiple deep regression neural network-based DPD modules and an adaptive strategy are integrated. Each sub-DPD module is trained on its own I/Q dataset, and the datasets differ in their peak-to-average power ratios (PAPRs). Unlike the conventional feedback schemes, the proposed adaptive strategy is executed on the input sequences before they are fed into the DPD modules, which makes the overall architecture feed-forward and concise.

A. Static Deep Learning-Based DPD
If the PA's behavioral model can be accurately extracted, a proper inversion technique can be applied to obtain the DPD linearization [4]. In our scheme, the behavioral and DL-DPD models share the same selected structure, as shown in Fig. 3. The structure consists of an input layer, a BiLSTM layer, an LSTM layer, and four fully-connected layers. Compared with the behavioral model, the DL-DPD only needs an inverse training process.
Data preparation: The sampled baseband I/Q signals are divided into training and testing datasets. The I/Q elements are then normalized to avoid the problem of gradient explosion. Accordingly, the output values of the DL-DPD need to be denormalized by using the statistical parameters of the training dataset. To match the sequence-to-sequence regression task, the I/Q elements are reorganized to form I/Q sequences with the length $T_M$ (i.e., the truncated memory depth of the PA).
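The data preparation steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' MATLAB code: the normalization is assumed to be a z-score using the mean and standard deviation of the training data, the sequences are assumed to be formed with a sliding window of stride one, and all function names are hypothetical.

```python
import statistics


def fit_normalizer(train_values):
    """Return (mean, std) of the training values (assumed z-score statistics)."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    return mean, std


def normalize(values, mean, std):
    """Scale values with the training statistics before feeding the network."""
    return [(v - mean) / std for v in values]


def denormalize(values, mean, std):
    """Map network outputs back to the original signal scale."""
    return [v * std + mean for v in values]


def to_sequences(i_vals, q_vals, seq_len):
    """Form overlapping I/Q sequences of length seq_len (the role of T_M).

    Each sequence is a list of [I, Q] pairs, one pair per time step.
    The stride of one sample is an assumption of this sketch.
    """
    pairs = [[i, q] for i, q in zip(i_vals, q_vals)]
    return [pairs[k:k + seq_len] for k in range(len(pairs) - seq_len + 1)]
```

The same training statistics are reused in both directions, which is why the DL-DPD outputs must be denormalized with the parameters of the training dataset rather than those of the test data.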
Training options: The DL-DPD model is trained in MATLAB with the Deep Learning Toolbox. We use Adam [39] as the optimizer, which achieves better performance than SGD [40] in our implementation. To approach the optimal solution gradually, the learning rate is dropped periodically as the training epochs increase.
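A periodically dropping learning rate can be illustrated with a simple step-decay rule. The sketch below is in Python rather than MATLAB, and the initial rate, drop factor, and drop period are hypothetical values chosen for illustration; the paper does not state the ones actually used.

```python
def step_decay_lr(epoch, initial_lr=1e-3, drop_factor=0.5, drop_period=10):
    """Learning rate for a 0-based epoch index under a step-decay schedule.

    The rate is multiplied by drop_factor once every drop_period epochs.
    All default values are illustrative assumptions.
    """
    return initial_lr * drop_factor ** (epoch // drop_period)


# Print the rate at the first epoch of each period for 30 epochs.
for epoch in range(0, 30, 10):
    print(epoch, step_decay_lr(epoch))
```

A schedule of this kind keeps large steps early in training and smaller steps later, which matches the stated goal of approaching the optimum gradually.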
Structure selection: We begin with a baseline structure consisting of a BiLSTM layer and three fully-connected layers. We observe that an extra LSTM layer following the BiLSTM layer improves the performance. More fully-connected layers can result in better regression accuracy; however, the performance degrades if there are more than four fully-connected layers. Another interesting observation is that the selected optimal (perhaps sub-optimal) layers meet a relationship among their lengths, in which $\overrightarrow{\mathrm{LSTM}}$ represents the LSTM layer, $F_i$ denotes the $i$-th fully-connected layer, $\mathrm{length}(\cdot)$ gives the length of each layer, and $N_i$ is an integer.
Non-causal concern: To obtain a better predistortion, future input is utilized in the DL-DPD involving a backward LSTM layer. Accordingly, a time delay module [4] can be used to wait for the future baseband input. The time delay module will result in additional but acceptable latency.
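The role of the time delay module can be sketched as a small look-ahead buffer. This is a hypothetical Python illustration, not the module from [4]: the class name and the `lookahead` parameter are assumptions, and the sketch only shows how each sample is held back until its future samples have arrived.

```python
from collections import deque


class LookaheadBuffer:
    """Releases each sample together with a fixed number of future samples."""

    def __init__(self, lookahead):
        self.lookahead = lookahead  # number of future samples to wait for
        self.buffer = deque()

    def push(self, sample):
        """Add one new sample.

        Returns [current, future_1, ..., future_lookahead] once enough
        samples have arrived, otherwise None while the buffer is filling.
        """
        self.buffer.append(sample)
        if len(self.buffer) <= self.lookahead:
            return None
        window = list(self.buffer)
        self.buffer.popleft()  # the oldest sample has now been served
        return window
```

In this sketch the added latency equals `lookahead` sample periods, which is the additional delay the text refers to.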

B. Adaptive Strategy for Dynamic Envelope
In general, the envelope of high-order and complex modulated signals is dynamic. For instance, the orthogonal frequency division multiplexing (OFDM) utilized in 4G and 5G can result in dynamic envelopes and high PAPRs. The linearization of PAs with dynamic input can be conducted by a unified DL-DPD, where the DL-DPD is trained on a mixed dataset covering as many input sequences and cases as possible. However, if we consider a reconfigurable system where the power control may be executed irregularly (e.g., in an intelligent cognitive radio scenario), the long-term self-heating will change the behavior of the PAs. Therefore, it is necessary to make the DL-DPD and the corresponding training process more adaptive.
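For reference, the PAPR of a baseband I/Q sequence is the ratio of its peak instantaneous power to its average power, usually quoted in dB. The short Python sketch below computes it from separate I and Q lists; the function name is hypothetical and the code is only an illustration of the definition.

```python
import math


def papr_db(i_vals, q_vals):
    """Peak-to-average power ratio of a complex baseband sequence, in dB."""
    powers = [i * i + q * q for i, q in zip(i_vals, q_vals)]
    peak = max(powers)
    average = sum(powers) / len(powers)
    return 10.0 * math.log10(peak / average)


# A constant-envelope sequence has equal peak and average power.
print(papr_db([1, 0, -1, 0], [0, 1, 0, -1]))  # prints 0.0
```

Signals with a strongly varying envelope, such as OFDM, give values well above 0 dB, which is what makes their linearization harder.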
Instead of the unified DL-DPD, the adaptive strategy aims at providing a solution based on multiple DL-DPD modules. The number of parallel DL-DPD (sub-DPD) modules is chosen according to the granularity of power control. As shown in Fig. 2, we use an envelope detection module to detect the standard deviation $\sigma_x = [\sigma_I; \sigma_Q]$ of the current input I/Q sequence in the temporary storages. Here $I(x_i)$ and $Q(x_i)$ are the in-phase and quadrature parts of the input element $x_i$, respectively; $\sigma_I$ and $\sigma_Q$ are the in-phase and quadrature standard deviation values of the input sequence, computed about the in-phase and quadrature mean values $\mu_I$ and $\mu_Q$, respectively.
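The envelope detection step can be sketched in Python as below. The sketch assumes the population form of the standard deviation, that is, $\sigma_I = \sqrt{\frac{1}{T}\sum_{i=1}^{T}(I(x_i)-\mu_I)^2}$ for a buffered sequence of $T$ elements, and likewise for $\sigma_Q$; whether the original uses $T$ or $T-1$ in the denominator is not recoverable here, and the function names are hypothetical.

```python
import math


def channel_std(values):
    """Standard deviation of one channel about its mean (population form)."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def detect_envelope(i_vals, q_vals):
    """Return (sigma_I, sigma_Q) for the I/Q sequence currently buffered."""
    return channel_std(i_vals), channel_std(q_vals)
```

Because the detector only needs the buffered input sequence, it can run ahead of the sub-DPD modules without any feedback from the PA output.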
Then we use the gates (namely, g1, g2, and g3 in Fig. 2) to control the weight (or on-off state) of each sub-DPD. The weights $\beta_i^{(t)}$ are computed by comparing $\sigma_x^{(t)}$ with the constants $\sigma_i$ ($i = 1, 2, \ldots, k$) of the normalization processes, where $\sigma_i$ is the prior standard deviation of each training dataset. In a simplified on-off mode, we quantize the maximum weight to one and set the other weights to zero. Hence, the adaptive DL-DPD can be expressed as an ensemble predicting process over the sub-DPDs, each weighted by its $\beta_i^{(t)}$.

IV. RESULTS AND DISCUSSION

A. Experiment Setup
The training and testing of the proposed architecture are conducted on the basis of real measurements of a wideband PA. To match a dynamic scenario, the I/Q samples were first acquired in an experiment using modulated signals with a bandwidth of 50 MHz and a PAPR of 8.92 dB. The baseband I/Q signals are sampled at a rate of 245.76 MHz. To simulate a scenario with power control, three simulation datasets are generated by introducing additional dynamics to the measurements. As a result, the three datasets have different PAPRs and AM/AM transformations, describing a virtual PA with more complicated behavior.

B. Testing of the Individual Direct Learning Model
To test the performance of the proposed direct learning architecture in Fig. 3, we have compared a multilayer perceptron (MLP)-based model, a convolutional neural network (CNN)-based model, an LSTM-based model, and a BiLSTM-based model [36] with this work in terms of the behavioral modeling accuracy.
All tests are run on the same static dataset. The metric is the power spectral density (PSD) of the prediction errors, and the results are depicted in Fig. 4.
From Fig. 4, it is apparent that the proposed architecture outperforms the other schemes for the tested case. Generally, the architectures with bidirectional structures are better at modeling PAs with memory. We infer that the bidirectional structures can learn from the future input to forecast the self-heating intensity and can thus predict the output more accurately. The modeling performance of each individual DPD can be further enhanced if we introduce an extra LSTM layer and optimize the permutation of the fully-connected layers. In addition, each DPD needs approximately 4.8 MB of storage (on average) for storing the hyper-parameters.

C. Testing of the Adaptive DL-DPD
We have designed a dynamic testing scenario in which the transmitting power varies among three different levels over time (see Fig. 5, where the darker color denotes the higher average power). We then tested the proposed adaptive DL-DPD architecture in the simplified on-off mode. Because all weights except the maximum one were set to zero, just one sub-DPD was activated at every moment. Fig. 5 shows the activated sub-DPD changing over time. The results in Fig. 5 illustrate that although the training environment and testing environment are alike, the online system switches between the sub-DPDs more adaptively than expected (e.g., sub-DPD3 can be selected with some probability even if the average transmitting power is relatively low).
Fig. 6 compares the effectiveness of the different schemes in linearizing the tested PA, including the upper and lower adjacent channel power ratio (ACPR) values. Here, the unified-DPD denotes the DPD model trained on a mixed dataset (composed of three datasets with different PAPRs), and the individual sub-DPD is trained and tested under two static datasets, respectively. Note that all the different DPDs inherit the base structure shown in Fig. 3. As can be seen in Fig. 6, the proposed adaptive DL-DPD architecture presents the best PSD performance among the four schemes. Compared with the individual sub-DPD tested under a static dataset, the adaptive DL-DPD tested under a dynamic scenario achieves a better ACPR. In fact, because of the complex modulation in 4G and 5G signals, even the static dataset without power control is dynamic. Therefore, the adaptive DL-DPD may surpass the unified-DPD and each sub-DPD in different scenarios, at the cost of more storage but with approximately equal computation.
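The simplified on-off mode tested in this subsection can be sketched in Python as follows. The exact weight formula is not reproduced here, so the sketch assumes a nearest-match rule: the sub-DPD whose prior standard deviation pair is closest (in Euclidean distance) to the detected one receives weight one, and all others receive zero. The function names are hypothetical, and the sub-DPDs are represented by placeholder callables rather than trained networks.

```python
import math


def on_off_weights(sigma_x, prior_sigmas):
    """One-hot gate weights for the sub-DPDs.

    sigma_x      -- detected (sigma_I, sigma_Q) of the current sequence
    prior_sigmas -- list of (sigma_I, sigma_Q) pairs, one per training dataset
    The nearest-match rule used here is an assumption of this sketch.
    """
    distances = [math.dist(sigma_x, prior) for prior in prior_sigmas]
    winner = distances.index(min(distances))
    return [1 if k == winner else 0 for k in range(len(prior_sigmas))]


def adaptive_predict(sequence, sigma_x, prior_sigmas, sub_dpds):
    """Run only the sub-DPD whose gate is switched on."""
    weights = on_off_weights(sigma_x, prior_sigmas)
    active = sub_dpds[weights.index(1)]
    return active(sequence)
```

Since only one sub-DPD runs per sequence, the computation per sequence stays close to that of a single DPD, while the storage grows with the number of sub-DPDs.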

V. CONCLUSION
An adaptive deep learning-based DPD architecture has been proposed in this paper to track the PA's nonlinear behavior and its interaction with the changing operations and environment. The proposed architecture can be regarded as an ensemble predictor consisting of multiple DL-DPDs (i.e., sub-DPDs). Most of the memory effects of the tested PA can be mitigated by the BiLSTM-based neural networks. The problem introduced by the dynamic scenarios can be dealt with by the proposed adaptive strategy.