When Probabilistic Shaping Realizes Improper Signaling for Hardware Distortion Mitigation

Hardware distortions (HWD) render drastic effects on the performance of communication systems. They have recently been proven to bear asymmetric signatures and hence can be efficiently mitigated using improper Gaussian signaling (IGS), thanks to its additional design degrees of freedom. Discrete asymmetric signaling (AS) can practically realize IGS by shaping the signals' geometry or probability. In this paper, we adopt probabilistic shaping (PS) instead of uniform symbols to mitigate the impact of HWD and derive the optimal maximum a posteriori detector. Then, we design the symbols' probabilities to minimize the error rate while accommodating the improper nature of HWD. Although the design problem is non-convex, we simplify it using successive convex programming and propose an iterative algorithm. We further present a hybrid shaping (HS) design to gain the combined benefits of both PS and geometric shaping (GS). Finally, extensive numerical results and Monte-Carlo simulations highlight the superiority of the proposed PS over conventional uniform constellation and GS. Both PS and HS achieve substantial improvements over the traditional uniform constellation and GS, with up to one order of magnitude in error probability and throughput.


I. INTRODUCTION
Exponentially rising demands for high data rates and reliable communications, given the limited power and bandwidth resources, impose enormous challenges on the next generation of wireless communication systems [1], [2]. Various research contributions propose new configurations and novel techniques to address these challenges [3], [4]. Nonetheless, the performance of such systems can be highly degraded by the hardware imperfections in radio frequency (RF) transceivers [5]-[7]. Such imperfections give rise to additive signal distortions emerging from the phase noise, mismatched local oscillator, imperfect high power amplifier/low noise amplifier, and non-linear amplitude-to-amplitude and amplitude-to-phase transfer [8]-[14]. Various contributions emphasized the distinct improper behavior of these hardware distortions (HWDs) [15]-[18], which requires effective compensation techniques to meet the performance demands.

A. Motivation
The improper Gaussian signaling (IGS) is proven to be an effective scheme to mitigate the deteriorating effects of improper noise or interference in wireless communication systems. More precisely, IGS is a generalized complex signaling that allows the signal components to be correlated and/or to have unequal power, as opposed to proper Gaussian signaling [19]. IGS offers an additional degree of freedom (DoF) in signaling design characterized by the circularity coefficient [20]. Several studies highlight the significance of IGS to improve the system performance under improper interference [21]-[28]. Recent studies quantified the impact of IGS in dampening improper noise effects in multi-antenna or multi-nodal system settings [17], [29]-[33]. IGS has emerged as a promising candidate to improve the average achievable rate performance in multi-antenna systems suffering from HWD [30], [31]. Moreover, IGS benefits can also be reaped in various full-duplex/half-duplex relay settings by effectively compensating the residual self-interference, inter-relay interference and/or HWD [17], [32], [33]. Additionally, the ergodic rate maximization and outage probability minimization based on a generalized error model for hardware impairments in single-input multiple-output (SIMO) and multiple-input multiple-output (MIMO) systems is studied in [30], [31].
S. Javed, A. Elzanaty, O. Amin, B. Shihada and M.-S. Alouini are with CEMSE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Makkah Province, Saudi Arabia. E-mail: {sidrah.javed,ahmed.elzanaty,osama.amin,basem.shihada,slim.alouini}@kaust.edu.sa

B. Background
Despite the overwhelming benefits of IGS, it is practically infeasible owing to the high detection complexity and unbounded peak-to-average power ratio [2], [34]. This motivated researchers to design equivalent finite and discrete asymmetric signaling (AS) schemes for practical implementation. Improper discrete constellation, or AS, entails redesigning the symmetric discrete signal constellation to convert it into an asymmetric signal [2]. Several studies focused on geometric shaping (GS) as a possible design scheme to improve system performance. GS transforms equally spaced symbols to unequally spaced symbols (due to correlated and/or unequal power distribution between quadrature components of the symbols) in a distinct geometric envelope such as an ellipse [35], a parallelogram [34], [36], or some irregular envelope [37]. A family of improper discrete constellations generated by widely linear processing of a square $M$-ary quadrature amplitude modulation (QAM) exhibits a parallelogram envelope [34]. Similarly, GS based on optimal translation and rotation also yields a parallelogram envelope [36]. However, conditioned on high signal-to-noise ratio (SNR) and higher order QAM, the optimal constellation is the intersection of the hexagonal lattice/packing with an ellipse where the eccentricity determines the circularity coefficient [35]. GS has emerged as a competent player to reduce shaping loss and improve reception at lower SNRs in terrestrial broadcast systems [38], [39]. GS parameters can be designed for diverse objectives such as capacity maximization [34], bit error rate (BER) reduction [36], and symbol error probability minimization [35]. Although the asymmetric discrete family of constellations is practical, it exhibits two types of loss, i.e., shaping loss and packing loss, in approaching IGS theoretical limits [34].

C. Related Work
Most of the efforts to close the gap between AS and ideal IGS are concentrated around GS, with a limited focus on probabilistic shaping (PS) as another way to implement AS for HWD. Given a fixed number of symbols and the symbol locations, an asymmetric constellation can be obtained by adjusting the symbol probabilities [40]. PS maps equally distributed input bits into constellation symbols with non-uniform prior probabilities [41]. This can be achieved using distribution matching (DM) for rate adaptation, such as constant composition DM [42], adaptive arithmetic DM [43], syndrome DM [44], [45], and DM-based compressed sensing [46], [47].
PS-based schemes have been employed to enhance the system performance in optical fiber communications (OFC) and free-space optics (FSO). In OFC, multiple transformations are presented to approach Gaussian channel capacity using PS, including prefix codes [48], [49], many-to-one mappings combined with a turbo code [50], distribution matching [51], and the cut-and-paste method [52]. Furthermore, a multidimensional coded modulation format with hybrid probabilistic and geometric constellation shaping can effectively compensate nonlinearity and approach Shannon limits in OFC [53]. A coded modulation scheme with PS aims to solve the shaping gap and coarse mode granularity problems [54]. Interested readers can refer to the classic work [55] for the design guidelines of AS in the coherent Gaussian channel with equal signal energies and unequal a priori probabilities. Probabilistic amplitude shaping is another concept that can only be used for symmetric constellations with coherent modulation, which greatly limits its application [56]. For FSO, a practical and capacity-achieving PS scheme with adaptive coding modulation is proposed with intensity modulation/direct detection [57].
The concept of PS is widely employed in OFC and FSO systems. However, it is not well investigated in wireless communication systems, and only a few studies have contributed in this domain [58], [59]. For example, enumerative amplitude shaping is proposed as a constellation shaping scheme for IEEE 802.11, which renders a Gaussian distribution on the constituent constellation [58]. Moreover, PS has been proposed to maximize the mutual information between transmit and receive signals for non-linear distortion effects in additive white Gaussian noise (AWGN) channels [59]. To the best of the authors' knowledge, PS has not been used to enhance the error performance or to realize IGS for wireless communication systems with HWD.

D. Contributions
In this paper, we propose PS as a method to realize improper signaling, which is beneficial in mitigating the impact of HWD on the BER performance. Motivated by IGS's theoretical results in various scenarios [2] and the issues associated with GS, such as high shaping gap and coarse granularity, we adopt PS to realize the IGS scheme and combat HWD to assure reliable communications. The main contributions are summarized as follows:
• We derive the optimal maximum a posteriori (MAP) detector for a discrete AS and carry out BER analysis for the adopted HWD communication system.
• We design the probabilistically shaped AS under power and rate constraints for the hardware-distorted system and propose an adaptive algorithm that tunes the symbol probabilities for PS to minimize the BER.
• We further suggest a hybrid shaped AS scheme that reaps the benefits of both PS and GS and present an adaptive algorithm that tunes both the symbol probabilities and the shaping parameters.
• Finally, we present numerical Monte-Carlo simulations to validate the performance of the proposed techniques and compare the BER and throughput performance of PS, GS, and hybrid shaping (HS) in AWGN and Rayleigh fading channels.

E. Paper Organization and Notation
The rest of the paper is organized as follows: Section II describes the statistical signal characteristics, the HWD model, and the optimal receiver for the adopted HWD system. In Section III, we present the error probability analysis using the union bound on pairwise error probability and derive the instantaneous BER for a generalized $M$-ary modulation scheme. Next, we propose the PS design using a successive convex programming (SCP) algorithm, along with some toy examples for comprehensive illustration, in Section IV. Later, HS parameterization and design, along with the respective MAP detector and error probability analysis, are carried out in Section V, followed by the numerical results in Section VI and the conclusion in Section VII.
Notations: In this paper, $|a|$ and $a^*$ represent the absolute value and complex conjugate of a scalar complex number $a$. The probability of an event $A$ is defined as $\Pr(A)$. The notations $f_z(z)$ and $f_{z|y}(z|y)$ denote the probability density function (PDF) and conditional PDF of a random variable (r.v.) $z$ given $y$. The operator $E[\cdot]$ denotes the expected value. Considering a r.v. $\Lambda$, the real/in-phase and imaginary/quadrature-phase components of $\Lambda$ are denoted as $\Lambda_I$ and $\Lambda_Q$, respectively. Moreover, $f'(x)$ denotes the first order derivative of $f(x)$ with respect to $x$. Additionally, $\mathbb{Z}^+$ represents the set of positive integers, and $x^{(k)}$ and $\mathbf{p}^{(k)}$ represent the instance values of the variable $x$ and vector $\mathbf{p}$, respectively, in the $k$th iteration of an algorithm.

II. SYSTEM DESCRIPTION
Impropriety incorporation is crucial for systems dealing with improper signals, noise, or interference. Such characterization helps in meticulous system modeling, accurate performance analysis, and optimum signaling design. We begin by presenting the statistical signal model to introduce some preliminaries of the impropriety characterization. This will help to comprehend the impropriety concepts in the adopted system model with HWD. Then, the transceiver HWD model is described, and the optimal receiver is derived.

A. Statistical Signal Model
The impropriety characterization of a random variable (r.v.) x involves the identification and extent of improperness described by the pseudo-variance and circularity coefficient, respectively.
Definition 1. The pseudo-variance of $x$ is defined as $\tilde{\sigma}_x^2 = E[x^2]$, as opposed to the conventional variance $\sigma_x^2 = E[|x|^2]$. A zero $\tilde{\sigma}_x^2$ signifies a proper complex r.v., whereas a non-zero $\tilde{\sigma}_x^2$ identifies an improper complex r.v.
Definition 2. The degree of improperness is given by the circularity coefficient $C_x \triangleq |\tilde{\sigma}_x^2| / \sigma_x^2$, where $0 \le C_x \le 1$ [20]. $C_x = 0$ indicates a proper or symmetric signal and $C_x = 1$ indicates a maximally improper or maximally asymmetric signal.
Evidently, the pseudo-variance is bounded, i.e., $0 \le |\tilde{\sigma}_x^2| \le \sigma_x^2$. Interestingly, a complex Gaussian random variable $v = a + ib$ can be fully described by its mean $\mu_v$ and augmented covariance matrix $R_{vv}$ [61].
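As a small illustration of Definitions 1 and 2, the following sketch evaluates the variance, pseudo-variance, and circularity coefficient of a discrete constellation under a given symbol distribution. The function name `circularity_coefficient` and the example probabilities are hypothetical and chosen only for illustration; the mean is removed first so that the definitions apply to a zero-mean r.v.

```python
def circularity_coefficient(symbols, probs):
    """Return (variance, pseudo_variance, C) of a discrete complex r.v."""
    mean = sum(p * s for s, p in zip(symbols, probs))
    centered = [s - mean for s in symbols]
    # Conventional variance E[|x|^2] and pseudo-variance E[x^2] of the centered r.v.
    variance = sum(p * abs(c) ** 2 for c, p in zip(centered, probs))
    pseudo_variance = sum(p * c ** 2 for c, p in zip(centered, probs))
    return variance, pseudo_variance, abs(pseudo_variance) / variance


qam4 = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
# Uniform priors give a proper (symmetric) constellation.
_, _, c_uniform = circularity_coefficient(qam4, [0.25] * 4)
# Illustrative non-uniform priors make the same symbol set improper.
_, _, c_shaped = circularity_coefficient(qam4, [0.4, 0.1, 0.1, 0.4])
```

With the illustrative priors above, the same 4-QAM symbol locations move from $C_x = 0$ to a non-zero circularity coefficient, which is the sense in which adjusting probabilities alone can introduce asymmetry.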

B. Transceiver Hardware Distortion Model
Consider a single-link wireless communication system suffering from various hardware impairments. The non-linear transfer functions of various transmitter RF stages, such as the digital-to-analog converter, band-pass filter, and high power amplifier, result in accumulative additive distortion noise $\eta_t \sim \mathcal{CN}(0, \kappa_t, \tilde{\kappa}_t)$, where $|\tilde{\kappa}_t| \le \kappa_t$ [6], [9]. These distortions raise the noise floor of the transmitted signal $x_{\rm tx} = x_m + \eta_t$, where $x_m$ is the single-carrier band-pass modulated signal taken from an $M$-ary QAM, $M$-ary phase shift keying (PSK), or $M$-ary pulse amplitude modulation (PAM) constellation with a probability mass function $p_m \triangleq p_X(x_m)$ rendering the transmission probability of symbol $x_m$, and $\mathbf{p} \triangleq [p_1, p_2, \cdots, p_M]$. Let us define the set that includes all possible symbol distributions as $\mathcal{S} \triangleq \{\mathbf{p} : \sum_{m=1}^{M} p_m = 1,\ p_m \ge 0\ \forall m\}$. The transmitted signal further undergoes a slowly varying flat Rayleigh fading channel $g \sim \mathcal{CN}(0, \lambda, 0)$. Moreover, the receiver induces an additive distortion $\eta_r$, resulting from the non-linear transfer functions of the low noise amplifier, band-pass filters, image rejection low pass filter, and analog-to-digital converter. It is important to highlight that the receiver distortions are in addition to the conventional thermal noise at the receiver.
It is important to note that (5) reduces to the conventional signal model $y = \sqrt{\alpha} g x_m + w$ in the case of ideal hardware, i.e., $\kappa = 0$, which is induced by imposing $\kappa_t = \kappa_r = 0$, and also $\tilde{\kappa} = 0$, which is deduced from Definition 2.
HWD can leave drastic effects on the system performance as they raise the noise floor. Although the entropy loss of improper noise is less than that of proper noise, it is difficult to tackle. It requires meticulously designed improper signaling such as IGS for effective mitigation. However, IGS is difficult to implement because of the unbounded peak-to-average power ratio and high detection complexity [2], [34]. Therefore, researchers resort to finite discrete AS schemes obtained by GS.
We propose PS as another way to realize AS in order to effectively dampen the deteriorating effects of improper HWD. PS aims to design non-uniform symbol probabilities for a higher order QAM to minimize the BER, offering more degrees of freedom and adaptive rates. In the following section, we carry out the error probability analysis of the adopted system, which lays the foundation for the proposed PS design.

C. Optimal Receiver
Conventional systems with Gaussian interference employ least-complex receivers with either minimum Euclidean distance or maximum likelihood detectors. However, such receivers cannot accommodate unequal symbol probabilities and improper noise. Therefore, the optimal detection in the presented scenario can only be achieved by the MAP detector at the expense of increased receiver complexity. Considering the improper Gaussian HWD and the non-uniform priors of the constellation symbols, the optimal MAP detection is given by
$$\hat{m} = \arg\max_{1 \le m \le M} \; p_X(x_m)\, f_{Y_I, Y_Q | X, g}(y_I, y_Q | x_m, g), \quad (9)$$
where $f_{Y_I, Y_Q | X, g}(y_I, y_Q | x_m, g)$ is the conditional Gaussian PDF of $y$ representing the maximum likelihood (ML) function given $x_m$ and $g$, as expressed in (10).
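The MAP rule can be sketched in a few lines under simplifying assumptions. The sketch below assumes the model $y = \sqrt{\alpha} g x_m + z$, where the aggregate noise $z$ has a known real $2 \times 2$ covariance between its in-phase and quadrature components that does not depend on the transmitted symbol. The function name `map_detect` and the argument `cov` are hypothetical, and the bivariate Gaussian metric used here is a generic stand-in rather than the exact expression (10).

```python
import math


def map_detect(y, symbols, priors, g, alpha, cov):
    """Return the index m maximizing log p_m + log f(y | x_m, g).

    cov = ((s_II, s_IQ), (s_IQ, s_QQ)) is the assumed real covariance of the
    aggregate noise; terms common to all symbols are dropped from the metric.
    """
    (a, b), (_, d) = cov
    det = a * d - b * b
    best_m, best_metric = None, -math.inf
    for m, (x, p) in enumerate(zip(symbols, priors)):
        if p <= 0.0:
            continue  # a symbol that is never transmitted cannot be detected
        e = y - math.sqrt(alpha) * g * x
        e_i, e_q = e.real, e.imag
        # Quadratic form e^T cov^{-1} e for the 2x2 covariance.
        quad = (d * e_i * e_i - 2.0 * b * e_i * e_q + a * e_q * e_q) / det
        metric = math.log(p) - 0.5 * quad
        if metric > best_metric:
            best_m, best_metric = m, metric
    return best_m
```

With uniform priors and a scaled identity covariance, the metric reduces to minimum Euclidean distance detection; the prior term and the off-diagonal covariance entry are what distinguish the MAP rule in the non-uniform, improper setting.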

III. ERROR PROBABILITY ANALYSIS
Considering the non-uniform priors and improper noise, the error probability analysis is carried out based on the optimal MAP detector presented in Section II. The symbol error probability $P_s$ is the accumulated error probability of all symbols with respect to their prior probabilities and is given as
$$P_s = \sum_{m=1}^{M} p_m \Pr(e | x_m), \quad (11)$$
where $\Pr(e | x_m)$ is the probability of an error event given that symbol $x_m$ was transmitted. In order to yield a tractable and simplified analysis, especially for higher order modulation schemes, $P_s$ can be upper bounded as
$$P_s \le \sum_{m=1}^{M} \sum_{n=1, n \ne m}^{M} p_m P_{mn}, \quad (12)$$
where $P_{mn}$ is the pairwise error probability (PEP), which represents the probability of deciding $x_n$ given that $x_m$ was transmitted, ignoring all the other symbols in the constellation [66]. The PEP can be evaluated using the MAP rule in (9), as given in (13). By substituting the conditional probability from (10) in (13) and after some mathematical simplifications, the PEP can be written as in (14). Now, we find the in-phase and quadrature-phase components, $y_I$ and $y_Q$, of the received signal $y$ for a given transmitted symbol $x_m$. Then, we substitute $y_I$ and $y_Q$ in (14), which can be further simplified in terms of $\xi_{mn} = g d_{mn} = g (x_m - x_n)$, representing the distance between the $m$th and $n$th symbols scaled by the channel coefficient $g$, and $\psi$, which is obtained by the superposition of $z_I$ and $z_Q$. Clearly, $\psi$ is another zero-mean Gaussian random variable with variance $\sigma_\psi^2$. Conclusively, $P_{mn}$ is the complementary cumulative distribution function of $\psi$, as given in (21). Substituting the PEP derived in (21) into (12) along with the Gray mapping assumption yields the bound on BER in (22), where $\beta_{mn} \triangleq (1 - \rho_z^2) / (\sqrt{\alpha}\, \gamma_{mn})$. The BER expression depends on the size of the constellation, prior probabilities of all the symbols, power budget, mutual distances between the transmitted and received erroneous symbols under Rayleigh fading, and HWD statistical characteristics.
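To make the structure of the union bound concrete, the sketch below evaluates a pairwise error probability and the resulting bound on the symbol error probability. It uses the textbook binary MAP result for Gaussian noise with a known, symbol-independent real $2 \times 2$ covariance, $P_{mn} = Q(\sqrt{q}/2 + \ln(p_m/p_n)/\sqrt{q})$ with $q = d^T \Sigma^{-1} d$. This is an assumption made for illustration and is not claimed to match the closed forms (21) and (22) term by term. The names `pairwise_error_prob`, `union_bound_ser`, and `gain` (standing in for $\sqrt{\alpha} g$) are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pairwise_error_prob(xm, xn, pm, pn, cov, gain=1.0):
    """Probability that MAP prefers xn over xm when xm is sent (binary test)."""
    if pn <= 0.0:
        return 0.0  # a zero-prior symbol is never selected by the MAP rule
    (a, b), (_, d) = cov
    det = a * d - b * b
    diff = gain * (xm - xn)
    d_i, d_q = diff.real, diff.imag
    # q = diff^T cov^{-1} diff, a noise-weighted squared distance.
    q = (d * d_i * d_i - 2.0 * b * d_i * d_q + a * d_q * d_q) / det
    return q_function(0.5 * math.sqrt(q) + math.log(pm / pn) / math.sqrt(q))


def union_bound_ser(symbols, priors, cov, gain=1.0):
    """Union bound: sum over m of p_m times the sum of PEPs to all n != m."""
    total = 0.0
    for m, (xm, pm) in enumerate(zip(symbols, priors)):
        if pm <= 0.0:
            continue
        for n, (xn, pn) in enumerate(zip(symbols, priors)):
            if n != m:
                total += pm * pairwise_error_prob(xm, xn, pm, pn, cov, gain)
    return total
```

The prior ratio enters only through the logarithmic term, so a more probable transmitted symbol is protected by a larger decision region at the expense of its less probable neighbor.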
In contrast to the monotonically decreasing BER for ideal systems, the BER saturates after a specific SNR in hardware-distorted transceivers. In this regard, we carry out the asymptotic analysis of the bit error probability to quantify the error floor at high SNR. In this regime, the error floor can be upper bounded from (22) as in (24). We can see that the error floor depends on the adopted $M$-ary constellation, channel coefficient, HWD statistical characteristics, and symbol probabilities.

IV. PROPOSED PROBABILISTIC SIGNALING DESIGN
We aim to design the non-uniform symbol probabilities, which minimize the BER of the adopted system suffering from HWD. The optimization is carried out given power and rate constraints. The rate of the conventional QAM with uniform symbol probabilities and modulation order $M_u$ is fixed, i.e., $R = \log_2(M_u)$. However, we seek the maximum benefits of PS by allowing a higher-order modulation with $M_{nu} > M_u$, where $M_{nu}$ is the modulation order of the constellation with non-uniform probabilities $\mathbf{p}$. Thus, the rate of this scheme can be designed such that $R \triangleq H(\mathbf{p}) \ge \log_2(M_u)$, rendering more design flexibility and hence the capability of reducing the BER. PS is capable of changing the transmission rate by changing the symbol distribution for a fixed modulation order, unlike uniform signaling, which needs to change the modulation order to change the rate for uncoded communications.
After designing the symbol probabilities, we can implement PS by using distribution matching at the transmitter to map uniformly distributed input bits to $M_{nu}$-QAM/PSK symbols [42], [43], [46]. Moreover, they can be detected using the proposed MAP detector (9) at the receiver, which incorporates the prior symbol distribution. In the following, we formulate the PS design problem and propose an algorithm to obtain the non-uniform symbol probabilities, followed by some toy examples.

A. Problem Formulation
The probability vector $\mathbf{p} \triangleq [p_1, p_2, \ldots, p_{M_{nu}}]$, containing the probabilities of the symmetric $M_{nu}$-QAM/PSK modulated symbols with $M_{nu} > M_u$ (see Footnote 1), is designed to minimize the upper bound on the BER derived in (22). In particular, we formulate the problem as P1 in (25), where (25b) and (25c) represent the average power and rate constraints, respectively, and $H(\mathbf{p})$ is the source entropy, which represents the transmitted rate in terms of bits per symbol per channel use and is defined as
$$H(\mathbf{p}) \triangleq -\sum_{m=1}^{M_{nu}} p_m \log_2 p_m. \quad (26)$$
The concave nature of the information entropy in (25c) renders a convex constraint in $\mathbf{p}$, and the rate fairness is justified based on the trade-off between BER minimization and rate maximization while satisfying a minimum rate. Therefore, the idea is to employ a higher order non-uniformly distributed $M_{nu}$-QAM/PSK as compared to a lower order uniformly distributed $M_u$-QAM/PSK with the same energy and at least the same rate to minimize the BER.
Footnote 1: For $M_{nu} = M_u$, the distribution should be uniform to satisfy the rate constraint because uniform signaling has the largest entropy.
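The constraints of P1 are straightforward to check numerically for a candidate distribution. The sketch below is a minimal feasibility test; the helper names and the argument `power_budget` (standing in for the right-hand side of the average power constraint, which is treated here as a free parameter) are hypothetical.

```python
import math


def entropy_bits(p):
    """Source entropy H(p) in bits per symbol, with 0 * log2(0) taken as 0."""
    return -sum(q * math.log2(q) for q in p if q > 0.0)


def average_power(symbols, p):
    """Average symbol power under the distribution p."""
    return sum(q * abs(s) ** 2 for s, q in zip(symbols, p))


def is_feasible(symbols, p, m_u, power_budget, tol=1e-9):
    """Check the simplex, average power, and minimum rate constraints."""
    on_simplex = abs(sum(p) - 1.0) <= tol and all(q >= -tol for q in p)
    power_ok = average_power(symbols, p) <= power_budget + tol
    rate_ok = entropy_bits(p) >= math.log2(m_u) - tol
    return on_simplex and power_ok and rate_ok
```

A check of this kind is useful for validating the starting point of an iterative design, since the successive approximation has to begin from a feasible distribution.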

B. Optimization Framework
The optimization problem P1 in (25) is non-convex owing to the non-convex objective function, even though all the constraints are convex. Therefore, we propose a successive convex approximation approach to tackle it. We begin by approximating $P_b^{\rm UB}(\mathbf{p})$ with its first order Taylor series approximation.
The first order Taylor series approximation of a function $f(x)$ around a point $x^{(k)}$ is given as
$$f(x) \approx f(x^{(k)}) + f'(x^{(k)}) (x - x^{(k)}). \quad (27)$$
In order to compute $\partial P_b^{\rm UB} / \partial p_t$, we rewrite (22) as (28) and apply the Leibniz integral rule to (29), which yields (31). Now, $P_b^{\rm UB}$ can be approximated from (27), (28), and (31) using the first order Taylor series expansion around an initial probability vector $\mathbf{p}^{(k)}$. Successive convex programming minimizes P1 by iteratively solving its convex approximation P1a, as presented in Algorithm 1.
It begins with the initialization of the counter $i$, the stopping criterion $\epsilon$, and the stopping threshold $\delta$. Secondly, we choose some feasible PMF $\mathbf{p}^{(i)} \in \mathcal{S}$ which satisfies the constraints (25b) and (25c). The while loop starts by evaluating the approximation $\tilde{P}_b^{\rm UB}(\mathbf{p}, \mathbf{p}^{(i)})$ around $\mathbf{p}^{(i)}$. The convex problem P1a is solved using the Karush-Kuhn-Tucker (KKT) conditions derived in Appendix C to obtain the optimal probabilities for P1a [67]. The solution obtained in this iteration is stored as $\mathbf{p}^{(i+1)}$ and is used to evaluate the stopping criterion $\epsilon \leftarrow \|\mathbf{p}^{(i+1)} - \mathbf{p}^{(i)}\|_2$, as shown in Algorithm 1. The loop ends when the change between two subsequent solutions in terms of the $\ell_2$ norm is less than the predefined threshold $\delta$. Once the stopping criterion is attained, the solution $\mathbf{p}^{*}$ is guaranteed to render a BER $P_b^{*}$ which will be lower than the bound $P_b^{\rm UB}(\mathbf{p}^{*})$.
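The loop structure described above can be sketched as follows. This is a simplified stand-in rather than the paper's Algorithm 1: the power constraint is omitted, the objective gradient is supplied by the caller, and the linearized subproblem with only the simplex and minimum-rate constraints is solved through its KKT conditions, which give a Gibbs-type solution $p_m \propto \exp(-c_m / t)$ with the scalar $t$ found by bisection. All function names are hypothetical.

```python
import math


def entropy_bits(p):
    return -sum(q * math.log2(q) for q in p if q > 0.0)


def solve_linearized(c, min_rate, iters=200):
    """Minimize sum(c[m] * p[m]) over the simplex subject to H(p) >= min_rate.

    Assumes min_rate <= log2(len(c)); otherwise a near-uniform p is returned.
    """
    c_min = min(c)

    def gibbs(t):
        w = [math.exp(-(ci - c_min) / t) for ci in c]
        total = sum(w)
        return [wi / total for wi in w]

    lo, hi = 1e-9, 1e9
    if entropy_bits(gibbs(lo)) >= min_rate:
        return gibbs(lo)  # rate constraint inactive: mass on the smallest costs
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisection on a logarithmic scale
        if entropy_bits(gibbs(mid)) < min_rate:
            lo = mid
        else:
            hi = mid
    return gibbs(hi)


def scp(objective_grad, p0, min_rate, delta=1e-6, max_iters=100):
    """Successive convex programming with a first order Taylor surrogate."""
    p, eps, i = list(p0), math.inf, 0
    while eps >= delta and i < max_iters:
        grad = objective_grad(p)  # linearize the objective around p^(i)
        p_next = solve_linearized(grad, min_rate)
        # Stopping criterion: l2 norm of the change between iterates.
        eps = math.sqrt(sum((a - b) ** 2 for a, b in zip(p_next, p)))
        p, i = p_next, i + 1
    return p
```

For a concave toy objective the linearization is a global upper bound, so each iterate cannot increase the objective; for a general non-convex objective such as the BER bound, a trust region or damping step may be needed, and the iteration cap `max_iters` is included here only as a safeguard.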

C. Toy Examples
A comprehensive illustration of probabilistically shaped $M_{nu} = 8$-QAM with a 2 bits/symbol rate constraint, corresponding to $M_u = 4$, is presented in Fig. 1 and Fig. 2. The relation between prior probabilities and different SNR values is presented in Fig. 1. Clearly, the probability distribution is quite random for lower SNR levels such as $\alpha = 0$ dB. However, as the SNR increases, it starts adopting a uniform distribution of 0.25 for four of its symbols, i.e., s1, s3, s6, and s8, while assigning zero probability to the remaining four symbols. This technique provides a lower BER while maintaining the 2 bits/symbol rate for a fair comparison with traditional 4-QAM. Interestingly, it achieves a lower BER by transmitting half of the symbols, which are not nearest neighbors. It is important to highlight that the proposed approach achieves this performance with the same power budget and transmission rate.
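As a quick check of the rate claim, the entropy of the high-SNR distribution can be worked out directly, using the usual convention $0 \log_2 0 = 0$ for the four unused symbols:

```latex
H(\mathbf{p}) = -\sum_{m=1}^{8} p_m \log_2 p_m
             = -4 \cdot \tfrac{1}{4} \log_2 \tfrac{1}{4}
             = 2 \ \text{bits/symbol}
             = \log_2 (M_u), \qquad M_u = 4 .
```

The rate constraint is therefore met with equality, which is consistent with the comparison against uniform 4-QAM at the same rate.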
Another example illustrates the trend of probabilistic shaping for the 8-QAM constellation at a lower SNR level (keeping in mind that it assigns uniform probabilities to four symbols at high SNR levels). The trend for a lower HWD level such as $\eta = 0.11$ is quite random. However, it follows a decreasing probability trend for middle to higher HWD levels. Intuitively, it assigns higher probabilities to the symbols with the least power and lower probabilities to the symbols with higher powers. This trend decreases the BER while maintaining the average power constraint.

V. HYBRID SHAPING WHERE CONVENTIONAL MEETS STATE-OF-THE-ART
In this section, we increase the AS design flexibility by allowing joint GS and PS, which we call HS, to further improve the underlying communication system performance. Throughout the design procedure, HS transforms the equally spaced uniformly distributed QAM/PSK symbols to unequally spaced symbols in a geometric envelope with a non-uniform prior distribution. Thus, HS aims to optimize the symbol probabilities (i.e., PS) and some spatial shaping parameters for the constellation (i.e., GS).

A. Hybrid Shaping Parameterization
Apart from the non-uniform priors, consider the asymmetric transmit symbol $\mathbf{v}_m = [v_{mI}\ v_{mQ}]^T$ resulting from the GS on the conventional baseband symmetric $M$-QAM/$M$-PSK symbol $\mathbf{x}_m = [x_{mI}\ x_{mQ}]^T$ as $\mathbf{v}_m = \mathbf{A}\mathbf{R}\mathbf{x}_m$, where $\mathbf{A}(\zeta)$ is the translation matrix with translation parameter $\zeta \in (0, 1)$ and $\mathbf{R}(\theta)$ is the rotation matrix with rotation angle $\theta \in (0, \mu\pi/2)$ for some constant $\mu$.
A uniformly distributed symmetric $M$-QAM constellation has a rotation symmetry of $n\pi/2$, $n \in \mathbb{Z}^+$, rendering $\mu = n$ a good choice for GS. However, a non-uniformly distributed $M$-QAM constellation can only be rotationally symmetric after $2n\pi$; thus, $\mu = 4n$ is suitable for HS. This technique renders non-uniformly spaced symbols in a parallelogram envelope. It is important to highlight that this transformation preserves the power requirement. Power invariance of the rotation is a well known fact in the literature [66]. However, the wisdom behind the structure of $\mathbf{A}(\zeta)$ is unfolded in the following remark.
Remark 1. GS parameterization using the translation matrix $\mathbf{A}(\zeta)$ preserves the power of a complex random variable and inculcates asymmetry/improperness with the circularity coefficient $\zeta$.
Proof. The proof is presented in Appendix B. Furthermore, the generalization of the same concept to symmetric discrete constellations such as $M$-QAM and $M$-PSK is also described in Appendix B.
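The two properties stated in Remark 1 can be checked numerically for a candidate matrix. The sketch below assumes the diagonal form $\mathbf{A}(\zeta) = \mathrm{diag}(\sqrt{1+\zeta}, \sqrt{1-\zeta})$ together with the standard counter-clockwise rotation matrix. This particular $\mathbf{A}(\zeta)$ is an assumption chosen because it satisfies both properties for a proper input; it is not confirmed to be the matrix used in the paper. The function names are hypothetical.

```python
import math


def geometric_shape(symbols, zeta, theta):
    """Apply v = A(zeta) R(theta) x with the ASSUMED A = diag(sqrt(1+zeta), sqrt(1-zeta))."""
    a_i, a_q = math.sqrt(1.0 + zeta), math.sqrt(1.0 - zeta)
    c, s = math.cos(theta), math.sin(theta)
    shaped = []
    for x in symbols:
        # Rotate first, then scale the in-phase and quadrature components.
        r_i = c * x.real - s * x.imag
        r_q = s * x.real + c * x.imag
        shaped.append(complex(a_i * r_i, a_q * r_q))
    return shaped


def power_and_circularity(symbols):
    """Average power and circularity coefficient for equiprobable, zero-mean symbols."""
    n = len(symbols)
    power = sum(abs(v) ** 2 for v in symbols) / n
    pseudo = sum(v ** 2 for v in symbols) / n
    return power, abs(pseudo) / power


qam4 = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
shaped_power, shaped_circularity = power_and_circularity(
    geometric_shape(qam4, zeta=0.5, theta=0.3)
)
```

For equiprobable 4-QAM, this assumed transformation leaves the average power unchanged and yields a circularity coefficient equal to $\zeta$, independently of the rotation angle.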

B. Optimal Receiver
The optimal receiver for hybrid shaped AS is also a MAP detector as derived in (9), but with a modified reference constellation $v_m$ in place of $x_m$ for all $m \in \{1, 2, \cdots, M_{nu}\}$. More precisely, the detected symbol, $\hat{m}_{\rm HS}$, is the one that maximizes the posterior distribution, i.e.,
$$\hat{m}_{\rm HS} = \arg\max_{1 \le m \le M_{nu}} \; p_V(v_m)\, f_{Y_I, Y_Q | V, g}(y_I, y_Q | v_m, g), \quad (36)$$
where $f_{Y_I, Y_Q | V, g}(y_I, y_Q | v_m, g)$ is similar to (10), with all appearances of $x_m$ replaced by $v_m$ for all $m \in \{1, 2, \cdots, M_{nu}\}$. It is worth noting that the non-uniform prior probabilities are incorporated in the detection process by using the MAP detector in place of the ML detector. Moreover, the geometrically shaped symbols are taken from a modified symbol constellation. Hence, this requires updating the reference constellation for appropriate detection.

C. Error Probability
HS follows the same BER bound as derived in (22) but with a modified $\gamma_{mn}$, which can now be written using a quadratic formulation as a function of $\zeta$ and $\theta$, where $\mathbf{x}_{mn}$ is the real composite vector form of $\xi_{mn} = g d_{mn}$ and $\mathbf{G}$ contains the statistical characteristics of the aggregate noise, including the in-phase noise variance, the quadrature-phase noise variance, and the correlation between these components.
Thus, the BER of HS can be upper bounded as in (40).

D. Problem Formulation
HS targets the joint design of the PS PMF $\mathbf{p}$ and the GS parameters, i.e., the translation parameter $\zeta$ and rotation parameter $\theta$, to minimize the BER bound given in (40).
In the resulting joint problem P2, the average power constraint (25b) is updated as (41b) to account for the possible change in the power of the symbols caused by geometrically shaping the constellation. However, the proposed rate constraint (41c) remains intact. Additionally, there are boundary constraints on $\zeta$ and $\theta$. Intuitively, it is quite difficult to tackle this non-convex multimodal joint optimization problem. Therefore, we resort to the alternate optimization of the PS parameters ($\mathbf{p}$) and the GS parameters ($\zeta$, $\theta$) using sub-problems P2a and P2b, respectively. Problem P2a designs the PS parameters for some given $\zeta$ and $\theta$. It is quite similar to problem P1 and thus can be solved using Algorithm 1.
On the other hand, the GS optimization problem P2b designs $\zeta$ and $\theta$ for fixed symbol probabilities $\mathbf{p}$. The optimization problem P2b is a multimodal non-convex problem which is hard to tackle even by the SCP approach employed in Section IV. The difficulty arises due to the absence of any constraints which restrict the feasibility region. The feasibility space enclosed by the boundary constraints is highly insufficient to serve our purpose. Therefore, we can approximate the solution using either of the following two methods:
• Trust region reflective method: This method defines a trust region around a specific initial point and then approximates the function within that region. The convex approximation is the first order Taylor series approximation using the gradient. It begins by minimizing the convex approximation of the function to obtain a solution. This solution is the perturbation in the initial point, rendering a new point which should minimize the original function. Otherwise, we need to shrink the trust region and repeat the process. Reflections are used to increase the step size while satisfying box constraints. After each iteration, we receive a new point which renders a lower objective function than the initial point. This iterative approach leads us to a local minimum and stops when some specified stopping criteria are met [68], [69].
• Gradient descent: This method is a relatively faster approach to tackle the problem at hand, owing to the fact that it does not involve any approximation and underlying optimization. It begins with an initial point and keeps updating the point in the descent direction using the gradients and a step size until it reaches a local solution or satisfies some stopping criterion [67].
Interestingly, both of these methods require the gradients of $P_{b,{\rm HS}}^{\rm UB}(\mathbf{p}, \zeta, \theta)$ with respect to $\zeta$ and $\theta$. Gradients are used either to approximate the function with its first order Taylor series approximation within a trust region or to find the next point in the descent direction. The gradients are evaluated and presented in Appendix D.

Algorithm 2 Alternate Optimization
1: Initialize $j \leftarrow 0$, $\epsilon \leftarrow \infty$ and set tolerance $\delta$
2: Choose feasible starting points $\mathbf{p}^{(j)}$, $\zeta^{(j)}$, and $\theta^{(j)}$.
3: Evaluate $P_{b,{\rm HS}}^{{\rm UB}(j)}(\mathbf{p}^{(j)}, \zeta^{(j)}, \theta^{(j)})$.
4: while $\epsilon \ge \delta$ do
5: Solve P2a using Algorithm 1 with starting point $\mathbf{p}^{(j)}$ and given $\zeta^{(j)}$, $\theta^{(j)}$ to obtain $\mathbf{p}^{(j*)}$
6: Solve P2b with given $\mathbf{p}^{(j*)}$ to obtain $\zeta^{(j*)}$ and $\theta^{(j*)}$
7: Update $\mathbf{p}^{(j+1)} \leftarrow \mathbf{p}^{(j*)}$, $\zeta^{(j+1)} \leftarrow \zeta^{(j*)}$, $\theta^{(j+1)} \leftarrow \theta^{(j*)}$
8: Evaluate $P_{b,{\rm HS}}^{{\rm UB}(j+1)}(\mathbf{p}^{(j+1)}, \zeta^{(j+1)}, \theta^{(j+1)})$.
9: $\epsilon \leftarrow |P_{b,{\rm HS}}^{{\rm UB}(j+1)} - P_{b,{\rm HS}}^{{\rm UB}(j)}|$
10: $j \leftarrow j + 1$
11: end while
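The gradient descent option for P2b can be sketched as below for a generic two-parameter problem with box constraints. The gradient is supplied by the caller, standing in for the expressions of Appendix D, and the iterate is clipped to the box after each step. Clipping to a closed box is a simplification introduced here and is not described in the text; the function name, fixed step size, and stopping rule are likewise hypothetical.

```python
def projected_gradient_descent(grad, start, bounds, step=0.1, delta=1e-8, max_iters=10000):
    """Gradient descent with clipping to box constraints.

    grad   : callable returning the gradient as a list at a given point
    start  : initial point, e.g. [zeta0, theta0]
    bounds : list of (low, high) pairs, one per parameter
    """
    point = list(start)
    for _ in range(max_iters):
        g = grad(point)
        # Step in the descent direction, then clip each coordinate to its box.
        candidate = [
            min(max(x - step * gi, lo), hi)
            for x, gi, (lo, hi) in zip(point, g, bounds)
        ]
        change = sum((a - b) ** 2 for a, b in zip(candidate, point)) ** 0.5
        point = candidate
        if change < delta:
            break
    return point
```

Because the problem is multimodal, a routine of this kind only returns a local solution, so the outcome depends on the starting point; running it from several initial points is a common way to reduce that sensitivity.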

E. Proposed Algorithm
The joint optimization problem P2 can be tackled using the alternate optimization algorithm presented in Algorithm 2. It solves the sub-problems P2a and P2b alternately and iteratively. It begins with some feasible starting points $\mathbf{p}^{(j)}$, $\zeta^{(j)}$, and $\theta^{(j)}$ and evaluates $P_{b,{\rm HS}}^{{\rm UB}(j)}(\mathbf{p}^{(j)}, \zeta^{(j)}, \theta^{(j)})$ as a benchmark. The alternate optimization begins by solving P2a to minimize $P_{b,{\rm HS}}^{\rm UB}$ with respect to $\mathbf{p}$ given a pair of $\zeta$ and $\theta$. It is achieved by replacing all entries of $\mathbf{x}_m$ with $\mathbf{v}_m = \mathbf{A}\mathbf{R}\mathbf{x}_m$ $\forall m$. $\mathbf{p}^{(j*)}$ is obtained using the framework provided in Algorithm 1, which solves P1a iteratively. Then, the optimum $\mathbf{p}^{(j*)}$ is used as a given PMF to obtain the pair $\zeta^{(j*)}$ and $\theta^{(j*)}$ by solving P2b. These optimum parameter values are updated to attain the next initial points. Moreover, $P_{b,{\rm HS}}^{{\rm UB}(j+1)}(\mathbf{p}^{(j+1)}, \zeta^{(j+1)}, \theta^{(j+1)})$ is also evaluated to compare the decrease in the objective function. The norm of this difference is stored in $\epsilon$, and the process is repeated until this value drops below a preset threshold $\delta$. Eventually, the solution parameters are updated in $(\mathbf{p}^*, \zeta^*, \theta^*)$, which yield the minimized BER upper bound $P_{b,{\rm HS}}^{{\rm UB}*}$ using HS. Therefore, these HS parameters are capable of rendering a BER $P_{b,{\rm HS}}^{*}$ lower than the bound $P_{b,{\rm HS}}^{{\rm UB}*}$. Numerical evaluations reveal that the stopping criterion is mostly met in just one iteration. Interestingly, Steps 5 and 6 in Algorithm 2 are interchangeable and need to be chosen carefully. For instance, PS demonstrates better performance at higher HWD levels, so it is intuitive to design the HS by first PS and then GS in order to attain further gain over PS. In contrast, GS depicts lower BER at lower HWD levels, so it is recommended to design HS by first GS and then PS in order to achieve better performance than GS using the added DoF offered by PS. HS can be implemented by choosing the transmit symbols from the translated and rotated signal constellation, i.e., $\mathbf{v}_m = \mathbf{A}(\zeta^*)\mathbf{R}(\theta^*)\mathbf{x}_m$. Furthermore, the symbols are transmitted according to the optimized $\mathbf{p}^*$, where $\zeta^*$, $\theta^*$, and $\mathbf{p}^*$ are designed using Algorithm 2. Upon reception, they are detected using the MAP detector as presented in (36).
Fig. 3: Different Asymmetric Signaling Designs
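The alternating structure of Algorithm 2 can be sketched generically as follows. The two block solvers stand in for P2a (the PS step) and P2b (the GS step) and are supplied by the caller; the function names, the scalar toy blocks used for illustration, and the iteration cap are hypothetical and are not part of the paper's formulation.

```python
import math


def alternate_optimization(objective, solve_block_a, solve_block_b, a0, b0,
                           delta=1e-9, max_iters=100):
    """Alternate between two block solvers until the objective change is below delta."""
    a, b = a0, b0
    value, eps, j = objective(a, b), math.inf, 0
    while eps >= delta and j < max_iters:
        a = solve_block_a(b)  # stands in for P2a: update p for fixed (zeta, theta)
        b = solve_block_b(a)  # stands in for P2b: update (zeta, theta) for fixed p
        new_value = objective(a, b)
        eps = abs(new_value - value)  # change in the objective between iterations
        value, j = new_value, j + 1
    return a, b, value


# Toy usage with a convex quadratic whose block minimizers are known in closed form.
toy = lambda a, b: (a - 1.0) ** 2 + (b - 2.0) ** 2 + a * b
a_opt, b_opt, toy_value = alternate_optimization(
    toy, lambda b: 1.0 - b / 2.0, lambda a: 2.0 - a / 2.0, 0.5, 0.5
)
```

Swapping the order of the two block updates corresponds to interchanging Steps 5 and 6, which matters for a multimodal objective since the two orders may settle in different local solutions.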

F. Illustrative Example
We present a comprehensive example to highlight the design of various distinct shapes for a fixed rate of 4 bits/symbol. The black color is used for the reference constellation. The blue color depicts the possible transmission symbols, whereas the red symbols highlight the improbable transmission symbols. Fig. 3a presents a uniformly distributed 16-QAM constellation with no shaping. Fig. 3b illustrates geometrically shaped 16-QAM with parameters $\zeta = 0.5$ and $\theta = \pi/2$. The parallelogram envelope encloses equiprobable QAM symbols. Next, we employ 32-QAM and design non-uniform probabilities as detailed in Section IV. The red symbols have negligible transmission probabilities, whereas the blue symbols have notable transmission probabilities, as depicted in Fig. 3c. The proposed algorithm tends to discard the symbols with minimal transmission power to reduce the BER. One possible reason is that these symbols are the most affected by the improper HWD owing to their comparable power/variance. Furthermore, this probabilistically shaped constellation undergoes GS to yield the hybrid shaped QAM constellation shown in Fig. 3d.
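The rate requirement on a non-uniform PMF can be checked by computing its entropy. The snippet below is a minimal sketch, assuming the rate of a probabilistically shaped constellation is measured by the Shannon entropy $H(\mathbf{p})$ in bits/symbol; the PMF shown is a hypothetical one (16 of 32 symbols kept equiprobable, the rest discarded) and not a result of the proposed algorithm.

```python
import math


def entropy_bits(pmf):
    """Shannon entropy H(p) in bits/symbol; zero-probability symbols add nothing."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


# Hypothetical PMF over a 32-point constellation: 16 symbols equiprobable, 16 discarded.
pmf = [1 / 16] * 16 + [0.0] * 16
rate = entropy_bits(pmf)
target = math.log2(16)                 # 4 bits/symbol, the rate of uniform 16-QAM
meets_rate = rate >= target - 1e-12    # small tolerance for floating-point error
```

A PMF that concentrates its mass on fewer than 16 symbols would fall below the 4 bits/symbol target, which is why a larger constellation is needed to leave room for shaping.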

VI. NUMERICAL RESULTS
Numerical evaluations of the adopted HWD system are carried out to study the drastic effects of hardware imperfections and the effectiveness of the mitigation strategies. The performance of the proposed PS and HS as a realization of asymmetric transmission is quantified with varying energy per bit to noise ratio (EbNo) and HWD levels. EbNo is obtained by normalizing the SNR with the transmission rate. The derived error probability bounds and the performance of the asymmetric transmission schemes are also validated using Monte-Carlo simulations. We compare the performance of conventional GS with the proposed PS. GS can be implemented by transmitting symbols from a reshaped constellation $v_m = A(\zeta^*) R(\theta^*) x_m$, where $\zeta^*$ and $\theta^*$ can be obtained by solving P2b given a uniform prior distribution. Upon reception, they are detected using the ML detector, which is the simplified form of the optimal MAP detector (36) given uniform prior probabilities. This ML detector considers the reshaped constellation symbols $v_m$ as the reference to detect the received symbols. For most of the numerical evaluations, we assume Gray-coded square QAM constellations of order $M_u = 8$, i.e., $R = \log_2(M_u)$, for no-shaping (NS) and GS as benchmarks. For PS and HS, we employ $M_{nu} = 32$-QAM with a rate at least as high as that of GS, i.e., $R \geq \log_2(M_u)$. Moreover, we consider practical HWD values for the transmitter, $\kappa_t = 0.01$, and the receiver, $\kappa_r = 0.12$. The pseudo-variances are derived from $\kappa_{tI} = \kappa_t/4$, $\kappa_{rI} = \kappa_r/4$, and the correlation coefficient $\rho_\eta = 0.9$. The AWGN channel assumes $g = 1$, and the circularly symmetric Rayleigh fading channel is generated using $\lambda = 1$. Furthermore, the transmission EbNo is taken as 30 dB. The aforementioned parameter values are used throughout the numerical results, unless specified otherwise.
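The normalization of SNR by the transmission rate can be written out explicitly. The helper functions below are a small sketch under the assumption that SNR denotes the per-symbol signal-to-noise ratio and $R$ the rate in bits/symbol, so that EbNo $=$ SNR$/R$; the function names are hypothetical.

```python
import math


def ebno_db_from_snr_db(snr_db, rate_bits_per_symbol):
    """EbNo = SNR / R, expressed in dB (assumes per-symbol SNR)."""
    return snr_db - 10 * math.log10(rate_bits_per_symbol)


def snr_db_from_ebno_db(ebno_db, rate_bits_per_symbol):
    """Inverse mapping: SNR = EbNo * R, expressed in dB."""
    return ebno_db + 10 * math.log10(rate_bits_per_symbol)


rate = math.log2(8)                          # R = log2(M_u) with M_u = 8
snr_db = snr_db_from_ebno_db(30.0, rate)     # per-symbol SNR for EbNo = 30 dB
```

For $R = 3$ bits/symbol the two quantities differ by $10\log_{10}(3) \approx 4.77$ dB, which matters when comparing schemes that operate at different rates.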
First, we evaluate the performance of various AS schemes for a range of EbNo from 0 dB to 50 dB in an AWGN channel, as shown in Fig. 4. We employ $M_u$-QAM for NS and GS, whereas $M_{nu}$-QAM is used for PS and HS. The BER performance improves with increasing EbNo up to 30 dB and then saturates owing to the presence of HWD. A further increase in bit energy also increases the distortion variance, so the system experiences an error floor, which can be deduced from (24). Evidently, the proper/symmetric QAM is suboptimal, and the BER performance is significantly improved using AS. Conventional GS is not beneficial at lower EbNo values, but it significantly improves the performance at higher EbNo values owing to the increased symbol space [36]. On the other hand, the proposed PS is capable of minimizing the BER for the entire range of EbNo. Substantial gains can be achieved by taking another step forward and employing HS. Therefore, we can safely conclude that the best performance can be achieved using PS for EbNo ≤ 15 dB and HS for higher EbNo values.

For the same simulation settings, we analyze the system throughput (correct bits/symbol) for a range of EbNo values, where a lower bound on the system throughput is obtained from the upper bound on the BER. Fig. 5 depicts a negligible throughput gain of GS over NS but a noticeable throughput improvement using PS or HS. For instance, throughput increases of 1.5%, 6%, and 7% can be observed using GS, PS, and HS, respectively, at EbNo = 5 dB. The throughput gain is quite substantial for lower EbNo values but saturates when EbNo ≥ 20 dB. Interestingly, PS/HS saturate at 3 bits/symbol, following the rate fairness constraint, with negligible BER, whereas the other schemes saturate below 3 bits/symbol, reflecting a significant BER, even though the entropy of 8-QAM with uniform distribution is $\log_2(8) = 3$.
Next, we analyze the behavior of various AS schemes with increasing distortion levels, as depicted in Fig. 6. We assume 8-QAM for the benchmark NS and traditional GS, whereas 16-QAM is used for PS and HS. The derived bounds are in close accordance with the MC simulations, especially for lower HWD levels. As expected, the BER increases with increasing HWD levels, and AS-based systems achieve a lower BER by efficiently mitigating the drastic HWD effects. The NS scheme suffers the most, but GS helps to decrease the BER to some extent. Further compensation can be achieved using the proposed PS and HS. Surprisingly, GS outperforms PS and HS at the lowest HWD values, e.g., $\kappa = 0.11$, in Fig. 6, but PS/HS maintain their superiority for $\kappa \geq 0.17$. Interestingly, PS/HS are still capable of outperforming GS even at the lowest HWD levels, owing to their rate adaptation capability and the added DoF of 32-QAM, as highlighted in Fig. 7. We can observe enhanced mitigation offered by the 32-QAM PS/HS as compared to the 16-QAM PS/HS due to the added DoF. For instance, we observe a BER compensation of 66% and 77.5% using 32-QAM PS and HS, respectively, whereas the BER compensation is 55% and 65% using 16-QAM PS and HS, respectively, at the $\kappa = 0.22$ HWD level.
A similar analysis is undertaken to study the impact of increasing HWD on the system throughput. Fig. 8 compares the throughput performance of $M_u$-QAM NS and GS with $M_{nu1} = 16$-QAM PS and HS as well as with $M_{nu2} = 32$-QAM PS and HS. System throughput decreases almost linearly with increasing HWD for all forms of signaling but with different slopes. NS demonstrates the steepest slope with increasing HWD, and all the AS schemes render more gradual slopes. Quantitative analysis shows slopes of −0.55, −0.41, −0.28, and −0.24 using NS, GS, 16-QAM PS/HS, and 32-QAM PS/HS, respectively, with increasing HWD. Therefore, PS and HS present the most favorable results as compared to GS. Their performance can be further improved by increasing the modulation order. Another important observation is the overlapping response of PS and HS, especially for higher-order QAM, which means that PS suffices and removes the need for HS.

Fig. 8: System throughput for a range of HWD levels at EbNo = 30 dB in an AWGN channel.
Another simulation example depicts the performance of the discussed AS schemes over a range of EbNo for two distinct scenarios, a perfect receiver and a perfect transmitter, as presented in Fig. 9. The perfect receiver system, as the name specifies, includes an ideal zero-distortion receiver but an imperfect transmitter with $\kappa_t = 0.07$, whereas the perfect transmitter system involves an ideal zero-distortion transmitter but an imperfect receiver with $\kappa_r = 0.15$. Note that the lower value of $\kappa_t$ relative to $\kappa_r$ is due to the fact that transmitters employ sensitive equipment to exhibit low distortions, because the transmitter distortions are far more drastic than the receiver distortions. Interestingly, GS outperforms PS at EbNo > 15 dB for the perfect receiver case, as opposed to EbNo < 15 dB where PS is still the better choice. HS outperforms both of them irrespective of the EbNo range. At such a low HWD level, BER reductions of 81.82%, 90.91%, and 94.55% are observed using PS, GS, and HS, respectively, at 30 dB EbNo. In the perfect transmitter scenario, GS and PS reverse the trend at higher EbNo levels. Now PS clearly outperforms GS for the entire range of EbNo, and HS marks its superiority over both of these schemes. At the 0.15 HWD level, EbNo gains of 8 dB, 12 dB, and 13 dB are estimated using GS, PS, and HS, respectively, to attain a BER of $10^{-2}$.
Finally, the average (ergodic) BER performance of the adopted system with the $\kappa = 0.22$ HWD level is evaluated over a Rayleigh fading channel for a range of EbNo values, as given in Fig. 10. Evidently, the AS schemes preserve their BER trends and order. The average BER decreases with increasing EbNo and then saturates, yielding an error floor. The derived BER bounds are also validated using MC simulations, rendering a tighter bound for higher EbNo values. GS improves the average BER as compared to the NS scenario, but PS and HS maintain their superior performance. Signaling schemes of GS, PS, and HS offer a percentage ...

In a nutshell, we can conclude that GS offers a significant BER reduction at higher SNR values, as opposed to PS, which offers universal gains. Moreover, the perks of HS are also prominent for higher SNR and higher-order $M$-ary modulation, but HS shows performance comparable to PS at lower SNR values. Therefore, we recommend employing HS at high SNR but resorting to PS at lower SNR values to save additional computational expense. Additionally, GS is a better choice for slightly distorted systems, whereas PS/HS are the preferred choice for moderately to severely distorted systems. Furthermore, we can achieve improved performance by employing higher-order QAM constellations for PS/HS given adequate resources. On the other hand, the throughput gains are prominent at considerably lower SNR values and higher distortion values.

VII. CONCLUSION
This work proposes probabilistic and hybrid shaping to realize asymmetric signaling in digital wireless communication systems suffering from improper HWD. Intuitively, all forms of asymmetric shaping are capable of decreasing the BER, and this performance gain with respect to NS improves with increasing SNR and/or increasing HWD levels. However, PS outperforms GS and performs as well as HS. We can achieve more than 50% BER reduction with PS/HS over traditional GS. The perks of PS come at the cost of increased complexity in the design and decoding process. The HS scheme is capable of improving the system performance in terms of the BER as well as the throughput. However, for lower HWD levels and low EbNo, the benefits of HS over PS are limited while requiring additional complications in the optimization, modulation, and detection procedures. Therefore, PS emerges as the best choice in the trade-off between enhanced performance and added complexity.

APPENDIX A STATISTICAL CHARACTERIZATION OF AGGREGATE NOISE
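The superposed Gaussian distributions render the accumulative noise $z \sim \mathcal{CN}(0, v, \tilde{v})$, where $v = \alpha|g|^2\kappa + \sigma_w^2$ and $\tilde{v} = \alpha g^2 \tilde{\kappa}$. Writing $z = z_I + i z_Q$, the relation between $v$, $\tilde{v}$ and the variances of $z_I$ and $z_Q$ is
$$v = \sigma_I^2 + \sigma_Q^2, \qquad \tilde{v} = \sigma_I^2 - \sigma_Q^2 + 2i\, r_{z_I z_Q}.$$
Their interrelation enables us to evaluate $\sigma_I^2$, $\sigma_Q^2$, and $r_{z_I z_Q}$ from $v$ and $\tilde{v}$ as
$$\sigma_I^2 = \frac{v + \Re\{\tilde{v}\}}{2}, \tag{47}$$
$$\sigma_Q^2 = \frac{v - \Re\{\tilde{v}\}}{2}, \tag{48}$$
$$r_{z_I z_Q} = \frac{\Im\{\tilde{v}\}}{2}. \tag{49}$$
Finally, (47)-(49) allow us to find the correlation coefficient between $z_I$ and $z_Q$ as
$$\frac{r_{z_I z_Q}}{\sigma_I \sigma_Q} = \frac{\Im\{\tilde{v}\}}{\sqrt{v^2 - \Re\{\tilde{v}\}^2}}.$$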
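The mapping from $(v, \tilde{v})$ to the in-phase/quadrature statistics can be checked numerically. The following sketch uses the standard identities $v = \sigma_I^2 + \sigma_Q^2$ and $\tilde{v} = \sigma_I^2 - \sigma_Q^2 + 2i\,r_{z_I z_Q}$ for a zero-mean complex random variable; the function names, the example values $v = 1$ and $\tilde{v} = 0.3 + 0.4i$, and the sampling routine are illustrative assumptions rather than part of the original derivation.

```python
import math
import random


def iq_statistics(v, v_tilde):
    """Return (var_I, var_Q, cov_IQ, rho) of z = z_I + i z_Q from v and v_tilde."""
    if abs(v_tilde) >= v:
        raise ValueError("this sketch assumes |v_tilde| < v")
    var_i = 0.5 * (v + v_tilde.real)
    var_q = 0.5 * (v - v_tilde.real)
    cov_iq = 0.5 * v_tilde.imag
    rho = cov_iq / math.sqrt(var_i * var_q)
    return var_i, var_q, cov_iq, rho


def sample_improper_noise(v, v_tilde, n, seed=0):
    """Draw n zero-mean improper Gaussian samples with the requested statistics."""
    var_i, var_q, _, rho = iq_statistics(v, v_tilde)
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z_i = math.sqrt(var_i) * a
        # Correlate the Q component with the I component through rho
        z_q = math.sqrt(var_q) * (rho * a + math.sqrt(1.0 - rho * rho) * b)
        samples.append(complex(z_i, z_q))
    return samples


v, v_tilde = 1.0, complex(0.3, 0.4)      # hypothetical example values
var_i, var_q, cov_iq, rho = iq_statistics(v, v_tilde)

z = sample_improper_noise(v, v_tilde, n=50000)
v_hat = sum(abs(s) ** 2 for s in z) / len(z)    # empirical variance
v_tilde_hat = sum(s * s for s in z) / len(z)    # empirical pseudo-variance
```

The empirical variance and pseudo-variance of the generated samples should approach $v$ and $\tilde{v}$ as the number of samples grows, which is a convenient sanity check before running Monte-Carlo BER simulations.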

APPENDIX B TRANSLATION WITHIN POWER BUDGET
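In this appendix, we present the proof of Remark 1. It is straightforward to prove that the translation $v = Aw$ does not change the variance/power but only introduces asymmetry/improperness. Considering the transformation caused by the translation $v = \sqrt{1+\zeta}\,w_I + i\sqrt{1-\zeta}\,w_Q$, the power/variance is given by
$$\sigma_v^2 = E\{|v|^2\} = (1+\zeta)\,\sigma_{w_I}^2 + (1-\zeta)\,\sigma_{w_Q}^2.$$
Using the symmetric nature of the r.v. $w$, i.e., $\sigma_{w_I}^2 = \sigma_{w_Q}^2$, it is clear that $\sigma_v^2 = \sigma_w^2$. On the other hand, the pseudo-variance can be calculated as
$$\tilde{\sigma}_v^2 = E\{v^2\} = (1+\zeta)\,\sigma_{w_I}^2 - (1-\zeta)\,\sigma_{w_Q}^2 + 2i\sqrt{1-\zeta^2}\,E\{w_I w_Q\}. \tag{52}$$
Again, the symmetry implies $E\{w_I w_Q\} = 0$. Thus, the circularity coefficient can be derived from (52) with $\tilde{\sigma}_v^2 = \zeta\sigma_w^2$, i.e., $|\tilde{\sigma}_v^2|/\sigma_v^2 = \zeta$. The same concept can be extended to symmetric discrete constellations with uniform prior probabilities. Considering the transformation caused by the translation $v_m = \sqrt{1+\zeta}\,x_{mI} + i\sqrt{1-\zeta}\,x_{mQ}$, the power of the transformed constellation is given by
$$\frac{1}{M}\sum_{m=1}^{M} |v_m|^2 = \frac{1}{M}\sum_{m=1}^{M}\left[(1+\zeta)\,x_{mI}^2 + (1-\zeta)\,x_{mQ}^2\right] = \frac{2}{M}\sum_{m=1}^{M} x_{mI}^2,$$
where the last step uses the symmetry $\sum_m x_{mI}^2 = \sum_m x_{mQ}^2$, so the power equals that of the original constellation. Moreover, the non-zero pseudo-variance is given by
$$\frac{1}{M}\sum_{m=1}^{M} v_m^2 = \frac{2\zeta}{M}\sum_{m=1}^{M} x_{mI}^2 + \frac{2i\sqrt{1-\zeta^2}}{M}\sum_{m=1}^{M} x_{mI}\,x_{mQ}.$$

APPENDIX C KKT CONDITIONS
The convex non-linear constrained problem P1a can be efficiently solved using the first-order necessary KKT conditions. We begin by writing the Lagrangian function $\mathcal{L}(\mathbf{p}, \lambda_1, \lambda_2, \lambda_3)$ in (55), which augments the objective $P^{\mathrm{UB}}_b\big(\mathbf{p}, \mathbf{p}^{(k)}\big)$ with the constraints weighted by the Lagrange multipliers $\lambda_1, \lambda_2, \lambda_3 \geq 0$. Next, we evaluate the gradient of (55) with respect to the optimization variables in $\mathbf{p}$, i.e., the partial derivative of $\mathcal{L}$ with respect to each $p_m$. Suppose that there is a local solution $\mathbf{p}^*$ of P1a and that the objective function $P^{\mathrm{UB}}_b\big(\mathbf{p}, \mathbf{p}^{(k)}\big)$ along with the constraints (25b) and (25c) are continuously differentiable. Then, there exists a Lagrange multiplier vector $\boldsymbol{\lambda}^*$, with components $\lambda_i$, $i \in \{1, 2, 3\}$, such that the necessary first-order KKT conditions (as presented in Table I) are satisfied at $(\mathbf{p}^*, \boldsymbol{\lambda}^*)$. Interestingly, the KKT conditions are satisfied with $H(\mathbf{p}^*) = \log_2(M_u)$.

APPENDIX D GRADIENT FOR OPTIMIZATION
The gradient of the upper bound on the BER w.r.t. the GS parameters $\zeta$ and $\theta$ consists of two partial derivatives, where $\Delta_{mn}$ denotes the part common to both.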

TABLE I: First Order Necessary KKT Conditions