Supporting Probabilistic Constellation Shaping in 5G-NR Evolution

It is known that probabilistic constellation shaping (PCS) can provide a shaping-gain of 1.53dB asymptotically as the signal-to-noise ratio (SNR) increases. This holds, however, only under the ideal assumptions that the system operates optimally and achieves the Shannon capacity. In practice, the system can operate well below the capacity. In this paper, we propose a PCS-transceiver for the fifth-generation new-radio (5G-NR) evolution that supports quadrature-amplitude-modulation (QAM) constellations shaped with PCS, namely, PCS-QAM, in comparison to conventional uniform QAM constellations, namely, Uniform-QAM. We pay special attention to the throughput achieved under a stringent block-error rate (BLER) constraint, and validate the effectiveness of PCS under practical detection and decoding algorithms. We also analyze the properties of power-gain, entropy-loss, and peak-to-average power-ratio (PAPR) in connection to PCS. As a lower BLER does not necessarily indicate a higher throughput due to the entropy-loss from PCS, we further prove that a necessary and sufficient condition for PCS-QAM to outperform Uniform-QAM in throughput is that the normalized entropy-loss with PCS-QAM is less than the BLER obtained with Uniform-QAM. Furthermore, we demonstrate that the PCS-transceiver is flexible in rate-adaptation without impacting the encoder, and can yield a better throughput-envelope in a 5G-NR system.


I. INTRODUCTION
WIRELESS communication systems, including the fifth-generation new-radio (5G-NR), have now evolved into an era of supporting super quadrature-amplitude-modulation (super-QAM) such as 1024QAM and 4096QAM. This is to further increase the spectral efficiency (SE) and approach the Shannon capacity in the high signal-to-noise ratio (SNR) regime [1]. However, it is known that attaining the capacity requires a Gaussian constellation [2]. The achievable rates with Uniform-QAM constellations asymptotically approach a straight line parallel to the capacity-bound under the additive white Gaussian noise (AWGN) channel, which renders an SNR-loss of 1.53dB [3], or equivalently, a rate-loss of 0.51bit/s/Hz. From lattice-theory, it is the loss in shaping-gain from a sphere Voronoi-region based constellation to a cubic one, as the dimension goes to infinity [4]. On the other hand, the loss can also be seen as the power-gain between a Gaussian density and a uniform density that achieve the same differential entropy [5].
Fig. 1. A PCS-64QAM example, where the transmission-probabilities of four amplitudes {1, 3, 5, 7} (non-normalized) in both real and imaginary dimensions are equal to {0.5, 0.25, 0.17, 0.08}, respectively.
The idea of using a Gaussian-like constellation can be traced back to G. Forney [3], [4], [5], and different shaping techniques have been studied under various names including probabilistic constellation shaping (PCS), probabilistic amplitude shaping (PAS), and geometrical constellation shaping (GCS). In general, PCS [6], [7], [8], [9] reuses QAM constellations, but the transmission-probabilities of constellation symbols are shaped with a Gaussian probability mass function (pmf). A PCS-64QAM example is shown in Fig. 1. As seen, the transmission-probability decreases as the amplitude of a constellation symbol increases. GCS [10], [11], [12], [13], on the other hand, keeps the transmission-probabilities uniform, but the geometrical locations of constellation symbols are shaped to be Gaussian.
A widely considered shaping technique that has been commercialized in optical and fiber systems [14], [15], [16] is PAS, which is a special case of PCS. The sign bits of QAM symbols with PAS are generated with the parity-bits from an encoder, which puts stringent constraints on the supported code-rates that are determined by the modulation-orders. Other shaping methods include modifying the encoder to produce coded bits obeying a biased distribution [18], shaping information-bits rather than symbols [19], reverse concatenation [40], [41], designing variable-length prefix-free codes [42] that are optimal in minimizing the energy per coded-symbol, sparse-dense transmission [43], [44], [45], and more. In essence, those approaches can work the same as PCS, but they share the common disadvantage of requiring significant changes in order to cooperate with the existing 5G-NR transceiver.
In this paper, we propose a PCS-transceiver design that is fully compatible with the 5G-NR and supports both Uniform-QAM and PCS-QAM constellations, as illustrated in Fig. 2. It keeps the transceiver chain unaltered, and the overheads are to add a PCS modulator and a corresponding demodulator at the transmitter and the receiver, respectively. At the transmitter, the PCS modulator maps the information-bits into complex-valued PCS-QAM symbols, whose amplitudes of the real and imaginary components obey the same predetermined Gaussian pmf. Such a shaping can be achieved by a standard constant-composition distribution-matcher (CCDM) from the state-of-the-art designs [20], [21], [22]. Afterwards, these symbols are demapped into binary bits and sent to the encoder, and the remaining processes are the same as those in 5G-NR. At the receiver, the PCS demodulator implements an inverse distribution-matcher (DM) operation to decode the PCS-QAM symbols after the decoder. Such a design yields only small changes to the 5G-NR.
With the PCS-transceiver proposal, other than bit-error-rate (BER) or theoretical rates [29], we pay special attention to the block-error rate (BLER) and the final throughput, which are the most relevant metrics for a practical 5G-NR system. Conventionally, the analysis of PCS is mostly focused on theoretical shaping-gains, which assumes a system that operates close to the capacity with optimal coding and decoding in the Shannon sense. In practice, the system can operate far away from the capacity-bound [28], especially with a multi-input multi-output (MIMO) transmission under a fading channel, and further with a finite code-length [38], [39], [48]. To make fair comparisons between PCS-QAM and Uniform-QAM with practical systems, we generalize the shaping-gain to be: "Either the SNR-gain with a PCS-QAM when it attains the same throughput as a Uniform-QAM; or the rate-increment with a PCS-QAM compared to a Uniform-QAM at the same SNR. While in both cases the BLERs of PCS-QAM and Uniform-QAM shall be below the same constraint." With such a generalized definition, the shaping-gain can be unbounded. However, it bears operational meaning in the sense that when the BLER constraint is stringent and the transceiver is close to ideal, the shaping-gain approaches its conventional meaning. In 5G-NR, the BLER constraint is set to 10% for enhanced mobile broadband (eMBB), and $10^{-2}$ or lower for ultra-reliable low-latency communications (URLLC) applications. With a stringent BLER constraint of a few percent, the generalized shaping-gain is meaningful in the sense that PCS can either reduce the required SNR to achieve the same throughput, or push the throughput further at a given SNR. In general, we show that PCS brings power-gains that decrease BLER, but it also reduces source-entropy, which can decrease throughput. The interplay between them determines the generalized shaping-gain that can be harvested.
In summary, the main contributions of this paper are:
• We propose a PCS-transceiver design that is compatible with 5G-NR, and identify the twofold impacts from PCS: power-gain and entropy-loss. The combined effect determines the harvested gains in throughput. Previous studies did not reveal the fact that when measuring throughput, the entropy-loss due to PCS can be significant.
• Properties of power-gain, entropy-loss, and PAPR-increment in connection to PCS are elaborated. Through theoretical and numerical validations, we show that PCS yields affordable PAPR-increments and entropy-losses, while promising power-gains can be obtained.
• We derive a necessary and sufficient condition for PCS-QAM to yield a higher throughput than Uniform-QAM, and show an important insight that the normalized entropy-loss with PCS cannot be higher than the BLER attained with Uniform-QAM.
• We further demonstrate that the PCS-transceiver generates smooth rate-adaptations and yields higher throughputs when SNR changes. A rate-increment of 0.2bits per layer and per channel use (bits/layer/chan.use) is observed with PCS-256QAM and 2 × 2 MIMO under Rayleigh-fading channels, while the BLER is below 2%. This corresponds to a shaping-gain of 0.6dB, and when achieving the same throughput as Uniform-QAM while BLER is below 10%, the SNR-gain can be up to 1dB.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Fig. 3. The PCS modulator generates N complex-valued symbols from $4M^2$-QAM with 2(K+N) input bits, and the shaping is on M positive amplitudes.
Notation: Throughout the paper, 'ln' denotes the natural logarithm and '$\log_K$' is the logarithm to a base K. The floor operator is '⌊•⌋', and an identity matrix is $\boldsymbol{I}$ whose size can be understood from the context. A vector $\boldsymbol{\alpha}$ that follows a complex Gaussian distribution with zero mean and covariance matrix $\sigma^2\boldsymbol{I}$ is denoted as $\boldsymbol{\alpha} \sim \mathcal{CN}(\boldsymbol{0}, \sigma^2\boldsymbol{I})$. Further, $\mathcal{H}$ is the entropy operator, and $\mathbb{E}_a$ takes the expectation over a random variable a. The Hermitian conjugate and inverse of a matrix $\boldsymbol{H}$ are denoted as $\boldsymbol{H}^{\dagger}$ and $\boldsymbol{H}^{-1}$, respectively.

II. THE PROPOSED PCS-TRANSCEIVER DESIGN
The proposed PCS-transceiver cooperating with 5G-NR is depicted in Fig. 2, which generates complex-valued PCS-QAM symbols whose transmission-probabilities follow a Gaussian pmf. Without loss of generality (WLOG), throughout the paper we assume that the shaping is implemented on M positive amplitudes from a pulse-amplitude-modulation (PAM). Further, the pmf is symmetric for both positive and negative amplitudes, and also for the real and imaginary parts of the constellation symbols. Moreover, we let the integer M be a power of 2, such that the number of bits Q carried by each symbol from a $4M^2$-Uniform-QAM is an integer and equals
$$Q = \log_2(4M^2) = 2(1 + \log_2 M).$$
The M (≥ 2) positive amplitudes from 2M-PAM are set to
$$A_m = (2m-1)\sqrt{\frac{3}{4M^2-1}}, \quad 1 \le m \le M, \qquad (2)$$
and the average-power is
$$\frac{1}{M}\sum_{m=1}^{M} A_m^2 = 1.$$
A. The PCS-Transceiver for 5G-NR and Its Evolution
Compared to an existing 5G-NR system that only supports Uniform-QAM, a PCS modulator is added at the transmitter prior to the encoder as in Fig. 2, whose operations are further elaborated in Fig. 3. As can be seen, 2(K+N) information-bits are split into four sequences, comprising two vectors of length K, and another two of length N (N ≤ K). The two vectors of length N are directly mapped into sign bits, while the two vectors of length K are processed by the DM, and each one is mapped into a vector of length N whose entries are from the M positive amplitudes $\{A_1, A_2, \ldots, A_M\}$. The two vectors with sign bits are then combined with these two vectors of positive amplitudes, and this results in two vectors of length N comprising symbols from a 2M-PAM modulation $\{\pm A_1, \pm A_2, \ldots, \pm A_M\}$. The pmf is symmetric with $p(A_m) = p(-A_m)$. Finally, these two real vectors are combined together as the real and imaginary parts of N complex-valued symbols that form a $4M^2$-PCS-QAM constellation.
Afterwards, the demapper transforms the N symbols into NQ bits following a conventional demapping [1], and these bits are then sent to the encoder. Note that the PCS modulator has changed the transmission-probabilities of the QAM constellation, and it yields a lower average-power compared to Uniform-QAM. At the receiver, after successful decoding, the decoded bits are mapped back into a vector of N PCS-QAM symbols, which are further decoded into the original 2(K+N) information-bits via an inverse DM operation. With the PCS-transceiver in Fig. 3, each PCS-QAM symbol carries 2(1 + K/N) bits. Compared to the Q bits carried by a Uniform-QAM symbol, it renders a loss in source-entropy since $K/N \le \log_2 M$ due to PCS. Further, when the two real 2M-PAM vectors are combined into PCS-QAM symbols, a power-normalization by a factor of 1/2 is applied such that the average-power of a PCS-QAM symbol is one.

B. Shaping With CCDM
WLOG, we assume a CCDM [22] is applied for shaping, which maps K input bits into N positive amplitudes that obey a predetermined pmf $p = (p_1, p_2, \cdots, p_M)$, where $p_m = p(A_m)$ denotes the transmission-probability of the amplitude $A_m$. Then, the occurrence of $A_m$ in an output vector of length N is $N_m = N p_m$, with $\sum_{m=1}^{M} N_m = N$. As $N_m$ must be an integer, small adjustments on p can be applied based on principles such as minimizing the Kullback-Leibler (KL) divergence $D_{KL}(p\|q) = \sum_{m=1}^{M} p_m \ln\frac{p_m}{q_m}$ between the original pmf p and its revised version q such that $N_m = N q_m$ are all integers [25]. More simply, heuristic approaches can also be used to adjust p when the impacts are small. Considering all vectors of length N with each $A_m$ appearing exactly $N_m$ times, the total number of these vectors is
$$S = \frac{N!}{\prod_{m=1}^{M} N_m!}.$$
Hence, the CCDM can maximally map $K = \lfloor \log_2 S \rfloor$ bits and form a one-to-one mapping between the input and output sequences.
Example 2: Letting N = 24 and assuming the amplitudes {1, 3, 5, 7} appear {12, 6, 4, 2} times, respectively, the total number of vectors satisfying this equals
$$S = \frac{24!}{12!\,6!\,4!\,2!} \approx 3.75 \times 10^{10}.$$
Note that in order to reduce the complexity of the DM, a long input sequence can be split into a number of shorter subsequences, which are passed through one DM sequentially or through multiple DMs in parallel, as shown in Fig. 3. This may yield a slight degradation of DM efficiency.
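The counting in Example 2 can be checked numerically. The following is a minimal sketch that evaluates the multinomial count S and the number of addressable bits K = ⌊log2 S⌋; the function name `ccdm_sequence_count` is illustrative and not taken from any CCDM library.

```python
from math import factorial

def ccdm_sequence_count(counts):
    """Number of length-N sequences in which amplitude m appears counts[m] times."""
    total = factorial(sum(counts))
    for n_m in counts:
        # Each intermediate quotient is an integer, so exact division is safe.
        total //= factorial(n_m)
    return total

counts = [12, 6, 4, 2]            # occurrences of amplitudes {1, 3, 5, 7}
N = sum(counts)                   # CCDM output length
S = ccdm_sequence_count(counts)   # number of constant-composition sequences
K = S.bit_length() - 1            # exact floor(log2(S)) for a positive integer
rate = K / N                      # shaped bits per output amplitude

print(S, K, round(rate, 3))
```

For this short block length the ratio K/N stays below the entropy of the composition; it approaches the entropy only as N grows.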

C. Practical Issues of Implementing PCS in 5G-NR
There are some practical issues to consider when incorporating PCS in 5G-NR. One is that the scrambling and interleaving operations should be accommodated accordingly, which is not an issue since those operations are clearly defined in [23]. Further, the PCS information for decoding PCS-QAM needs to be signaled to the receiver. It should also be noted that we consider PCS directly in the physical layer, not in higher layers. Potentially it is also possible to apply PCS in higher layers, but it then requires more changes and also becomes more sophisticated due to operations including scrambling, interleaving, and header-adding at different layers before reaching the physical layer. In our view, applying PCS in the physical layer is sufficient and most convenient, since it directly affects the transmission and receiving schemes including the shaping and modulation, transmit-power adjustment, and decoding success-rate.
Another potential issue is that according to the uniform parity-bit assumption [24], parity-bits from a low-density parity-check (LDPC) encoder are close to uniformly distributed. It is not a problem for PAS, but can cause some disturbances to the pmf with PCS. Hence, the proposed PCS is most effective for medium and high code-rates, and as we validate later with simulation results, when the LDPC code-rate is around c = 2/3 or higher, the impacts are small.
Other issues include that the first two columns of information-bits sent to the LDPC encoder are punctured before transmission in the 5G-NR encoding. This, however, similar to the issue with parity-bits, will not affect the effectiveness of PCS. As long as the majority of information-bits are preserved, the PCS is still valid. Similar impacts may come from the hybrid automatic repeat request (HARQ) process when the bits in re-transmissions are selected from a circular buffer comprising all coded bits. However, when the BLER is constrained to be only a few percent or almost zero as in URLLC, re-transmissions are rare.

III. PROPERTIES OF THE PROPOSED PCS-TRANSCEIVER

A. The Optimal Shaping Distribution
Applying the pmf p to shape the M amplitudes, the entropy H (in bits) with the PCS-QAM constellation equals
$$H = -\sum_{m=1}^{M} p_m \log_2 p_m,$$
with $\sum_{m=1}^{M} p_m = 1$. Note that when N is sufficiently large, the ratio K/N between the input and output lengths of CCDM can approach H [26]. As an example, it holds with Example 2 that p = {0.5, 0.25, 0.17, 0.08} and H = 1.73bits, and the number of bits carried by each PCS-64QAM symbol is 2(1 + H) = 5.46bits with the proposed PCS scheme in Fig. 3, which is less than the 6bits with Uniform-64QAM.
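The entropy figures quoted for Example 2 follow from the Shannon entropy of the amplitude pmf. A minimal sketch is given below; the helper name `amplitude_entropy` is illustrative only.

```python
from math import log2

def amplitude_entropy(pmf):
    """Shannon entropy (bits) of the pmf over the M positive amplitudes."""
    return -sum(p * log2(p) for p in pmf if p > 0)

pmf = [0.5, 0.25, 0.17, 0.08]      # pmf of amplitudes {1, 3, 5, 7} in Example 2
H = amplitude_entropy(pmf)          # entropy per real dimension, sign bit excluded
bits_per_symbol = 2 * (1 + H)       # two dimensions, one sign bit each
uniform_bits = 2 * (1 + log2(len(pmf)))

print(round(H, 2), round(bits_per_symbol, 2), uniform_bits)
```

Computed without intermediate rounding, the bits per symbol come out marginally below the 5.46 obtained from the rounded H = 1.73; the difference is a rounding effect only.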
On the other hand, the average-power with the PCS-QAM equals
$$E_{PCS} = \sum_{m=1}^{M} p_m A_m^2. \qquad (6)$$
From the uniform parity-bit assumption, after encoding with a code-rate c, the average-power is
$$E = cE_{PCS} + (1-c) = c(E_{PCS} - 1) + 1.$$
With Uniform-QAM, $E_{PCS} = E = 1$ and H is maximized, while with PCS-QAM, both E and H are decreased. As an extreme case with $p_1 = 1$ and $p_m = 0$ for m > 1, $E_{PCS}$ is minimized to $A_1^2$ and H = 0. Hence, there is a trade-off between entropy-loss and power-gain with PCS.
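The power side of the trade-off can be illustrated with the Example 2 pmf. The sketch below assumes the amplitudes are the odd integers scaled so that uniform 2M-PAM has unit average power (an assumption consistent with the statement that Uniform-QAM gives an average-power of one), and it models the coded average-power as c(E_PCS − 1) + 1, the expression used later for the comparison with PAS. Function names are illustrative.

```python
from math import log10, sqrt

def pam_amplitudes(M):
    """Positive 2M-PAM amplitudes, scaled (assumption) to unit power when uniform."""
    scale = sqrt(3.0 / (4 * M * M - 1))
    return [(2 * m - 1) * scale for m in range(1, M + 1)]

def shaped_power(pmf, amplitudes):
    """Average power of the shaped amplitudes before channel coding."""
    return sum(p * a * a for p, a in zip(pmf, amplitudes))

def coded_power(e_pcs, c):
    """Average power after rate-c coding, with parity bits taken as uniform."""
    return c * (e_pcs - 1.0) + 1.0

amps = pam_amplitudes(4)
e_uniform = shaped_power([0.25] * 4, amps)             # reference, equals 1
e_pcs = shaped_power([0.5, 0.25, 0.17, 0.08], amps)    # shaped information part
e_coded = coded_power(e_pcs, c=2 / 3)                  # shaped plus uniform parity
power_gain_db = -10 * log10(e_coded)

print(round(e_pcs, 3), round(e_coded, 3), round(power_gain_db, 2))
```

The uniform parity-bits dilute the power reduction, which is why the gain shrinks as the code-rate c decreases.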
It is well-known that the Maxwell-Boltzmann distribution is optimal for shaping [51], which can be directly shown by solving the optimization p = arg max(H − λ(E−1)) under the power-constraint E ≤ 1, with λ being the Lagrange multiplier. In other words, the optimal pmf reads where The optimal PCS parameter ν can be solved for optimizing theoretical and achievable rates, or more practically, BLER and throughput. An observation is that the optimal $p_m$ also depends on c, and when c changes, ν shall be adjusted accordingly. This is different from PAS, where c can be regarded as fixed to one due to its shaping structure. With PCS, however, c plays a role in the properties of power-gain, entropy-loss, and PAPR-increment, which are elaborated next.
Note that with a large ν, $p_m$ corresponding to some large amplitudes can be zero. In those cases, the entropy H can also become smaller than that of a lower-order Uniform-QAM. Hence, the PCS can be applied to a lower-order constellation which yields the same H, and the difference can be marginal [14]. Due to this fact and for simplicity, in the remainder we always assume the general case that $p_m > 0$ for all amplitudes.
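The joint behaviour of average-power and entropy under Maxwell-Boltzmann shaping can be explored numerically. The sketch below assumes the textbook form p_m ∝ exp(−ν A_m²) and unit-power-normalized odd-integer amplitudes; the exact parametrization of ν in the paper (in particular its dependence on the code-rate c) is not reproduced here, so the values of ν are illustrative only.

```python
from math import exp, log2, sqrt

def maxwell_boltzmann_pmf(nu, amplitudes):
    """Assumed textbook form: p_m proportional to exp(-nu * A_m^2)."""
    weights = [exp(-nu * a * a) for a in amplitudes]
    total = sum(weights)
    return [w / total for w in weights]

M = 4
# Assumption: odd-integer amplitudes scaled so that uniform 2M-PAM has unit power.
amps = [(2 * m - 1) * sqrt(3.0 / (4 * M * M - 1)) for m in range(1, M + 1)]

sweep = []
for nu in (0.0, 0.5, 1.0, 2.0):
    pmf = maxwell_boltzmann_pmf(nu, amps)
    power = sum(p * a * a for p, a in zip(pmf, amps))    # shaped average power
    entropy = -sum(p * log2(p) for p in pmf if p > 0)    # bits per amplitude
    sweep.append((nu, power, entropy))

for nu, power, entropy in sweep:
    print(f"nu={nu:.1f}  power={power:.3f}  entropy={entropy:.3f}")
```

With ν = 0 the pmf is uniform, and both quantities fall as ν grows, which is the trade-off described above.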

B. The Power-Gain With PCS
From ( 2) and ( 6), E PCS can be rewritten as a weighted-sum of exponential functions as .
To study its properties, we derive a closed-form expression of E PCS as M → ∞, which is stated in Property 1 and with a proof in Appendix A.
Property 1: The average-power $E_{PCS}$ monotonically decreases in ν until it saturates to the minimum $A_1^2$. Further, as M → ∞, it converges to
Note that the closed-form expression (12) is independent of M, and later numerical results show that the weighted-sum in (11) converges fast in M and the approximation is quite accurate when M ≥ 4. To analyze the changing-slope of $E_{PCS}$ with respect to (w.r.t.) ν, we further derive asymptotic approximations of $E_{PCS}$ in Corollary 1, with a proof in Appendix B.
Corollary 1: When ν is small, $E_{PCS}$ can be approximated by its Taylor expansion as While when ν is large, it holds that
From Corollary 1, it is readily seen that the first-order derivative of and it eventually decreases to zero. Hence, $E_{PCS}$ decreases at a much slower slope as ν becomes large. This provides some hints on designing the PCS properly to yield a fast decrease in $E_{PCS}$ and harvest a high power-gain. Besides reducing $E_{PCS}$, PCS also causes entropy-loss and PAPR-increment, which need to be jointly considered when designing ν. These are discussed next based on the results of $E_{PCS}$.

C. The Entropy-Loss With PCS
Although it was shown in the literature that PCS brings shaping-gains, to our best knowledge, the fact that PCS also incurs losses in source-entropy is usually overlooked. This, however, is important when analyzing the final throughput. The entropy H with $4M^2$-PCS-QAM satisfies $H \le \log_2 M$, and the maximum is attained with $4M^2$-Uniform-QAM. This can be directly verified by optimizing H under the constraint $\sum_{m=1}^{M} p_m = 1$, and this entropy-loss is due to two facts: the first is that the average-power $E_{PCS}$ with a Gaussian pmf is less than that with a Uniform-QAM due to shaping, while the second is that with PCS the Gaussian distribution is truncated and has the same finite support $(0, \sqrt{3})$ as Uniform-QAM when M → ∞. The trade-off between $E_{PCS}$ and H is presented in Property 2, with a proof in Appendix C.
Property 2: With the optimal pmf in (8) and assuming $p_M > 0$, the entropy with PCS-QAM is and the maximum is attained with ν = 0. Further, it holds that
From Property 2, the decreasing-slope $\frac{\partial H}{\partial \nu}$ scales down by a factor of $\frac{\nu}{\ln 2}$ compared to that of $E_{PCS}$. Further, from (15) and Corollary 1, we have Corollary 2 with a proof in Appendix D.
Corollary 2: As M increases, the entropy H can be approximated as
Different from $E_{PCS}$, the approximation of H depends on M via β, and this is because H increases as M increases and it never saturates.

D. The PAPR With PCS
Another potential disadvantage with PCS is the PAPR that may decrease the power-amplifier efficiency [33]. This is so since $E_{PCS}$ is decreased, but the maximal power $A_M^2$ remains the same. In 5G-NR, there are two different transmission schemes, namely, orthogonal-frequency-division-multiplexing (OFDM) in downlink (DL) transmission and discrete-Fourier-transform spread OFDM (DFT-s-OFDM) in uplink (UL) transmission. OFDM applies a multi-carrier transmission and is known to suffer from high PAPR, while DFT-s-OFDM is a single-carrier system with a much better PAPR compared to OFDM. It is shown in [37] that PAS has almost no impact on the PAPR of OFDM. Therefore, we consider the PAPR (in dB) for the DFT-s-OFDM system, which is defined as the ratio of the maximal power to the average-power as
$$\mathrm{PAPR} = 10\log_{10}\frac{A_M^2}{E}.$$
The PAPR-increment with PCS is presented in Property 3, which can be directly verified.
Property 3: The PAPR with the PCS increases at a slope of From Property 1 and Property 3, the derivative when ν is large.
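The size of the PAPR-increment can be gauged with a small numerical sketch. It takes the PAPR as the ratio (in dB) of the peak power A_M² to the average-power, assumes unit-power-normalized odd-integer amplitudes, and uses the Example 2 pmf with the coded average-power c(E_PCS − 1) + 1. This is a constellation-level ratio only; it does not model the DFT-s-OFDM waveform, and the function name is illustrative.

```python
from math import log10, sqrt

def papr_db(peak_power, average_power):
    """Peak-to-average power ratio in dB."""
    return 10 * log10(peak_power / average_power)

M = 4
# Assumption: odd-integer amplitudes scaled so that uniform 2M-PAM has unit power.
amps = [(2 * m - 1) * sqrt(3.0 / (4 * M * M - 1)) for m in range(1, M + 1)]
peak = amps[-1] ** 2                                    # A_M^2, unchanged by shaping

pmf = [0.5, 0.25, 0.17, 0.08]                           # Example 2
e_pcs = sum(p * a * a for p, a in zip(pmf, amps))       # shaped average power
c = 2 / 3                                               # illustrative code-rate
e_coded = c * (e_pcs - 1.0) + 1.0                       # with uniform parity bits

papr_uniform = papr_db(peak, 1.0)
papr_pcs = papr_db(peak, e_coded)
papr_increment = papr_pcs - papr_uniform

print(round(papr_uniform, 2), round(papr_pcs, 2), round(papr_increment, 2))
```

Because the peak is fixed, the increment in dB equals the power-gain in dB for this ratio, so a larger power reduction directly translates into a larger PAPR.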

IV. COMPARING THE PCS TO GCS AND PAS
With the PCS design elaborated, we next show its connections and differences to two other common shaping techniques mentioned earlier, GCS and PAS. These comparisons make it clearer that the proposed PCS scheme is more suitable for 5G-NR and its evolution.

A. The Connections to GCS
In fact, PCS and GCS share the same rationale of approximating the average-power E of a continuous Gaussian constellation. To see this, we take a one-dimensional constellation as an example. It is known that a zero-mean Gaussian probability density function (pdf) p(x) minimizes E for attaining a given entropy. However, a continuous constellation x is infeasible in practice, and a discrete sampling is applied. With PCS, it is achieved by sampling x ∈ (−∞, ∞) uniformly with discrete constellation points $\pm A_m$ such that
$$E = \int_{-\infty}^{\infty} x^2 p(x)\,dx \approx 2\Delta_x \sum_{m=1}^{M} p_m A_m^2 = \sum_{m=1}^{M} \tilde{p}_m A_m^2, \qquad (20)$$
where $p_m = p(\pm A_m)$ and $\Delta_x$ is the size of the sampling-interval, and $\tilde{p}_m = 2\Delta_x p_m$ is the normalized pmf. Such a discretization approximates the Riemann integral in (20).
On the other hand, GCS approximates E from the perspective of the Lebesgue integral [31]. For instance, the GCS in [10] splits the region (0, ∞) into M subsets such that the integrals of p(x) over them are equal, with
$$\int_{a_m}^{a_{m+1}} p(x)\,dx = \frac{1}{2M}, \quad 1 \le m \le M, \qquad (21)$$
where $a_1 = 0$, $a_{M+1} = \infty$, and the other boundary points $a_m$ are chosen for (21) to hold. The constellation points are then set to $\pm A_m$, with $A_m$ being the centroid of the mth subset as
$$A_m = 2M\int_{a_m}^{a_{m+1}} x\,p(x)\,dx. \qquad (22)$$
It was shown in [10] that and the equality holds as M → ∞ and when GCS approaches the Shannon capacity. Comparing (20) and (23), the difference is clear: with PCS the transmission-probabilities of constellation symbols are Gaussian but the geometric locations are uniform, while with GCS the geometric locations of constellation symbols are Gaussian but the transmission-probabilities are uniform. A GCS-64QAM example compared to Uniform-64QAM is depicted in Fig. 4. The advantage of GCS is that no DM is needed, but the main drawbacks are the compatibility with 5G-NR and the slower convergence in M compared to PCS [10].
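The equal-probability construction described for GCS can be sketched in a few lines. The code below splits (0, ∞) into M cells of equal Gaussian mass and places each amplitude at the cell centroid, using the closed-form first moment of a zero-mean Gaussian. It is an illustration of the partition-and-centroid idea as described in the text, not the exact design procedure of [10]; the function name and the unit variance are assumptions.

```python
from math import inf, pi, sqrt
from statistics import NormalDist

def gcs_amplitudes(M, sigma=1.0):
    """Centroids of M equal-probability cells of a zero-mean Gaussian on (0, inf)."""
    gauss = NormalDist(0.0, sigma)
    # Boundaries a_1 = 0 < a_2 < ... < a_{M+1} = inf; each cell has mass 1/(2M).
    bounds = [gauss.inv_cdf(0.5 + m / (2 * M)) for m in range(M)] + [inf]
    cell_mass = 1.0 / (2 * M)
    amplitudes = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        pdf_hi = 0.0 if hi == inf else gauss.pdf(hi)
        # Integral of x*p(x) over [lo, hi] equals sigma^2 * (p(lo) - p(hi)).
        first_moment = sigma ** 2 * (gauss.pdf(lo) - pdf_hi)
        amplitudes.append(first_moment / cell_mass)
    return amplitudes

amps = gcs_amplitudes(4)
mean_power = sum(a * a for a in amps) / len(amps)   # equiprobable points

print([round(a, 3) for a in amps], round(mean_power, 3))
```

The resulting points are dense near the origin and sparse at large amplitudes, and their average power stays below the variance of the underlying Gaussian for finite M.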

B. The Differences to PAS
With the PAS, a systematic encoder is applied with a fixed code-rate to generate sign bits for the shaped amplitudes. Hence, the average-power of PAS-QAM symbols is $E = E_{PCS}$, which is smaller than $E = c(E_{PCS} - 1) + 1$ with the PCS. In other words, PAS yields a higher power-gain. However, the effective transmission-rate with PAS is less than that with the PCS, which is shown in Lemma 1.
Lemma 1: The effective rate (bits/s/Hz) with the PAS is 2H, which is smaller than the effective rate 2c(1+H) with the PCS when c is the same as in (24).
Proof: The PAS transfers 2K input information bits into N symbols, and with the OFDM system in 5G-NR [1], [23], [49], the transmission rate is 1 symbol/s/Hz. Hence, the effective rate with PAS equals
$$r_1 = \frac{2K}{N} \approx 2H.$$
Similarly, the PCS transfers 2(K+N) input information bits into N symbols, and afterwards an encoder with a code-rate c is applied. Hence, the combined effective rate is
$$r_2 = \frac{2c(K+N)}{N} \approx 2c(1+H).$$
In both cases, the approximations hold due to K/N → H when N is large. Further, it can be shown that $r_1 \le r_2$ with c given in (24), since and the equality holds only when $H = \log_2 M$, i.e., there is no shaping.
In summary, PAS and PCS yield different trade-offs between power-gain and entropy-loss. Compared to PAS, from Property 1 and 2, the decreasing-slopes of entropy and average-power with PCS are slowed down by a factor of c under the same ν, since While from Property 3 the increasing-slope of PAPR is lowered by $c^2$ when ν is small. Besides these differences, another difference is the harvested coding-gain. The gain is smaller with PAS because the parity-bits are directly mapped as sign bits, which are better protected than the information-bits. With PCS, the parity-bits and information-bits are equally protected. The received quality of information-bits can be more critical when decoding at high code-rates.
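Lemma 1 can be checked numerically under one explicit assumption. Equation (24) is not reproduced in this text, so the sketch below assumes the PAS code-rate that results when all amplitude bits are systematic and every sign bit is a parity bit, c = log2(M)/(1 + log2(M)). Under that assumption the rate difference is 2(log2(M) − H)/(1 + log2(M)), which is non-negative and vanishes only without shaping. Function names are illustrative.

```python
from math import log2

def pas_code_rate(M):
    """Assumed PAS code-rate: log2(M) systematic bits plus one parity sign bit."""
    bits = log2(M)
    return bits / (1 + bits)

def pas_rate(H):
    """Effective rate with PAS: only the shaped amplitude bits carry information."""
    return 2 * H

def pcs_rate(H, c):
    """Effective rate with the proposed PCS followed by a rate-c encoder."""
    return 2 * c * (1 + H)

M, H = 4, 1.73                  # 64QAM with the Example 2 entropy
c = pas_code_rate(M)            # 2/3 for M = 4
r1 = pas_rate(H)
r2 = pcs_rate(H, c)

print(round(c, 3), round(r1, 3), round(r2, 3))
```

At the same code-rate, the extra rate with PCS comes from the sign bits carrying information instead of parity.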

V. APPLYING THE PCS-TRANSCEIVER IN THE 5G-NR
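Next we analyze the benefits of applying the PCS-transceiver in 5G-NR, by considering a general MIMO transmission with a received signal model
$$\boldsymbol{y} = \frac{1}{\sqrt{E}}\boldsymbol{H}\boldsymbol{s} + \boldsymbol{n}, \qquad (29)$$
where $\boldsymbol{H}$ is the MIMO channel of size L×L, and the noise $\boldsymbol{n} \sim \mathcal{CN}(\boldsymbol{0}, \sigma^2\boldsymbol{I})$. Each symbol-vector $\boldsymbol{s}$ is mapped from a bit-vector comprising LQ coded bits. The power-scaling factor $\frac{1}{\sqrt{E}}$ is used to normalize the transmit-power with PCS-QAM, and equals one with Uniform-QAM.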

A. Shaping-Gains Under MIMO and Fading Channels
For a 5G-NR system, the maximal achievable rate is limited by the bit-interleaved coded-modulation (BICM) capacity [46], [47], which is more practical than the Shannon capacity. The BICM capacity is typically computed with Uniform-QAM, but in this section we generalize it to PCS-QAM. Denoting $\mathcal{X}$ as the set of all possible transmission vectors of $\boldsymbol{s}$ after the encoder, and $\mathcal{X}(b_m)$ ($b_m$ = 0 or 1) as the set of all vectors from $\mathcal{X}$ with the mth bit (1 ≤ m ≤ LQ) equal to $b_m$, the normalized BICM capacity (bits/layer/chan.use), when $\boldsymbol{H}$ is known, equals The entropy $\mathcal{H}(\boldsymbol{s})$ equals where the prior probability $p(\boldsymbol{s})$ is known from the PCS by taking into account the impacts from parity-bits. It can be calculated as
$$p(\boldsymbol{s}) = \prod_{\ell=1}^{L} p(s_\ell),$$
where $p(s_\ell)$ is the probability of symbol $s_\ell$ on the ℓth layer, drawn from PCS-QAM. In (30), $\Lambda_m$ is the log-likelihood ratio (LLR) of the mth bit, which can be approximated with a max-log-map (MLM) detection [36] as
To guarantee an error-free transmission, the transmission-rate must be less than the BICM capacity, i.e., $2c(1 + H) < R_{BICM}$. Since both H and $R_{BICM}$ depend on ν, it can be designed as When $\boldsymbol{H}$ is unknown, $R_{BICM}$ in (34) can be replaced by the ergodic capacity $\mathbb{E}_{\boldsymbol{H}}[R_{BICM}]$. However, attaining $R_{BICM}$ also requires optimal coding and decoding, and in practice the achievable rate is less. Further, the back-off rate ϵ ≥ 0 is assigned to meet a certain BLER constraint, while the BLER also depends on the practical detection and decoding capabilities of the considered system. Hence, in general the optimization (34) needs to be customized for different systems and setups. Nevertheless, it has been validated in [46] and [47] that an optimal ν with PAS can be fixed for a wide range of SNR values, and the shaping-gain harvested by maximizing $R_{BICM}$ under a single-input single-output (SISO) and AWGN channel is 0.8dB for 64QAM. Further, with practical codes, it is no less than 0.43dB [14]. Even so, the metrics considered in [14], [46], and [47] are still the rates, not the throughput.
Further, when the size of the MIMO system becomes large, the MLM detector can yield prohibitive complexity, and the suboptimal linear minimum-mean-square-error (LMMSE) detector [36] can be applied to compute the LLR per layer, and the detection model (29) is simplified as where $\boldsymbol{\Lambda}$ is a diagonal matrix comprising the main diagonal of $\boldsymbol{H}^{\dagger}(\boldsymbol{H}\boldsymbol{H}^{\dagger} + \sigma^2 E\boldsymbol{I})^{-1}\boldsymbol{H}$, and $\tilde{\boldsymbol{n}}$ comprises both the interference and noise. Denote $\tilde{y}_\ell$ and $\tilde{n}_\ell$ as the ℓth entries of $\tilde{\boldsymbol{y}}$ and $\tilde{\boldsymbol{n}}$, and $\lambda_\ell$ as the ℓth diagonal-element of $\boldsymbol{\Lambda}$, respectively. The noise $\tilde{n}_\ell$ can be modeled as zero-mean with a variance ℓth layer (1 ≤ ℓ ≤ L) can be computed as where $\mathcal{X}$ now denotes the set of all constellation symbols on a single layer, and $\mathcal{X}(b_{k,\ell})$ bears a similar meaning as $\mathcal{X}(b_m)$.

B. Rate-Adaptability With the PCS in 5G-NR
Shaping-gain is one advantage of PCS, while the other advantage is its capability of rate-adaptation. In current 5G-NR, the modulation and coding scheme (MCS) table that supports up to 1024QAM and a code-rate c = 0.95 is shown in Fig. 5. The largest rate-gap between two adjacent MCS indexes is 0.47bits/s/Hz, which corresponds to an SNR-gap of 1.4dB in the high SNR regime computed from the capacity formula. When transmitting with MIMO and under fading channels, the SNR-gaps between two adjacent indexes can be significantly enlarged. For URLLC, the BLER constraint is stringent. If a high MCS index cannot meet the BLER constraint, the transmission falls back to a lower MCS, which can yield significant SNR-losses. This is essentially because the granularity of MCS in the existing 5G-NR is not dense enough for transmissions under a stringent BLER that is less than 10%, in particular with harsh transmission conditions including large MIMO, super-QAM, and fading channels, where detection and decoding algorithms can be suboptimal.
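The conversion between a rate-gap and an SNR-gap in the high SNR regime can be made explicit. The sketch below uses the high-SNR approximation of the capacity formula, log2(1 + SNR) ≈ log2(SNR), so that a rate difference ΔR corresponds to an SNR ratio of 2^ΔR; the same relation links the 1.53dB asymptotic shaping loss to its rate-loss. The function names are illustrative.

```python
from math import e, log10, log2, pi

def snr_gap_db(rate_gap_bits):
    """High-SNR approximation: a rate gap of dR bits/s/Hz needs 2**dR more SNR."""
    return 10 * log10(2 ** rate_gap_bits)

def rate_gap_bits(snr_gap_in_db):
    """Inverse mapping, again valid only in the high-SNR regime."""
    return log2(10 ** (snr_gap_in_db / 10))

mcs_gap_db = snr_gap_db(0.47)                  # largest gap between adjacent MCS indexes
ultimate_shaping_db = 10 * log10(pi * e / 6)   # asymptotic shaping gain
ultimate_rate_loss = rate_gap_bits(ultimate_shaping_db)

print(round(mcs_gap_db, 2), round(ultimate_shaping_db, 2), round(ultimate_rate_loss, 2))
```

At moderate SNR the "+1" inside the logarithm is no longer negligible, so the actual SNR-gap for a given rate-gap is larger than this approximation suggests.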
This issue, however, can be resolved with the PCS. As can be seen from (30), changing ν reduces the entropy H smoothly, and meanwhile can make the conditional entropy $\mathcal{H}(\boldsymbol{s}|\boldsymbol{y})$ arbitrarily small to meet a BLER constraint under a given SNR and channel condition. When $\mathcal{H}(\boldsymbol{s}|\boldsymbol{y}) = 0$, there are no errors in transmission and the rate achieved is 2c(1+H). This can also be seen as adapting the effective rate $r_2$ in Lemma 1. Without PCS it holds that $r_2 = cQ$. Otherwise, by specifying ν, $r_2$ can be adapted accordingly. This yields a smooth rate-adaptation and can even create a continuous MCS scheme, which is essential for URLLC applications in 5G-NR. Moreover, it is also beneficial for systems that need to puncture a mother-code to obtain different code-rates, which can lead to significant losses in coding-gains [30]. Hence, rate-adaptability with PCS is a useful feature for 5G-NR evolution [34].

C. When Can PCS Provide Gains in Throughput?
With the shaping-gain and rate-adaptability from the PCS introduced, next we answer a practical question: When can PCS provide gains in throughput? Although a reduced BLER improves the transmission latency and reliability, which are both important for URLLC [35], it does not necessarily mean a higher throughput. This is essentially because the entropy H is reduced. This fact, to our best knowledge, has not been elaborated in the literature, but is critical for applying PCS in practice.
Denote the normalized entropy-loss $\delta_H$ with the PCS as
$$\delta_H = \frac{Q - 2(1+H)}{Q} = 1 - \frac{2(1+H)}{Q}.$$
The normalized throughput (bits/layer/chan.use) attained with $4M^2$-Uniform-QAM and the BLER $P_b$ is $cQ(1-P_b)$, while with PCS-QAM (of any modulation-order) and a BLER $\tilde{P}_b$, the achieved normalized throughput is $2c(1+H)(1-\tilde{P}_b)$. In order to have
$$2c(1+H)(1-\tilde{P}_b) > cQ(1-P_b),$$
it is equivalent to have
$$(1-\delta_H)(1-\tilde{P}_b) > 1-P_b. \qquad (39)$$
Since $0 \le P_b, \tilde{P}_b \le 1$, when $\tilde{P}_b \ne 1$ the condition (39) implies
$$1-\delta_H > \frac{1-P_b}{1-\tilde{P}_b} \ge 1-P_b, \quad \text{i.e.,} \quad \delta_H < P_b. \qquad (40)$$
This means that the BLER attained with Uniform-QAM cannot be lower than $\delta_H$, i.e., the entropy-loss with the PCS-QAM, for it to yield a higher throughput. On the other hand, when using a high-order modulation with the PCS, it can hold that $2(1+H) \ge Q$, which yields $\delta_H \le 0$ and (40) always holds. In this case, from (39) the BLER with PCS-QAM shall satisfy
$$\tilde{P}_b < 1 - \frac{1-P_b}{1-\delta_H}. \qquad (41)$$
These results are summarized in Property 4.
Property 4: Assuming the BLER of transmissions with a Uniform-QAM is $P_b$, for PCS-QAM to yield a higher throughput than the Uniform-QAM, the BLER of transmissions with the PCS-QAM must satisfy (41). Further, the normalized entropy-loss $\delta_H$ must be less than $P_b$.
It is worth noting that (41) provides a necessary and sufficient condition for PCS-QAM to outperform Uniform-QAM in throughput, and (40) puts a stringent constraint on the entropy-loss that can be tolerated with the PCS. When $\tilde{P}_b = 0$, i.e., the transmissions with PCS-QAM have no errors, the condition (40) also becomes sufficient. When $\tilde{P}_b \geq P_b$, the PCS cannot bring any gains in throughput if $\delta_H \geq 0$. For instance, compared to Uniform-64QAM, the normalized entropy-loss with PCS-64QAM is 9% in Example 2. This indicates that if the BLER with Uniform-64QAM is less than 9%, the final throughput can never be higher with PCS-64QAM. Nevertheless, in most cases of interest, PCS-QAM can bring gains in both BLER and throughput.
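The throughput comparison above can be checked numerically with a short Python sketch. It assumes that the normalized entropy-loss is the fractional reduction of the error-free rate, $\delta_H = 1 - 2(1+H)/Q$, and it uses the amplitude probabilities {0.5, 0.25, 0.17, 0.08} quoted for the PCS-64QAM example of Fig. 1. The function names are hypothetical and the BLER values in the usage lines are illustrative, not simulation results.

```python
import math

def entropy_bits(pmf):
    """Entropy H of the amplitude pmf in bits (per real dimension)."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def normalized_entropy_loss(pmf, q_bits):
    """Assumed definition: delta_H = 1 - 2(1+H)/Q."""
    return 1.0 - 2.0 * (1.0 + entropy_bits(pmf)) / q_bits

def pcs_not_worse(p_b, p_b_pcs, delta_h):
    """True if (1 - delta_H)(1 - PCS BLER) >= 1 - uniform BLER."""
    return (1.0 - delta_h) * (1.0 - p_b_pcs) >= (1.0 - p_b)

def max_pcs_bler(p_b, delta_h):
    """Largest PCS-QAM BLER that still matches the uniform throughput."""
    return (p_b - delta_h) / (1.0 - delta_h)

FIG1_PMF = [0.5, 0.25, 0.17, 0.08]  # amplitudes 1, 3, 5, 7 of PCS-64QAM

if __name__ == "__main__":
    d = normalized_entropy_loss(FIG1_PMF, q_bits=6)
    print(f"delta_H = {d:.4f}")
    # Illustrative BLER pairs (uniform, PCS); not taken from the figures.
    print(pcs_not_worse(0.05, 0.00, d))  # uniform BLER below delta_H
    print(pcs_not_worse(0.20, 0.05, d))  # uniform BLER above delta_H
```

For this pmf the sketch gives an entropy-loss of about 9%, so an error-free PCS-64QAM link still cannot beat Uniform-64QAM operating at a 5% BLER, whereas it can when Uniform-64QAM operates at a 20% BLER.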

A. Simulation Setups
In this section we present numerical results to validate our analysis. With the MIMO signal model (29), the L×L channel H is modeled as independent and identically distributed (i.i.d.) Rayleigh-fading, obeying $\mathcal{CN}(0, \frac{1}{L} I)$, and the SNR is defined as $1/\sigma^2$. A 5G-NR LDPC encoder is applied [17], [23] with an input length of 1024, and the decoder uses the min-sum decoding algorithm [27]. The PCS-QAM symbols s are generated with the scheme in Fig. 2, and for both PCS and PAS we assume ideal CCDM and inverse operations [22]. The output length N of CCDM is set to 160 for the SISO-AWGN (L = 1) channel in Fig. 8-9, and to 96 for the MIMO channel in Fig. 10-14, where CCDM operates on each layer separately. We apply a heuristic method to adjust $N_m$: it first sets $N_m = \lfloor N p_m \rfloor$, and afterwards the difference $N - \sum_{m=1}^{M} N_m$ is evenly assigned to the last $N - \sum_{m=1}^{M} N_m$ amplitudes that satisfy $N_m > 0$. The performance differences compared to an optimal adjustment are marginal. Throughout all simulations we assume perfect channel knowledge; when there are channel-estimation errors, the results are similar if those errors are regarded as additional noise.
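The heuristic adjustment of the CCDM symbol counts can be sketched as follows. This is one reading of the description above, not a reference implementation. The function name `ccdm_counts` is hypothetical, "last" is taken to mean the highest amplitude indices, and the wrap-around used when fewer amplitudes have non-zero counts than there are symbols left is our own assumption.

```python
import math

def ccdm_counts(pmf, n):
    """Quantize a target pmf into integer symbol counts N_m that sum to n."""
    counts = [math.floor(n * p) for p in pmf]  # N_m = floor(N * p_m)
    deficit = n - sum(counts)                  # symbols still unassigned
    nonzero = [m for m, k in enumerate(counts) if k > 0]
    if deficit > 0 and not nonzero:
        raise ValueError("n is too small for this pmf")
    # One extra symbol for each of the last `deficit` amplitudes with N_m > 0.
    # Wrapping around when fewer such amplitudes exist is an assumption.
    for i in range(deficit):
        counts[nonzero[-1 - (i % len(nonzero))]] += 1
    return counts

if __name__ == "__main__":
    fig1_pmf = [0.5, 0.25, 0.17, 0.08]  # PCS-64QAM example of Fig. 1
    for n in (160, 96):                 # CCDM output lengths used above
        print(n, ccdm_counts(fig1_pmf, n))
```

For the Fig. 1 pmf, the floor operation leaves a single symbol unassigned for both N = 160 and N = 96, and that symbol goes to the largest amplitude.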

B. Power-Gain, Entropy-Loss, and PAPR-Increment
The relationships between $E_{\mathrm{PCS}}$ and ν = ν/c are shown in Fig. 6 for 64QAM, 256QAM, and 1024QAM, i.e., M = 4, 8, and 16, with different code-rates, respectively. The amplitudes are shaped with the pmf in (8). As seen, the approximations in (12) closely match the exact values. The entropy-losses and PAPR-increments are shown in Fig. 7. An observation is that ν should not be too large, in order to prevent significant entropy-losses and PAPR-increments. When ν is too large, the PCS becomes equivalent to applying a lower-order modulation, and the PAPR can be much worse. On the other hand, ν should not be too small, otherwise the power-gain is limited. A reasonable value of ν seen from these figures is around 0.5 for the same modulation-order of Uniform-QAM. Such a design is aligned with the optimal pmfs found via numerical optimizations in [7], and the PAPR-increment is around 1dB in this case, which can be affordable.
It can also be seen from Fig. 7 that the approximation of H in (12) is quite accurate. Further, with N = 80 the entropy H is already close to that with N = ∞. Hence, the complexity of CCDM can also be acceptable. Moreover, since PAS is equivalent to setting c = 1, for the same ν the decrease of $E_{\mathrm{PCS}}$ and H, and the increase of the PAPR-increment, are much faster than with PCS, which is well aligned with the analysis in Properties 1-3.
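As a rough numerical illustration of the power-gain, the sketch below computes the average power of a shaped amplitude pmf relative to uniform signalling on the same amplitude set. It assumes that $E_{\mathrm{PCS}}$ denotes this normalized average power (the exact definition in Property 1 is not restated here). It uses the Fig. 1 probabilities rather than the pmf in (8), and the function names are hypothetical.

```python
import math

def normalized_power(pmf):
    """Average power of the shaped amplitudes relative to uniform signalling."""
    amps = [2 * m + 1 for m in range(len(pmf))]        # 1, 3, ..., 2M-1
    shaped = sum(p * a * a for p, a in zip(pmf, amps))
    uniform = sum(a * a for a in amps) / len(amps)
    return shaped / uniform

def power_gain_db(pmf):
    """Transmit-power headroom 1/E expressed in dB."""
    return -10.0 * math.log10(normalized_power(pmf))

FIG1_PMF = [0.5, 0.25, 0.17, 0.08]  # PCS-64QAM example of Fig. 1

if __name__ == "__main__":
    print(f"E = {normalized_power(FIG1_PMF):.3f}")
    print(f"power-gain = {power_gain_db(FIG1_PMF):.2f} dB")
```

Under these assumptions the Fig. 1 pmf roughly halves the average power, which is the headroom that can be spent on scaling up the transmit-power.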

C. BLER and Throughput Under SISO-AWGN Channel
In Fig. 8 and 9, we compare the BLER and throughput of PAS-64QAM to PCS-64QAM under the SISO-AWGN channel. With c = 2/3, the maximum transmission rate is 4bits/layer/chan.use. The MLM detector in (33) is applied for detection. As can be seen in both cases, the BLERs are significantly reduced when ν increases. The gains come from the fact that the average-power E is reduced with PCS, so the transmit-power can be increased proportionally by a factor of 1/E. However, since H also decreases as ν increases, both PCS and PAS can actually render losses in throughput, even though the BLERs are reduced. By setting ν to 0.4, 0.8, and 1.2, the normalized entropy-losses $\delta_H$ with PCS-64QAM are 2.51%, 8.13%, and 14.46%, respectively. From Fig. 8 and according to Property 4, PCS-64QAM can only yield a higher throughput when the BLER with ν = 0 is larger than these losses, which correspond to SNR=15.8dB, 15.6dB, and 15.4dB, respectively. These SNR values are well aligned with those cross-points of BLER shown in Fig. 9. Note that PCS and PAS are two different designs, and it is difficult to judge which one is superior to the other. Nevertheless, from those results we can see that PCS has advantages in at least three different aspects. The first one is that there is no change to the encoder of 5G-NR. The second one is that when there is no shaping with ν = 0, PCS outperforms PAS due to a higher coding-gain, as seen in Fig. 8. The last one is that with the same setting of ν > 0, it can be seen from both figures that the changes in BLER and throughput with PAS are more abrupt. Nevertheless, since the system is operating below the capacity, we examine the generalized shaping-gain with the definition introduced before. With 10% BLER, which corresponds to a throughput of 3.6bits/layer/chan.use, it can be seen in Fig. 9 that PCS-64QAM with ν = 0.8 attains it at SNR=14.5dB, while Uniform-64QAM requires SNR=15.5dB. This corresponds to a 1dB SNR-gain. When the BLER constraint is lowered to 5%, PCS-64QAM with ν = 0.4 shows a 0.5dB SNR-gain when achieving the same throughput as Uniform-64QAM. Similar gains can be observed for PAS-64QAM, but it can also be seen that the throughputs with PAS are worse than with PCS in the region where the throughput is close to saturation, due to higher entropy-losses.
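As a worked check of these operating points (a sketch only), take the uniform throughput as $cQ(1-P_b)$ and let PCS-QAM saturate at $cQ(1-\delta_H)$, i.e., at the effective code-rate $c(1-\delta_H)$, with $\tilde{P}_b$ denoting the PCS-QAM BLER. The BLER budget left for PCS-64QAM at the 3.6bits/layer/chan.use target then follows from the entropy-losses quoted above:

```latex
\begin{align*}
\text{Uniform-64QAM:}\quad & cQ(1-P_b) = \tfrac{2}{3}\cdot 6\cdot(1-0.10) = 3.6, \\
\nu = 0.4:\quad & cQ(1-\delta_H) = 4(1-0.0251) \approx 3.900,
  \quad \tilde{P}_b \le 1-\tfrac{3.6}{3.900} \approx 7.7\%, \\
\nu = 0.8:\quad & cQ(1-\delta_H) = 4(1-0.0813) \approx 3.675,
  \quad \tilde{P}_b \le 1-\tfrac{3.6}{3.675} \approx 2.0\%, \\
\nu = 1.2:\quad & cQ(1-\delta_H) = 4(1-0.1446) \approx 3.422 < 3.6.
\end{align*}
```

Under these assumptions, PCS-64QAM with ν = 1.2 cannot reach the 3.6bits/layer/chan.use target at any SNR, which agrees with Property 4 since its entropy-loss of 14.46% exceeds the 10% BLER of Uniform-64QAM.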

D. BLER and Throughput Under MIMO and Rayleigh-Fading Channels
Although the SNR-gains are promising in Fig. 9, an argument would be that when the BLER is high, the system can lower the MCS index to achieve a better throughput. This is true; however, as explained, the number of MCS indexes supported in 5G-NR is limited due to the signaling overheads and adaptability of encoders. PCS can still improve the throughput in the sense of mitigating a large SNR-gap between two adjacent indexes. In Fig. 10 and 11, we simulate 2 × 2 MIMO under Rayleigh-fading channels with PCS-256QAM, and apply the MLM detector (33). As can be seen in both figures, the observations are the same as those made in Fig. 8 and 9. However, here we focus on comparing two adjacent MCS indexes from the 5G-NR shown in Fig. 5, i.e., Uniform-256QAM with c = 3/4 and 4/5, which correspond to maximum rates of 6bits/layer/chan.use and 6.4bits/layer/chan.use, respectively.
By setting ν to 0.2, 0.4, 0.6, and 0.8, the normalized entropy-losses $\delta_H$ with PCS-256QAM are 0.53%, 1.91%, 3.85%, and 6.1%, respectively. In Fig. 10, the BLER of Uniform-256QAM with c = 4/5 is 2% at SNR=31.2dB. According to Property 4, PCS-256QAM cannot show gains in throughput with ν = 0.6 and 0.8. In particular, Uniform-256QAM with c = 3/4 attains the same BLER at SNR=29.2dB, which shows an SNR-gap of 2dB between these two adjacent MCS indexes. For URLLC applications with a stringent BLER constraint of 2%, the SNR needs to be higher than 31.2dB with c = 4/5. Otherwise, the system degrades to c = 3/4 or operates in between these two indexes. However, with ν = 0.4, we see that PCS-256QAM is below this BLER constraint at SNR=31dB, and from Fig. 11, the throughput is increased by 0.2bits/layer/chan.use compared to Uniform-256QAM with c = 3/4. This corresponds to a shaping-gain of 0.6dB. On the other hand, from the zoomed-in figure it can be seen that to achieve the same throughput as Uniform-256QAM while satisfying the BLER constraint, the SNR-gain of PCS-256QAM with ν = 0.4 is 0.2dB. Therefore, from either perspective, the harvested shaping-gains are promising. In addition, in Fig. 11 we also see that the SNR crossing-points for PCS-256QAM to outperform Uniform-256QAM between 30dB and 32dB are well aligned with those SNR values in Fig. 10 where $\delta_H$ shall be less than the BLERs attained with Uniform-256QAM.
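The way PCS fills the gap between the two adjacent MCS indexes can be tabulated with a small Python sketch. It assumes that PCS-256QAM is applied on top of the c = 4/5 code and that the error-free throughput scales as $cQ(1-\delta_H)$. The entropy-losses are the values quoted above, while the function and variable names are hypothetical.

```python
def saturation_rate(c, q_bits, delta_h):
    """Error-free throughput c*Q*(1 - delta_H) in bits/layer/chan.use."""
    return c * q_bits * (1.0 - delta_h)

def meets_entropy_condition(delta_h, p_b):
    """Necessary condition of Property 4: delta_H must not exceed the uniform BLER."""
    return delta_h <= p_b

Q_BITS = 8                                              # 256QAM
LOSSES = {0.2: 0.0053, 0.4: 0.0191, 0.6: 0.0385, 0.8: 0.061}  # nu -> delta_H
BLER_TARGET = 0.02                                      # Uniform-256QAM, c = 4/5

if __name__ == "__main__":
    print("c=3/4 uniform:", saturation_rate(3 / 4, Q_BITS, 0.0))
    for nu, d in LOSSES.items():
        rate = saturation_rate(4 / 5, Q_BITS, d)
        ok = meets_entropy_condition(d, BLER_TARGET)
        print(f"nu={nu}: rate={rate:.3f}, can gain at 2% BLER: {ok}")
    print("c=4/5 uniform:", saturation_rate(4 / 5, Q_BITS, 0.0))
```

All four shaped configurations saturate between 6 and 6.4bits/layer/chan.use, and only ν = 0.2 and 0.4 pass the entropy-loss check at a 2% BLER, in line with the discussion above.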
In Fig. 12 and 13, we run the same tests as those in Fig. 10 and 11, but replace the MLM detector with LMMSE in (36). Further, the prior information p(s) is not utilized. This is to examine the worst detection-performance in practice. Nevertheless, similar conclusions can be drawn as those made with Fig. 10 and 11, although LMMSE yields about 1dB loss compared to MLM. As can be seen in Fig. 12, Uniform-256QAM with c = 4/5 attains 2% BLER at SNR=32.85dB, while PCS-256QAM with ν = 0.4 and 0.2 always shows higher throughputs between SNR 31dB and 33dB. In particular, PCS-256QAM with ν = 0.4 shows a rate-increment of 0.2bits/layer/chan.use at SNR=32dB, and an SNR-gain of 0.25dB in the zoomed-in figure when it achieves the same throughput as Uniform-256QAM and a BLER below 2%.

E. Shaping-Gain and Rate-Adaptability With the Proposed PCS-Transceiver
In Fig. 14, we evaluate the rate-adaptability with the proposed PCS-transceiver, by applying a single code-rate c = 2/3 and the LMMSE detector. It can be seen that with three modulations, 64QAM, 256QAM, and 1024QAM, PCS-QAM can provide smooth transitions of throughput when the SNR changes, by adjusting the effective code-rate $r = c(1-\delta_H)$ with ν. This means that the encoder and decoder can be highly optimized for this code-rate, while the system uses PCS to create a dense and even continuous MCS scheme.
BLERs are shown in the same colors as the corresponding curves in the regions where generalized shaping-gains can be observed. For instance, when the BLER is constrained to around 2-5%, in the regions marked with ellipses a maximal SNR-gain of 0.8dB can be observed when PCS-QAM achieves the same throughputs as Uniform-QAM. Meanwhile, in the regions marked with rectangles a maximum rate-increment of 0.2bits/layer/chan.use can be observed under given SNRs. It can also be seen that PCS-QAM with ν = 0.4 and ν = 1.6 are close to optimal compared to Uniform-QAM with the same or a lower-order modulation in these regions. This shows the effectiveness of the PCS-transceiver in 5G-NR even with fixed PCS designs and practical transmissions utilizing suboptimal coding, detection, and decoding.

VII. SUMMARY
We have proposed a practical, flexible, and effective PCS-transceiver design for 5G-NR and its future evolution that further increases SE while imposing only small overheads. We have analyzed the properties of power-gain, entropy-loss, and PAPR-increment, and clarified the connections and differences to conventional GCS and PAS designs. We have further proved a necessary and sufficient condition for PCS-QAM to outperform Uniform-QAM in throughput. The flexibility of the PCS-transceiver in rate-adaptation without impacting the encoder has also been demonstrated. It can mitigate the large SNR-gaps between adjacent MCSs in transmissions with large MIMO, super-QAM, and under fading channels, and provides a better throughput-envelope. We have also shown that when the BLER is below a stringent constraint, e.g., 2%, a rate-increment of 0.2bits/layer/chan.use can be observed with the PCS under practical 5G-NR and MIMO transmissions, which corresponds to a shaping-gain of 0.6dB. Meanwhile, an SNR-gain of up to 1dB can be observed when PCS-QAM achieves the same throughput as Uniform-QAM.

APPENDIX A PROOF OF PROPERTY 1
We firstly present Lemma 2, which can be directly verified.

APPENDIX B PROOF OF COROLLARY 1
When ν is sufficiently large, both $\nu \exp(-2\nu^2)$ and $Q(\sqrt{6}\nu)$ converge to zero, and from (12) it holds that $E_{\mathrm{PCS}} \approx \frac{1}{2\nu}$. On the other hand, when ν is sufficiently small, we can use the Taylor expansion to approximate it as $E_{\mathrm{PCS}}(\nu) \approx E_{\mathrm{PCS}}(0) + c_1 \nu + \frac{c_2}{2}\nu^2$, where $c_1 = E'_{\mathrm{PCS}}(0)$ and $c_2 = E''_{\mathrm{PCS}}(0)$ are the first and second order derivatives of $E_{\mathrm{PCS}}$ at ν = 0.

Fig. 2. The proposed PCS-transceiver with a PCS modulator and a demodulator added to the transmitter and the receiver, respectively, which are the main overheads to the existing 5G-NR transceiver-chain.

Fig. 5. Table from the latest version of the 3GPP specification [49], where 26 MCS indexes are defined for rate-adapting up to 1024QAM. The SNR values are calculated assuming the system achieves Shannon capacity.

Fig. 6. The relationships between $E_{\mathrm{PCS}}$ and ν with different values of M and c, and the approximations of $E_{\mathrm{PCS}}$ in Property 1.
The approximations in Property 1 are almost always on top of the exact values for M ≥ 4, even up to very large values of ν, as shown in the zoomed-in figure. Moreover, the curves with different values of M also overlap with each other.

Fig. 7. The relationships between H, PAPR, and ν with different values of M and c, and the approximations of H in Property 2.