RSMA Precoding Design Based on Interference Nulling and Sum Rate Upper Bound

Rate-splitting multiple access (RSMA), a recent generalization and merging of space-division multiple access (SDMA) and superposition coding, has exponential complexity. Hierarchical streams, which transmit a subset of the possible streams of encoded messages, are proposed for rate splitting and analyzed. To reduce precoding complexity, RSMA with interference nulling (RSMA-IN), which nulls the portion of interference that is not decoded, is investigated by formulating a sum rate maximization problem subject to a total power constraint. To solve this non-convex problem, a method that combines interference nulling and stream SNR maximization is proposed. A convex upper bound problem is formulated for the sum rate to assess existing algorithms, and conditions under which the upper bound problem is tight are determined. Taking advantage of the tight cases, we propose an upper bound aided (UBA) enhancement of existing precoding designs that exploits the optimal solutions available in those cases. It is also shown that the existing augmented weighted mean square error (AWMSE) algorithm is not guaranteed to converge to a local optimum in sum rate maximization. Simulation results indicate that the proposed approaches outperform existing ones and approach their upper bounds in certain instances.


Elaheh Sadeghabadi, Student Member, IEEE, and Steven D. Blostein, Senior Member, IEEE
Index Terms: Rate-splitting multiple access, broadcast channel, multiple-input multiple-output communications, non-orthogonal multiple access, space-division multiple access.

I. INTRODUCTION
Future generation multi-user multiple-input multiple-output (MU-MIMO) base stations (BS) face severe challenges serving multiple users. They must either limit performance by interference avoidance, treat interference as noise, or attempt highly complex interference cancellation. Rate-splitting (RS) is a recently proposed multiple access technique motivated by overcoming: i) spatial degree of freedom (DoF) limitations of space-division multiple access (SDMA) systems with imperfect channel state information at the transmitter (CSIT) [1], [2], [3], [4], [5] and ii) limitations of SDMA and non-orthogonal multiple access (NOMA) [6], [7]. The latter motivation is the focus of this paper.

A. Related Works
SDMA treats interference as noise and manages interference by precoding at the transmitter, while NOMA decodes interference and manages it by superposition coding at the transmitter and successive interference cancellation (SIC) at the receiver. Both SDMA and NOMA have limitations [6]. For example, consider a broadcast channel (BC). SDMA is suitable for an under-loaded regime with fewer receiving antennas than transmitting antennas and performs well for close-to-orthogonal user channels. In single-input single-output (SISO) and single-input multiple-output (SIMO) systems, NOMA is able to decode and cancel multi-user interference for aligned user channel directions and large differences in power levels. Also, SDMA is not DoF optimal for imperfect CSIT [3], and NOMA is DoF optimal for neither perfect nor imperfect CSIT [8]. To mitigate SIC receiver performance loss, users must be ordered based on channel gains. However, for MIMO and multiple-input single-output (MISO) systems, due to precoding at the transmitter, users' channels are not degraded and NOMA is not capacity achieving. To exploit both multiple transmit antennas and SIC receivers, MIMO-NOMA schemes based on user grouping are proposed [9]. However, this architecture leads to inefficient use of antenna dimensions at the transmitter and SIC at the receiver, decreasing both DoF and rate [8].
Rate-splitting multiple access (RSMA), based on splitting and merging messages into streams at the transmitter, bridges between SDMA and NOMA by partially decoding interference and partially treating interference as noise [6]. Streams are precoded and transmitted. User receivers decode part of the interference by SIC and treat the rest as noise. RSMA system performance depends on: i) message splitting into transmitted streams, and ii) precoding. The focus here is on precoding design for low complexity RSMA that chooses a subset of all possible streams.
In its generic form, RSMA has an exponential number of transmitted streams and receiver SIC layers. Optimum SIC decoding by exhaustive search is also computationally prohibitive [10]. To reduce complexity, schemes with a reduced number of transmitted streams and a unique SIC decoding order have been proposed, e.g., 1-layer RS [3] and 2-layer RS [4]. These RSMA schemes are surveyed and compared in [11]. Hierarchical user grouping [10] improves the performance/complexity trade-off over [3], [4] by restricting the numbers of streams and SIC layers to grow linearly with the number of users. This paper addresses RSMA for both general subsets of streams and hierarchical streams.

B. Motivation and Contributions
To reduce precoding design complexity, we use a special class of RSMA precoding that nulls the part of the interference that is not decoded, which we term RSMA with interference nulling (RSMA-IN) [10], [22]. It is similar to the precoding designs in [4], [23], which eliminate inter-group interference by interference nulling. Our results show that as inter-user correlation decreases, the performance loss due to interference nulling becomes negligible. Precoding for RSMA-IN has the computational advantage of decoupling sum rate maximization into disjoint optimization problems. This motivates the sequentially descending precoder design (SDPD) algorithm [10], [22]. Its properties are investigated in greater detail here, and it is shown to be equivalent to a two-stage process of interference nulling followed by maximization of the stream's signal-to-noise ratio (SNR). We term this successive interference nulling and SNR maximization (SIN-MaxSNR) and provide performance comparisons of SIN-MaxSNR to the optimal sum rate of RSMA-IN and to existing precoding design algorithms, WMMSE [21] and AWMSE [6]. The optimal sum rate of RSMA-IN is challenging to achieve due to the non-convexity of the optimization problem. To examine how close the sum rate of each precoding design algorithm is to the optimal sum rate of RSMA-IN, a convex problem for the sum rate upper bound is formulated and solved numerically for both the general case of streams and a lower complexity version, i.e., hierarchical streams. It can be shown that the upper bound problem always exists for RSMA-IN with hierarchical streams (RSMA-IN-HS). Taking advantage of the cases where the upper bound problem has the same solution as the sum rate maximization problem, we next propose an upper bound aided (UBA) algorithm to enhance existing precoder designs including SIN-MaxSNR and AWMSE [6].
A new generalization of low complexity streams, called optimal hierarchical streams, is also introduced, which has an upper bound problem that is tight for any channel realization. The covariance matrices that maximize its sum rate can be obtained by solving a convex optimization problem. Previously proposed methods, e.g., 1-layer RS [3] and 2-layer RS [4] without a common stream to all users, are special cases. Our main contributions are the following:
• A class of low complexity streams, i.e., hierarchical streams, is investigated, which includes 1-layer and 2-layer RS [3], [4] as special cases. Its key properties are highlighted.
• A convex upper bound sum rate problem for RSMA-IN precoding is formulated and used to assess existing precoding algorithms in the literature.
• To further lower complexity, a suboptimal successive interference nulling and SNR maximization (SIN-MaxSNR) precoder is proposed to simplify the joint optimization problem.
• To enhance the performance of both SIN-MaxSNR and existing AWMSE [6], an upper bound aided (UBA) algorithm is proposed that leverages the convex upper bound when it is tight.
While this work was presented in part in [22], this paper provides many new results and proofs of theorems and corollaries, including the SIN-MaxSNR algorithm, application of the AWMSE algorithm [6], complexity comparisons of the proposed algorithms, as well as new and more complete numerical performance comparisons.
The remaining sections are organized as follows: Section II specifies the system model. The convex upper bound problem of RSMA-IN is proposed for both general streams and hierarchical streams in Section III. In Section IV, SIN-MaxSNR and AWMSE precoding design algorithms are applied to RSMA-IN sum rate maximization followed by the proposed UBA algorithm. Section V presents simulation results, and Section VI concludes the paper.
Notation: Matrices and vectors are denoted by bold upper and lower case letters, respectively. Sets are represented by calligraphic font, e.g., $\mathcal{I}$, and $|\mathcal{I}|$ is the cardinality of $\mathcal{I}$. The relative complement of set $\mathcal{J}$ in $\mathcal{I}$ is denoted by $\mathcal{I} \setminus \mathcal{J}$. $|a|$ and $\|\mathbf{a}\|$ represent absolute value and Euclidean norm, respectively. $(\mathbf{A})^H$ and $\mathrm{Tr}(\mathbf{A})$ indicate conjugate transpose and trace, respectively. A positive semidefinite matrix $\mathbf{A}$ is denoted by $\mathbf{A} \succeq 0$. $\mathbb{R}$ and $\mathbb{C}$ represent the real and complex fields. $\mathcal{CN}(\cdot, \cdot)$ denotes a multivariate circularly symmetric complex Gaussian distribution. $O(n^x)$ denotes a complexity upper bound $c n^x$ for some $0 < c < \infty$. Finally, $(\cdot)$ represents the scalar product.

II. SYSTEM MODEL
Consider a multi-user MISO system with an $M$-antenna BS and $K$ single-antenna users. To examine the sum rate of RS, perfect channel state information is assumed to be available at both the BS and the users. The effect of imperfect CSIT is evaluated in Section V. The channel between the BS and the users is denoted by $\mathbf{H} = [\mathbf{h}_1, \ldots, \mathbf{h}_K] \in \mathbb{C}^{M \times K}$, where $\mathbf{h}_k \in \mathbb{C}^{M \times 1}$ is user $k$'s channel vector. Consider an RSMA system where the BS splits users' messages into parts such that the partial messages of a subset of users, $\mathcal{I}$, are combined and encoded into a common stream with symbol $s_{\mathcal{I}}$. All users in the set $\mathcal{I}$ are required to decode stream $s_{\mathcal{I}}$ to decode their own partial messages. Denote the set of users' indices by $\mathcal{K} = \{1, \ldots, K\}$, and call a subset of users, $\mathcal{I}$, a superuser. The order, $r$, of $s_{\mathcal{I}}$ is $r = |\mathcal{I}|$. All possible ways of splitting messages form the set of all nonempty subsets of $\mathcal{K}$, which is denoted by $\mathcal{S} = \{\mathcal{I} \mid \mathcal{I} \subseteq \mathcal{K}, |\mathcal{I}| \neq 0\}$ and has $2^K - 1$ superusers. To reduce RSMA complexity, suppose that the BS sends only a subset $\mathcal{G} \subseteq \mathcal{S}$, where $|\mathcal{G}| = N \le 2^K - 1$. That is, the BS sends $N$ streams to the users. $\mathcal{S}$ and $\mathcal{G}$ are sets of superusers. For example, for three users, $\mathcal{S} = \{\{1\}, \{2\}, \{3\}, \{1,2\}, \{1,3\}, \{2,3\}, \{1,2,3\}\}$.
Suppose that $W_u$ is the uncoded message of user $u \in \mathcal{K}$ and is split into partial messages $\{W_u^{\mathcal{I}} \mid \mathcal{I} \in \mathcal{G}, u \in \mathcal{I}\}$. Partial user messages encoded into the same stream $s_{\mathcal{I}}$ have the same superscript, i.e., $\{W_{k'}^{\mathcal{I}} \mid k' \in \mathcal{I}\}$, and must be decoded by all users in the set $\mathcal{I}$. A single-antenna user $u$ receives the signal in (1), where $n_u$ is noise and $\mathbf{q}_{\mathcal{I}}$ is the beamforming vector corresponding to stream $s_{\mathcal{I}}$. Suppose that streams are zero mean, unit variance, and independent. The BS transmits symbols to the channel with covariance matrices $\mathbf{Q}_{\mathcal{I}} = \mathrm{E}[\mathbf{q}_{\mathcal{I}} s_{\mathcal{I}} s_{\mathcal{I}}^* \mathbf{q}_{\mathcal{I}}^H] = \mathrm{E}[\mathbf{q}_{\mathcal{I}} \mathbf{q}_{\mathcal{I}}^H]$, $\forall \mathcal{I} \in \mathcal{G}$. Users only decode streams containing part of their message and treat other streams as noise. Streams are decoded by SIC in order of decreasing stream order, i.e., for $r > t$, $r$-order streams are decoded before $t$-order streams. For streams of the same order, the decoding order must be optimized jointly with the beamforming vectors. Assume that streams are decoded in the order $\pi : s_{\mathcal{I}_1} \to s_{\mathcal{I}_2} \to \cdots \to s_{\mathcal{I}_N}$, starting with $s_{\mathcal{I}_1}$, where $|\mathcal{I}_i| \ge |\mathcal{I}_j|$, $\forall i < j$. Note that the index $i$ in $s_{\mathcal{I}_i}$ determines the placement of the stream in the decoding order $\pi$, and $\mathcal{I}_i$ is the group of users that must decode $s_{\mathcal{I}_i}$, where $i \in \{1, \ldots, N\}$. The SINR of stream $s_{\mathcal{I}_i}$ at user $u$, $\gamma_i^u$, is given by (2), where the first sum in the denominator is the interference to user $u$ from streams not yet decoded, and the second sum is the interference from streams not intended for user $u$. To simplify notation, we use the decoding order as the subscript $i$ instead of $\mathcal{I}_i$, e.g., $\mathbf{Q}_i$ instead of $\mathbf{Q}_{\mathcal{I}_i}$. For the $i$th stream to be successfully decoded by all users in $\mathcal{I}_i$, its rate must satisfy (3), where $R_i^u = \log_2(1 + \gamma_i^u)$ is the achievable rate of the $i$th stream at user $u$, $\forall u \in \mathcal{I}_i$. User $m_i = \arg\min_{u \in \mathcal{I}_i} \gamma_i^u$ is called a weakest user of stream $s_{\mathcal{I}_i}$. Suppose that each stream has a unique weakest user.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Based on this definition, the achievable sum rate is given by (4). The weakest user of stream $s_{\mathcal{I}_i}$ can take on $|\mathcal{I}_i|$ possible values. As a result, the sum rate can be expressed as in (5), i.e., as the minimum among $\prod_{i=1}^{N} |\mathcal{I}_i|$ possible sum rate cases corresponding to all possible combinations of weakest users, where $R_{\mathrm{sum}}^t$ is the $t$th sum rate case and $m_{i,t} \in \mathcal{I}_i$ is the weakest user of stream $s_{\mathcal{I}_i}$ in case $t$.
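The display equations (1)-(5) can be written out from the definitions in this section. The following is a reconstruction, not the exact typeset form. It rests on two assumptions that the text at this point does not settle: the received signal uses the conjugate-transposed channel $\mathbf{h}_u^H$, and the noise has unit variance (as Section IV-A later assumes).

```latex
\begin{align}
y_u &= \mathbf{h}_u^H \sum_{\mathcal{I} \in \mathcal{G}} \mathbf{q}_{\mathcal{I}} s_{\mathcal{I}} + n_u, \tag{1} \\
\gamma_i^u &= \frac{\mathbf{h}_u^H \mathbf{Q}_i \mathbf{h}_u}
  {\sum_{j > i,\; u \in \mathcal{I}_j} \mathbf{h}_u^H \mathbf{Q}_j \mathbf{h}_u
   + \sum_{j:\; u \notin \mathcal{I}_j} \mathbf{h}_u^H \mathbf{Q}_j \mathbf{h}_u + 1},
  \quad u \in \mathcal{I}_i, \tag{2} \\
R_i &\le \min_{u \in \mathcal{I}_i} R_i^u, \qquad R_i^u = \log_2\left(1 + \gamma_i^u\right), \tag{3} \\
R_{\mathrm{sum}} &= \sum_{i=1}^{N} \min_{u \in \mathcal{I}_i} R_i^u = \sum_{i=1}^{N} R_i^{m_i}, \tag{4} \\
R_{\mathrm{sum}} &= \min_{t} R_{\mathrm{sum}}^t, \qquad
R_{\mathrm{sum}}^t = \sum_{i=1}^{N} R_i^{m_{i,t}}, \quad
t = 1, \ldots, \prod_{i=1}^{N} |\mathcal{I}_i|. \tag{5}
\end{align}
```

The step from (4) to (5) uses the fact that a sum of per-stream minima equals the minimum, over all joint choices of one user per stream, of the corresponding sums.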
This paper considers RSMA-IN, which nulls interference not intended for user $u$, $\forall u \in \mathcal{K}$, as expressed by condition (6), where the equivalence in (a) follows from $\mathbf{Q}_i \succeq 0$, $\forall i = 1, \ldots, N$. Applying (6) to (2) yields the simplified SINR in (7). Interference nulling simplifies the SINR expression, which makes it possible to derive an upper bound for the achievable sum rate of RSMA-IN in (5). Under certain conditions, $R_{\mathrm{sum}}$ can be expressed as a concave function of the covariance matrices, enabling sum rate maximization to be expressed as a convex upper bound problem. In Section IV-A, taking advantage of the simplified SINR expression, the joint optimization problem to design beamforming vectors is shown to decouple, yielding our proposed SIN-MaxSNR algorithm. The performance loss due to interference nulling, which depends on the values of $M$ and $K$ and on the inter-user correlation, is discussed in Section V. Also, imperfect CSIT causes imperfect interference nulling. The performance of RSMA-IN is evaluated under both perfect and imperfect CSIT in Section V.
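To make the effect of interference nulling on the sum rate concrete, the following Python sketch evaluates the RSMA-IN sum rate for given covariance matrices. It is an illustration only, under assumptions: unit-variance noise, per-user gains of the form $\mathbf{h}_u^H \mathbf{Q}_i \mathbf{h}_u$, 0-based user indices, and the hypothetical function name `rsma_in_sum_rate`. With nulling in place, the only interference a user sees while decoding stream $i$ comes from later streams that the user must also decode.

```python
import numpy as np

def rsma_in_sum_rate(H, groups, Q, tol=1e-9):
    """Sum rate of RSMA-IN (illustrative sketch, unit noise variance assumed).

    H      : M x K channel matrix, column u is user u's channel (0-based users).
    groups : list of sets of user indices, listed in decoding order.
    Q      : list of M x M positive semidefinite covariance matrices, one per stream.
    """
    num_users = H.shape[1]
    # gains[i, u] = h_u^H Q_i h_u, the power of stream i received at user u.
    gains = np.array([[np.real(H[:, u].conj() @ Qi @ H[:, u])
                       for u in range(num_users)] for Qi in Q])
    # Interference nulling: stream i must not reach users outside its group.
    for i, group in enumerate(groups):
        leak = [u for u in range(num_users) if u not in group and gains[i, u] > tol]
        if leak:
            raise ValueError(f"stream {i} is not nulled at users {leak}")
    total = 0.0
    for i, group in enumerate(groups):
        rates = []
        for u in group:
            # Remaining interference: later streams that user u also decodes.
            interference = sum(gains[j, u] for j in range(i + 1, len(groups))
                               if u in groups[j])
            rates.append(np.log2(1.0 + gains[i, u] / (1.0 + interference)))
        total += min(rates)  # the weakest user sets the stream rate
    return total
```

For two users with orthogonal channels, one common stream, and two private streams, the common-stream rate is set by the user whose private stream is stronger, since that user sees more residual interference.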

III. CONVEX UPPER BOUND PROBLEM FOR SUM RATE IN RSMA-IN
From (5)-(6), sum rate maximization for RSMA-IN under a total power constraint is expressed as Problem (8). Expanding $R_{\mathrm{sum}}$ from (5), defining an auxiliary variable $R$, and using epigraph form, (8) becomes Problem (9). From (7) and (3), the sum rate cases in (9b) are functions of the covariance matrices. The objective function and constraints of Problem (9), except for constraints (9b), are linear functions of the optimization variables. For the problem to be convex, the left side of each inequality in (9b) must be a concave function of the covariance matrices. Although the individual stream rates are not concave, their sum can be a concave function of $\mathbf{Q}_i$, $\forall i = 1, \ldots, N$. Removing the non-concave constraints in (9b) yields a relaxed problem that provides an upper bound for the sum rate of RSMA-IN. If all constraints in (9b) are removed, the relaxed problem becomes a feasibility check problem. Therefore, at least one concave sum rate case in (9b) must exist to find a sum rate upper bound. In the following, sufficient conditions for a sum rate case $R_{\mathrm{sum}}^t$ to be a concave function are derived for RSMA-IN, and it is shown that the upper bound problem always exists for RSMA-IN with hierarchical streams (RSMA-IN-HS).

A. Sum-Rate Upper Bound for the General Case
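To find a sum rate upper bound by solving a convex optimization problem, it is required to determine whether $R_{\mathrm{sum}}$ is a concave function of $\mathbf{Q}_{\mathcal{I}}$, $\forall \mathcal{I} \in \mathcal{G}$. Consider $\mathcal{G}_u = \{\mathcal{I}_i \mid m_i = u, \forall i = 1, \ldots, N\}$ to be the set of superusers with weakest user $u$. Since each stream has only one weakest user, the sets $\mathcal{G}_1, \ldots, \mathcal{G}_K$ are mutually disjoint and their union is $\mathcal{G}$.
Theorem 1: The achievable sum rate of RSMA-IN is a concave function of $\mathbf{Q}_{\mathcal{I}}$, $\forall \mathcal{I} \in \mathcal{G}$, if all interfering streams in decoding stream $s_{\mathcal{I}}$, $\forall \mathcal{I} \in \mathcal{G}_u$, are in $\mathcal{G}_u$, $\forall u \in \mathcal{K}$.
Proof: Since $\{\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_K\}$ forms a partition of $\mathcal{G}$, the sum rate can be expressed as the sum, over $u \in \mathcal{K}$, of the rates of the streams in $\mathcal{G}_u$. By assumption, all interfering streams in decoding stream $s_{\mathcal{J}_i}$ at user $u$ are in $\mathcal{G}_u$. Hence, the streams $\{s_{\mathcal{J}_j} \mid s_{\mathcal{J}_j} \in \mathcal{G}_u, \forall j > i\}$ form the set of all interfering streams in decoding $s_{\mathcal{J}_i}$ at user $u$. As a result, the rate of stream $s_{\mathcal{J}_i}$ can be expressed as in (11), where (a) comes from the fact that $\mathcal{J}_i \in \mathcal{G}_u$. According to (10), it is straightforward to show that the sum rate reduces to an expression that is a concave function of the covariance matrices.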
It is straightforward to show that without interference nulling, the concavity condition in Theorem 1 cannot be satisfied. This can be verified from (11). Without interference nulling, the last expression in (11) would be the logarithm of a rational function, whose denominator cannot be cancelled with the numerator in other streams' rate expressions, and as a result, the final simplified expression cannot be a concave function of covariance matrices. Therefore, interference nulling is a necessary condition.
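The cancellation behind Theorem 1 can be made explicit with a short worked derivation. The sketch below assumes unit-variance noise and uses the shorthand $a_i = \mathbf{h}_u^H \mathbf{Q}_{\mathcal{J}_i} \mathbf{h}_u$ (an illustrative symbol, not from the paper) for the $n$ streams $\mathcal{J}_1, \ldots, \mathcal{J}_n$ in $\mathcal{G}_u$, listed in decoding order.

```latex
\begin{align*}
\sum_{i=1}^{n} \log_2\left(1 + \frac{a_i}{1 + \sum_{j > i} a_j}\right)
&= \sum_{i=1}^{n} \left[ \log_2\Big(1 + \sum_{j \ge i} a_j\Big)
   - \log_2\Big(1 + \sum_{j > i} a_j\Big) \right] \\
&= \log_2\Big(1 + \sum_{j=1}^{n} a_j\Big)
 = \log_2\Big(1 + \sum_{j=1}^{n} \mathbf{h}_u^H \mathbf{Q}_{\mathcal{J}_j} \mathbf{h}_u\Big).
\end{align*}
```

The sum telescopes because the denominator of each stream's SINR is the argument of the next stream's logarithm. The result is the logarithm of an affine function of the covariance matrices, hence concave, and the sum of such terms over $u \in \mathcal{K}$ is concave as well. Any leftover interference term that does not belong to $\mathcal{G}_u$ would break the telescoping, which is why interference nulling is needed.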
The idea behind the sum rate upper bound of RSMA-IN is to remove the constraints that are not concave. The relaxed convex problem can be expressed as Problem (12), where $\mathcal{C} \subseteq \{1, \ldots, \prod_{\mathcal{I} \in \mathcal{G}} |\mathcal{I}|\}$ in (12b) refers to the set of indices corresponding to concave constraints in (9b), i.e., the set of constraints that satisfy the assumptions in Theorem 1, and $\mathcal{G}_u^t$, $\forall u \in \mathcal{K}$, refer to the sets $\mathcal{G}_u$ in the $t$th sum rate case, where $t \in \mathcal{C}$.

B. Sum Rate Upper Bound for Hierarchical Streams
Upper bound problem (12) is examined here for the special case of hierarchical streams.
Definition 1 (Hierarchical Streams): A set of superusers $\mathcal{G} \subseteq \mathcal{S}$ is called hierarchical if, for any two superusers $\mathcal{I}, \mathcal{J} \in \mathcal{G}$ with $|\mathcal{I}| \le |\mathcal{J}|$, one of the following properties holds: i) $\mathcal{I} \cap \mathcal{J} = \mathcal{I}$, or ii) $\mathcal{I} \cap \mathcal{J} = \emptyset$. The set $\mathcal{G}$ contains all private streams, i.e., superusers of $\mathcal{S}$ with cardinality 1. The streams corresponding to hierarchical superusers are called hierarchical streams.
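Definition 1 can be checked mechanically. The following Python sketch is one way to do so; the function name `is_hierarchical` and the 1-based user labels are illustrative choices, not part of the paper.

```python
from itertools import combinations

def is_hierarchical(superusers, num_users):
    """Check Definition 1 for a collection of superusers (sets of user labels 1..K)."""
    G = {frozenset(s) for s in superusers}
    # The set must contain every private stream, i.e., every singleton superuser.
    if any(frozenset({u}) not in G for u in range(1, num_users + 1)):
        return False
    for a, b in combinations(G, 2):
        small, large = (a, b) if len(a) <= len(b) else (b, a)
        overlap = small & large
        # Either the smaller superuser is nested in the larger one, or they are disjoint.
        if overlap and overlap != small:
            return False
    return True
```

For example, the four-user set with superusers {1}, {2}, {3}, {4}, {1,2}, {3,4}, {1,2,3,4} passes the check, while replacing the last three with {1,2} and {2,3,4} fails it because these two overlap only partially.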
An example of hierarchical streams is shown in Fig. 1, and an algorithm to generate hierarchical streams is proposed in [10, Algorithm 1]. This algorithm groups users in multiple levels such that at the first level, each user forms its own group. At the second level, users are assigned to groups based on a similarity metric and form a group that cannot be broken in subsequent levels. At subsequent levels, each group from the previous level is a superuser, and superusers are assigned to groups. Grouping continues until a single group of all users is reached.
User grouping requires a similarity metric and a threshold. Since RSMA-IN nulls unintended interference, in order to avoid nulling the desired signal by precoding, users in the same group must have similar channels. Channel alignment provides such a metric: users $i$ and $j$ are deemed similar if the alignment between their channels is at least a threshold $\epsilon$. Also, the similarity between two groups is defined as the minimum similarity between their users. To ensure that at least one group is formed at each level, $\epsilon$ is chosen dynamically at each level based on the similarity of the groups generated at the previous level. In fact, $\epsilon$ must be chosen between the minimum and maximum similarity of the groups generated at the previous level. The maximum number of SIC layers at a user is the number of user grouping levels minus 1. In Fig. 1, there are 4 user grouping levels resulting in 3 SIC layers. Existing 1-layer and 2-layer RS schemes are examples of hierarchical streams with 1 and 2 SIC layers, which are equivalent to 2 and 3 levels of user grouping, respectively. A similar approach, termed hierarchical agglomerative clustering (HAC), is also used in [24, Section III-B] to generate hierarchical streams. This approach builds a dendrogram in bottom-up fashion. The algorithms in [10] and [24] use similar group similarity metrics, and both continue to group until reaching a single group of all users. However, the number of streams in HAC is fixed, while in [10, Algorithm 1] it is chosen dynamically by changing the similarity threshold at each level. For further details, see [10, Algorithm 1] and [24, Section III-B].
Hierarchical streams have attractive properties. First, hierarchical streams do not require finding an optimal decoding order, which reduces RSMA complexity, as stated in Proposition 1.
Proposition 1: For hierarchical streams, there is a unique decoding order.
See Appendix A for the proof of Proposition 1. Hierarchical streams are not necessary, however, to avoid choosing an optimal decoding order. For example, for 4 users, the set of streams $\{s_1, s_2, s_3, s_4, s_{\{1,2\}}, s_{\{2,3,4\}}\}$ is not hierarchical, but it has a unique decoding order, e.g., the decoding order for user 2 is $\pi : s_{\{2,3,4\}} \to s_{\{1,2\}} \to s_2$.
The second advantage of hierarchical streams is related to the likelihood that the sum rate upper bound problem of RSMA-IN-HS is tight over a random channel realization. Tight refers to the case in which the sum rate is a concave function of the covariance matrices, with the result that the upper bound problem in (12) has the same solution as the sum rate maximization problem in (9). When tight, the removed constraints in Problem (12) do not affect the optimal solution. Tightness probability refers to the frequency with which Problem (12) is tight over the ensemble of channel realizations and is quantified in Section V by Monte Carlo experiments. The proof of Theorem 2 is in Appendix B.
A further advantage of hierarchical streams arises from Theorem 2: the upper bound problem in (12) always provides an upper bound for the sum rate of hierarchical streams. If the set of concave sum rate cases $\mathcal{C}$ in the constraints (12b) is empty, Problem (12) becomes a feasibility check problem, and no upper bound is obtained for the sum rate. It can be proven that $\mathcal{C}$ is not empty for hierarchical streams (Corollary 1).
Proof: To prove Corollary 1, we calculate the number of concave sum rate cases $|\mathcal{C}|$ in RSMA-IN-HS, where $\mathcal{C}$ is the set of concave sum rate cases in (12b). We show that $|\mathcal{C}| \ge 1$ for RSMA-IN-HS. Hence, $\mathcal{C}$ is not empty, and Problem (12) is not a feasibility check problem.
For RSMA-IN-HS, the superusers of the streams interfering with $s_{\mathcal{I}}$ are subsets of $\mathcal{I}$ in $\mathcal{G}$. Suppose that superuser $\mathcal{I}$, $\forall \mathcal{I} \in \mathcal{G}$, has $n_{\mathcal{I}}$ disjoint subsets with the largest size in $\mathcal{G} \setminus \{\mathcal{I}\}$. Since $\mathcal{G}$ contains all possible superusers with cardinality 1, these superusers form a partition of $\mathcal{I}$. We call these superusers the composing superusers of $\mathcal{I}$. In the case that $|\mathcal{I}| = 1$, we consider $\mathcal{I}$ itself as the composing superuser of $\mathcal{I}$. According to Theorem 2, the weakest user of stream $s_{\mathcal{I}}$ must be the same as the weakest user of one of its composing superusers. Thus, there are $n_{\mathcal{I}}$ cases for the weakest user of $s_{\mathcal{I}}$ that satisfy Theorem 2. As a result, the total number of cases for the weakest users of all streams satisfying the conditions in Theorem 2 is $|\mathcal{C}| = \prod_{\mathcal{I} \in \mathcal{G}} n_{\mathcal{I}}$. Since $\mathcal{G}$ in hierarchical streams contains all possible superusers with cardinality 1, we have $n_{\mathcal{I}} \ge 1$, $\forall \mathcal{I} \in \mathcal{G}$, which results in $|\mathcal{C}| = \prod_{\mathcal{I} \in \mathcal{G}} n_{\mathcal{I}} \ge 1$.
To clarify the notion of composing superusers in the proof of Corollary 1, consider the example shown in Fig. 1, where the values of $n_{\mathcal{I}}$ for superusers with cardinality greater than 1 include 2, 2, 3, and 2. Note that $n_{\mathcal{I}}$ for superusers with cardinality 1 is 1. Therefore, the number of concave sum rate cases for this example is $|\mathcal{C}| = \prod_{\mathcal{I} \in \mathcal{G}} n_{\mathcal{I}} = 48$.
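The counting argument can also be illustrated in code. The Python sketch below reads "disjoint subsets with the largest size" as the maximal proper subsets of a superuser within a hierarchical set, which is an interpretation rather than the paper's stated procedure; the function names are hypothetical, and the input is assumed to satisfy Definition 1. It does not reproduce the Fig. 1 example, whose superusers are not listed in the text.

```python
from math import prod

def composing_superusers(I, G):
    """Maximal proper subsets of superuser I within the hierarchical set G."""
    I = frozenset(I)
    if len(I) == 1:
        return [I]  # a private stream is its own composing superuser
    proper = [J for J in {frozenset(s) for s in G} if J < I]
    # Keep only subsets that are not contained in another proper subset of I.
    return [J for J in proper if not any(J < L for L in proper)]

def count_concave_cases(G):
    """Product of n_I over all superusers, i.e., |C| in the counting argument."""
    return prod(len(composing_superusers(I, G)) for I in G)
```

For the four-user set {1}, {2}, {3}, {4}, {1,2}, {3,4}, {1,2,3,4}, each of the three non-private superusers has two composing superusers, so the count is 2 x 2 x 2 = 8.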
This section concludes by defining a subset of hierarchical streams, optimal hierarchical streams, for which Problem (12) is tight, i.e., the optimal value of Problem (12) is the same as the optimal value of the sum rate maximization problem.
Definition 2 (Optimal Hierarchical Streams): Suppose that $\mathcal{G}$ is a set of hierarchical superusers. The streams corresponding to $\mathcal{G}$ are called optimal hierarchical streams if, for any two superusers $\mathcal{I}, \mathcal{J} \in \mathcal{G}$ with $\mathcal{I} \cap \mathcal{J} = \mathcal{A}$, the cardinality of $\mathcal{A}$ is 1 or zero.
Corollary 2: For optimal hierarchical streams, $R_{\mathrm{sum}}$ in RSMA-IN is a concave function of $\mathbf{Q}_{\mathcal{I}}$, $\forall \mathcal{I} \in \mathcal{G}$.
See Appendix C for the proof of Corollary 2. Based on Definition 2, 1-layer RS [3] with one common stream to all users and 2-layer RS [4] without the stream common to all users are examples of optimal hierarchical streams. A 4-user example is $\{s_1, s_2, s_3, s_4, s_{\{1,2\}}, s_{\{3,4\}}\}$. Corollary 2 implies that, for such streams, the optimal covariance matrices that maximize the RSMA-IN sum rate can be obtained by solving a convex optimization problem.
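Definition 2 adds a single overlap condition on top of Definition 1. A minimal Python sketch of the combined check follows; the function name `is_optimal_hierarchical` and the 1-based user labels are illustrative.

```python
from itertools import combinations

def is_optimal_hierarchical(superusers, num_users):
    """Check Definition 2: hierarchical, and any two superusers share at most one user."""
    G = {frozenset(s) for s in superusers}
    pairs = list(combinations(G, 2))
    # Definition 1: all private streams present, and every pair nested or disjoint.
    has_private = all(frozenset({u}) in G for u in range(1, num_users + 1))
    nested_or_disjoint = all(a <= b or b <= a or not (a & b) for a, b in pairs)
    # Definition 2: the intersection of any two superusers has cardinality 0 or 1.
    small_overlap = all(len(a & b) <= 1 for a, b in pairs)
    return has_private and nested_or_disjoint and small_overlap
```

The check accepts the 4-user example above and 1-layer RS, and it rejects 2-layer RS once the stream common to all users is included, because that stream shares two users with each group stream.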

IV. PRECODING DESIGN ALGORITHMS
In this section, three precoding design algorithms to maximize the sum rate of RSMA-IN are provided. First, an algorithm is proposed that exploits the interference nulling condition to decouple the joint optimization problem, which lowers complexity. It is next shown that this algorithm operates in two stages, performing interference nulling followed by SNR maximization. This is termed successive interference nulling and SNR maximization (SIN-MaxSNR). To enable performance comparison of SIN-MaxSNR with algorithms proposed in the literature, we apply the augmented weighted mean square error (AWMSE) algorithm in [6] to sum rate maximization in RSMA-IN. The AWMSE precoder is based on weighted minimum mean-squared error (WMMSE) [21] to maximize sum rate. However, we show that the AWMSE problem is not equivalent to the sum rate maximization Problem (8). In addition, it is shown in [6] that AWMSE converges to a KKT point. However, without a constraint qualification (CQ), a KKT point is not necessarily a local optimum [20]. Numerical results in Section V confirm that AWMSE sum rate performance is not close to optimal. We then propose the upper bound aided (UBA) algorithm, which exploits Problem (12) in the tight case to enhance the performance of existing precoding design algorithms including SIN-MaxSNR and AWMSE. In addition, the UBA algorithm may reduce the complexity of AWMSE by requiring the solution of just a single convex optimization problem in tight cases. The section concludes with a discussion on complexity.

A. Successive Interference Nulling and SNR Maximization
The sum rate maximization Problem (8) requires joint optimization over all streams' beamforming vectors. Here, we propose a precoding design that uses the simplified SINR expression in RSMA-IN to decouple Problem (8).
The beamforming vectors in the feasible region of Problem (8) must satisfy the interference nulling condition (6) and a total power constraint. To satisfy the interference nulling condition, the beamforming vector of stream $s_{\mathcal{I}}$ must lie in the null space of the channel $\mathbf{H}_{-\mathcal{I}}$, which excludes the users in $\mathcal{I}$. Suppose that the beamforming vector of stream $s_{\mathcal{I}}$ is $\mathbf{q}_{\mathcal{I}} = \sqrt{\rho_{\mathcal{I}}} \mathbf{F}_{\mathcal{I}} \mathbf{f}_{\mathcal{I}}$, where $\rho_{\mathcal{I}}$ is the power level, $\mathbf{F}_{\mathcal{I}}$ is the first-stage beamformer that applies interference nulling, and $\mathbf{f}_{\mathcal{I}}$ is the second-stage beamforming vector of stream $s_{\mathcal{I}}$ such that $\|\mathbf{F}_{\mathcal{I}} \mathbf{f}_{\mathcal{I}}\| = 1$. For interference nulling, $\mathbf{F}_{\mathcal{I}}$ maps the vector $\mathbf{f}_{\mathcal{I}}$ into the null space of $\mathbf{H}_{-\mathcal{I}}$. Expressing the singular value decomposition (SVD) of $\mathbf{H}_{-\mathcal{I}}$ as $[\mathbf{U}_{-\mathcal{I}}^{\mathrm{non\text{-}null}}, \mathbf{U}_{-\mathcal{I}}^{\mathrm{null}}] \boldsymbol{\Sigma}_{-\mathcal{I}} \mathbf{V}_{-\mathcal{I}}^H$, $\mathbf{F}_{\mathcal{I}}$ is given by (13), where $\mathbf{U}_{-\mathcal{I}}^{\mathrm{non\text{-}null}}$ and $\mathbf{U}_{-\mathcal{I}}^{\mathrm{null}}$ are composed of the left singular vectors of $\mathbf{H}_{-\mathcal{I}}$ corresponding to the nonzero and zero singular values of $\mathbf{H}_{-\mathcal{I}}$, respectively. Based on (13), $\mathbf{F}_{\mathcal{I}}$ is tall unitary. Therefore, $\|\mathbf{F}_{\mathcal{I}} \mathbf{f}_{\mathcal{I}}\| = \|\mathbf{f}_{\mathcal{I}}\|$, which shows that $\mathbf{f}_{\mathcal{I}}$ is unit norm. Next, the second-stage beamforming vectors and power levels are designed to maximize the sum rate by solving Problem (14). Problem (14) is a non-convex joint optimization problem over unit-norm second-stage beamforming vectors and power levels. To simplify it, one approach is to alternate between power allocation and second-stage beamforming design. First, consider the second-stage beamforming design for fixed powers. To avoid a difficult joint optimization over all vectors $\mathbf{f}_{\mathcal{I}}$, $\forall \mathcal{I} \in \mathcal{G}$, a suboptimal approach is to design the second-stage beamforming vector of each stream by maximizing the rate of that stream. From (3) and (7), the rate of the $i$th stream, $R_i$, $\forall i = 1, \ldots, N$, in terms of the above two-stage beamforming parameterization is given by (15). In (15), the rate of the $i$th stream depends on other streams' beamforming vectors, complicating rate maximization. Due to interference nulling, $R_i$ only depends on the vectors $\mathbf{f}_i, \mathbf{f}_{i+1}, \ldots, \mathbf{f}_N$. Hence, the second-stage beamforming vectors can be designed sequentially from $\mathbf{f}_N$ to $\mathbf{f}_1$. Therefore, assuming that $\mathbf{f}_j$, $\forall j = i+1, \ldots, N$, are already designed, the vector $\mathbf{f}_i$ is found by solving Problem (16), which is equivalent to Problem (17).

Algorithm 1 SIN-MaxSNR
Inputs: Channel matrix $\mathbf{H}$ and decoding order $s_{\mathcal{I}_1} \to s_{\mathcal{I}_2} \to \cdots \to s_{\mathcal{I}_N}$
Outputs: Beamforming vectors $\mathbf{q}_i$, $\forall i = 1, \ldots, N$
    Compute $\mathbf{F}_i$, $\forall i = 1, \ldots, N$, from (13);
    Equal power allocation: $\rho_i = P_{\mathrm{tot}}/N$, $\forall i \in \{1, \ldots, N\}$;
    while not converged or it is the first iteration do
        Compute $\mathbf{f}_i$, $\forall i = 1, \ldots, N$, from (17);
        Find $\rho_i$, $\forall i = 1, \ldots, N$, in Problem (18) using the water filling algorithm;
        Compute $R_{\mathrm{sum}}$ in (4);
    end

Problem (17) can be solved by semi-definite relaxation [25]. Considering unit variance noise, the objective function of Problem (17) is the minimum SNR of the $i$th stream $s_{\mathcal{I}_i}$ over the set of users $\mathcal{I}_i$ that decode it, which will be referred to as the SNR of the $i$th stream. Hence, the second stage of the beamforming design maximizes the stream's SNR. The unit-norm beamforming vector design can be interpreted as interference nulling at the first stage followed by SNR maximization at the second stage, where interference nulling can be viewed as a generalization of zero-forcing precoding (ZFP) in SDMA [26] to RSMA, and SNR maximization is analogous to maximum ratio transmission (MRT) in SDMA [27].
The power allocation problem for fixed second-stage beamforming vectors is not easy to solve. However, we can obtain a suboptimal solution by defining $N$ streams with SNRs $\gamma_i$, $\forall i = 1, \ldots, N$, and finding the power allocation parameters in each iteration by solving Problem (18), where $\gamma_i = \min_{u \in \mathcal{I}_i} \gamma_i^u\left(\mathbf{f}_j^{\mathrm{prev}}, \rho_i^{\mathrm{prev}}, \forall j \ge i\right) / \rho_i^{\mathrm{prev}}$, and prev refers to the previous iteration. The solution to Problem (18) can be obtained by water filling [28]. The above steps are summarized in Algorithm 1, successive interference nulling and SNR maximization (SIN-MaxSNR).
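Two building blocks of SIN-MaxSNR lend themselves to a short numerical sketch: the first-stage nulling basis obtained from an SVD, and water filling over per-stream gains. The Python code below is illustrative only. The function names are hypothetical, users are 0-indexed, the power allocation is assumed to take the standard form of maximizing $\sum_i \log_2(1 + \rho_i \gamma_i)$ under a total power budget with strictly positive gains, and the second-stage design via semi-definite relaxation is not covered.

```python
import numpy as np

def nulling_basis(H, group, tol=1e-10):
    """Orthonormal basis of the subspace orthogonal to the channels of users outside `group`.

    H is M x K with column u the channel of user u. The returned F satisfies
    h_u^H F = 0 for every user u not in `group`, and F^H F = I.
    """
    num_antennas, num_users = H.shape
    others = [u for u in range(num_users) if u not in group]
    if not others:
        return np.eye(num_antennas, dtype=H.dtype)
    U, s, _ = np.linalg.svd(H[:, others], full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))
    # Left singular vectors beyond the rank span the null space of H_{-I}^H.
    return U[:, rank:]

def water_filling(gains, p_total):
    """Powers maximizing sum(log2(1 + rho_i * g_i)) s.t. sum(rho) <= p_total, rho >= 0."""
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]          # strongest stream first
    inverse = 1.0 / gains[order]             # ascending inverse gains
    rho = np.zeros_like(gains)
    for active in range(len(gains), 0, -1):
        level = (p_total + inverse[:active].sum()) / active
        if level > inverse[active - 1]:      # weakest active stream gets positive power
            rho[order[:active]] = level - inverse[:active]
            break
    return rho
```

Inside the loop of Algorithm 1, a routine such as `water_filling` would be called once per iteration with the per-stream SNRs $\gamma_i$ computed from the previous iterate, while `nulling_basis` would be evaluated once per stream before the loop.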

B. Augmented Weighted Mean Square Error Algorithm
AWMSE in [6] is a precoding design based on WMMSE [21] that proposes sum rate maximization based on mean square error (MSE). The AWMSE algorithm iteratively computes minimum MSE (MMSE) and then solves a joint convex optimization problem to find beamforming vectors. In the following, the AWMSE algorithm is applied to the proposed RSMA-IN sum rate maximization Problem (8) in Section III. It is shown that the AWMSE approach is not equivalent to the sum rate maximization problem.
The received signal at user u after decoding and cancelling the part of interference that must be decoded from the signal in (1) and using the interference nulling condition in (6) is as follows: User u applies receiver g u i to detect the ith stream, i.e.,ŝ Ii = g u i y u i , which results in the MSE of First, the algorithm defines augmented mean square error (AMSE) coefficient ξ u i for stream s Ii , ∀i = 1, . . . , N decoded at user u, ∀u ∈ I i as ξ u i = w u i ϵ u i − log 2 (w u i ) , where w u i is an auxiliary coefficient defined for stream s Ii decoded at user u. The AMSE coefficient has the following relation with the rate of stream s Ii at user u: According to (21), sum rate can be expressed based on AMSE coefficients: The AWMSE algorithm solves a problem equivalent to sumrate maximization. From (21) and (24), an equivalence to sum rate maximization is as follows: The AWMSE problem can be formulated under the following assumption: From (25) and (26), maximizing sum rate is equivalent to maximize qi,∀i=1,...,N R sum ≡ minimize Therefore, AWMSE problem for RSMA-IN sum rate maximization in (8) is as follows: Problem (28) is not convex with respect to all variables. However, it is convex with respect to each variable, individually, if others are fixed. Hence, AWMSE algorithm finds the minimizer of Problem (28) using an alternating algorithm, summarized by Algorithm 2.
Under the assumption (26), the solution of Problem (28) for fixed beamforming vectors is $\left(w_i^{u,\mathrm{MMSE}}, g_i^{u,\mathrm{MMSE}}\right)$, $\forall i = 1, \ldots, N$. Also, Problem (28) with known values of $(g_i^u, w_i^u)$ is equivalent to the convex Problem (29), where auxiliary variables $\xi_i$, $\forall i = 1, \ldots, N$, replace the $\max(\cdot)$ function in (28a), and inequality (29b) is obtained from (20). AWMSE alternates between updating $(g_i^u, w_i^u)$ and the beamforming vectors.
The equivalence between the AWMSE problem and the sum rate maximization problem depends on the assumption in (26), which may not always hold. In addition, according to [6], $\left(w_i^{u,\mathrm{MMSE}}, g_i^{u,\mathrm{MMSE}}\right)$ satisfies the KKT conditions [20] of Problem (28) for fixed beamforming vectors. However, without a constraint qualification (CQ), the KKT conditions are neither necessary nor sufficient for a point to be a local optimum. Therefore, the solution of the AWMSE algorithm is not necessarily a local optimum of Problem (28), which can result in performance loss.

C. Upper Bound Aided Algorithm
To enhance precoding algorithms including SIN-MaxSNR and AWMSE, we propose the UBA algorithm, which uses the upper bound problem (12). According to Theorems 1 and 2, in certain situations (12) may be tight, and the UBA algorithm exploits these cases to enhance the average sum rate. The UBA algorithm is summarized by Algorithm 3. First, it solves the upper bound Problem (12) using convex optimization. Then, it examines whether or not the solution is tight. To check tightness, the sum rate in (4) is calculated based on the covariance matrices obtained from the upper bound problem. If the calculated sum rate is not less than the optimal value of Problem (12), the solution of (12) is tight, and the covariance matrices obtained from Problem (12) are optimal. However, the covariance matrices are not necessarily rank one, and randomization is required to obtain the beamforming vectors [25]. If (12) is not tight, beamforming vectors are found using SIN-MaxSNR or AWMSE in Algorithms 1 and 2.

Algorithm 3 UBA
Inputs: Channel matrix $\mathbf{H}$ and decoding order $s_{\mathcal{I}_1} \rightarrow s_{\mathcal{I}_2} \rightarrow \cdots \rightarrow s_{\mathcal{I}_N}$
Outputs: Beamforming vectors $\mathbf{q}_{\mathcal{I}}$, $\forall \mathcal{I} \in \mathcal{G}$
1 Solve Problem (12);
2 Compute sum rate in (4);
3 Check the tightness of Problem (12) by comparing its optimal value with the sum rate;
4 if the solution of (12) is tight then
5   Find $\mathbf{q}_{\mathcal{I}}$ from $\mathbf{Q}_{\mathcal{I}}$ by randomization for all $\mathcal{I} \in \mathcal{G}$;
6 else
7   Use another precoding design algorithm, e.g., SIN-MaxSNR or AWMSE;
8 end
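The control flow of the UBA algorithm can be sketched as follows. The four callables (`solve_upper_bound`, `sum_rate`, `randomize`, `fallback`) and the numerical tolerance `tol` are hypothetical placeholders for the upper bound solver of Problem (12), the sum rate in (4), the randomization step, and SIN-MaxSNR or AWMSE, respectively. None of them is an API defined by the paper.

```python
def uba(solve_upper_bound, sum_rate, randomize, fallback, tol=1e-6):
    """Upper-bound-aided precoder selection (illustrative control flow only).

    solve_upper_bound() -> (covariances, bound): solves the convex upper bound.
    sum_rate(covariances) -> float: achieved sum rate for those covariances.
    randomize(covariances) -> beamformers: rank-one extraction by randomization.
    fallback() -> beamformers: another precoding design (e.g., an iterative one).
    Returns (beamformers, is_tight).
    """
    covariances, bound = solve_upper_bound()
    achieved = sum_rate(covariances)
    # Tight case: the achieved sum rate is not less than the bound's optimal
    # value (up to a numerical tolerance, which is an added assumption).
    if achieved >= bound - tol:
        return randomize(covariances), True
    return fallback(), False


# Usage with stub callables standing in for the real solvers.
beams, tight = uba(
    solve_upper_bound=lambda: ("Q", 5.0),
    sum_rate=lambda Q: 5.0,
    randomize=lambda Q: "q_from_Q",
    fallback=lambda: "q_from_fallback",
)
```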
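In the randomization process, vectors are generated using the covariance matrices, and the vectors with the best sum rate are chosen as the beamforming vectors. The following steps ensure satisfaction of the interference nulling and total power constraints:
1) Generate a set of random vectors $\tilde{\mathbf{q}}_{\mathcal{I}}$, $\forall \mathcal{I} \in \mathcal{G}$, with distribution $\tilde{\mathbf{q}}_{\mathcal{I}} \sim \mathcal{CN}\left(\mathbf{0}, \mathbf{F}_{\mathcal{I}}^H \mathbf{Q}_{\mathcal{I}} \mathbf{F}_{\mathcal{I}}\right)$.
2) To apply interference nulling, generate $\bar{\mathbf{q}}_{\mathcal{I}}$, $\forall \mathcal{I} \in \mathcal{G}$, such that $\bar{\mathbf{q}}_{\mathcal{I}} = \mathbf{F}_{\mathcal{I}} \tilde{\mathbf{q}}_{\mathcal{I}}$, where $\mathbf{F}_{\mathcal{I}}$ is defined in (13), so that $\mathbb{E}\left[\bar{\mathbf{q}}_{\mathcal{I}} \bar{\mathbf{q}}_{\mathcal{I}}^H\right] = \mathbf{Q}_{\mathcal{I}}$.
3) Scale the vectors as $\mathbf{q}_{\mathcal{I}} = \sqrt{\mathrm{Tr}(\mathbf{Q}_{\mathcal{I}})}\, \bar{\mathbf{q}}_{\mathcal{I}} / \lVert \bar{\mathbf{q}}_{\mathcal{I}} \rVert$, $\forall \mathcal{I} \in \mathcal{G}$, to satisfy the power constraint.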
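The three randomization steps above can be sketched for a single stream as follows. The sketch assumes that $\mathbf{F}_{\mathcal{I}}$ has orthonormal columns spanning the interference-nulling subspace and that $\mathbf{Q}_{\mathcal{I}}$ lies in that subspace. The function name and the eigendecomposition-based sampling are illustrative choices. Selecting the candidate with the best sum rate requires all streams jointly and is left to the caller.

```python
import numpy as np


def randomize_beamformer(Q, F, num_samples=50, seed=0):
    """Draw candidate beamformers for one stream from its covariance Q.

    Q: (M, M) Hermitian PSD covariance matrix.
    F: (M, r) matrix with orthonormal columns (assumed form of the nulling basis).
    Returns a list of (M,) vectors, each in range(F) with squared norm Tr(Q).
    """
    rng = np.random.default_rng(seed)
    C = F.conj().T @ Q @ F                     # covariance of the reduced vector
    C = 0.5 * (C + C.conj().T)                 # remove numerical asymmetry
    eigval, eigvec = np.linalg.eigh(C)
    root = eigvec * np.sqrt(np.clip(eigval, 0.0, None))  # root @ root^H = C
    power = np.trace(Q).real
    r = C.shape[0]
    candidates = []
    for _ in range(num_samples):
        # Step 1: reduced vector with distribution CN(0, F^H Q F).
        z = (rng.standard_normal(r) + 1j * rng.standard_normal(r)) / np.sqrt(2)
        q_tilde = root @ z
        # Step 2: map into the nulling subspace.
        q_bar = F @ q_tilde
        # Step 3: rescale so that ||q||^2 = Tr(Q).
        candidates.append(np.sqrt(power) * q_bar / np.linalg.norm(q_bar))
    return candidates
```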

D. Complexity
In this section, the complexities of the precoding design algorithms in Section IV are investigated. In the tight case, the UBA algorithm solves one convex optimization problem, (12). If (12) is not tight, the application of SIN-MaxSNR or AWMSE is also required. The convex problem in (12) can be solved by interior-point methods with polynomial-time complexity [29]. The interior-point method approximates the convex problem by the log-barrier method [20] and obtains a convex problem that only has equality constraints, which can be solved by Newton's method [20]. The duality gap in sum rate resulting from this approximation is $\sum_{j=1}^{n_{\mathrm{ineq}}} \theta_j / l$, where $n_{\mathrm{ineq}}$ is the number of inequality constraints in Problem (12), parameter $l$ trades off complexity and duality gap, and $\theta_j = M$ or $1$ for positive definite constraints or other inequality constraints of Problem (12), respectively. Therefore, $\sum_{j=1}^{n_{\mathrm{ineq}}} \theta_j = \sum_{t \in \mathcal{C}} 1 + \sum_{\mathcal{I} \in \mathcal{G}} M + 1 = |\mathcal{C}| + MN + 1$, where $\mathcal{C}$ is the set of concave sum rate cases, $M$ is the number of BS antennas, and $N$ is the number of streams. From [20], the number of Newton iterations required for the interior-point method to solve (12) is expressed in terms of $l^{(0)}$, the initial value of $l$, and $\epsilon$, the convergence threshold. Hence, for a fixed initial value of the duality gap, the number of Newton iterations to solve Problem (12) is $O\left(|\mathcal{C}| + MN + 1\right)$, which is polynomial-time complexity. Note that the number of concave sum rate cases, $|\mathcal{C}|$, depends on the user grouping algorithm. However, $|\mathcal{C}|$ can be obtained for RSMA-IN-HS as described in the proof of Corollary 1.

TABLE I: COMPLEXITY OF UPPER BOUND PROBLEM, AWMSE, AND SIN-MAXSNR FOR RSMA-IN
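The barrier-related quantities in this section reduce to simple arithmetic. The sketch below evaluates $\sum_j \theta_j = |\mathcal{C}| + MN + 1$ and the duality gap $\sum_j \theta_j / l$ stated above. The function names and the example values ($|\mathcal{C}| = 3$, $M = 7$, $N = 10$) are hypothetical and chosen only for illustration.

```python
def barrier_parameter(num_concave_cases, num_antennas, num_streams):
    """Sum of theta_j: one per concave sum rate case, M per covariance
    constraint (N streams), plus one remaining inequality constraint."""
    return num_concave_cases + num_antennas * num_streams + 1


def duality_gap(theta_sum, l):
    """Duality gap of the log-barrier approximation for barrier weight l."""
    return theta_sum / l


def min_barrier_weight(theta_sum, target_gap):
    """Smallest l for which the duality gap does not exceed target_gap."""
    return theta_sum / target_gap


theta_sum = barrier_parameter(num_concave_cases=3, num_antennas=7, num_streams=10)
gap = duality_gap(theta_sum, l=740.0)
l_needed = min_barrier_weight(theta_sum, target_gap=0.01)
```

For these example values, `theta_sum` is 74, so a gap of 0.01 requires a barrier weight of 7400.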
The complexities of SIN-MaxSNR and AWMSE mainly depend on solving Problems (17) and (29), respectively. Both problems can be solved using convex optimization methods, and, similar to computing the complexity of Problem (12), the complexities of (17) and (29) can be expressed in terms of the number of Newton iterations. The overall complexity also depends on the computation of one Newton iteration, which is mainly dependent on problem size. Problem (12) computes $N$ symmetric covariance matrices of size $M \times M$, while Problems (17) and (29) obtain beamforming vectors of size $M \times 1$. In addition, AWMSE solves a joint optimization problem in (29), while SIN-MaxSNR solves decoupled optimization problems, (17), for each $i = 1, \ldots, N$. Therefore, SIN-MaxSNR has the smallest search space, and using UBA to complement SIN-MaxSNR does not reduce the computational complexity. As for using UBA to complement AWMSE (UBA-AWMSE), Problem (29) in AWMSE has a smaller problem size and computational complexity in comparison to (12) in UBA. However, UBA solves just one convex optimization problem, while AWMSE is an alternating algorithm. Hence, UBA-AWMSE can lower the complexity of AWMSE. In conclusion, while UBA does not have clear complexity reduction advantages, the UBA algorithm obtains optimal covariance matrices in the tight case and, as a result, can enhance the sum rate performance at similar complexity. The complexities discussed above are summarized in Table I.

V. SIMULATION RESULTS
The sum rate performance of the precoding designs is evaluated and compared with the upper bound problem (12) for RSMA-IN-HS. From Algorithm 2, AWMSE performance depends on the beamforming vectors' initial values. As observed in [1], initializing the beamforming vectors $\mathbf{q}_{\mathcal{I}}$, $\forall \mathcal{I} \in \mathcal{G}$, with the dominant singular vector from the SVD of $\mathbf{H}_{\mathcal{I}}$, labelled "MRT initial" in Figure 2, performs well. AWMSE results are also provided for "zero initial" value beamforming vectors. To produce hierarchical streams, the user grouping algorithm in [10, Algorithm 1] is applied. Special cases of hierarchical streams, i.e., optimal hierarchical streams in Section III-B as well as 1-layer [3] and 2-layer RS [4], are also compared. In 1-layer RS, the BS sends private streams as well as one stream common to all users. 2-layer RS is equivalent to 3 levels of user grouping, where in the first level, each user is a group, and in the third level, all users form a group. To produce optimal hierarchical streams, the stream common to all users is removed from 2-layer RS. Total transmit power is proportional to the number of users. Average received SNR in dB is denoted by $\gamma_{\mathrm{ave}}$, and the total transmit power is $P_{\mathrm{tot}} = K\, 10^{\gamma_{\mathrm{ave}}/10}$. Users' channels are modeled by one-ring scattering [30]. The BS has uniform linear array geometry with antenna spacing $d$ and $n$th antenna element position $u_n = nd - \frac{M+1}{2} d$, $\forall n = 1, \ldots, M$. Users' angles of arrival (AoA), $\theta_k$, $\forall k \in \mathcal{K}$, are uniformly distributed over $[0, 2\pi)$. The angle spread is $\Delta = \frac{\pi}{12}$ for all users. User $k$'s channel vector is $\mathbf{h}_k \sim \mathcal{CN}(\mathbf{0}, \alpha_k \mathbf{R}_k)$, with the elements of $\mathbf{R}_k$ given by [30]:

where $\lambda$ is the carrier wavelength, which is 12.5 cm at 2.4 GHz, and the antenna spacing is $d = \frac{\lambda}{2}$. The channel power gain $\alpha_k$, $\forall k = 1, \ldots, K$, has a Rayleigh distribution with mean $\sqrt{\frac{\pi}{4-\pi} \sigma^2}$, where the variance $\sigma^2$ is a measure of channel power disparity. Unlike previous investigations [10], [22], channel disparity is used here to gain additional insight into SIC performance.
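The scalar parts of this setup can be reproduced with a few helper functions. The sketch below covers the total transmit power, the antenna element positions, and the channel power gains. It assumes a Rayleigh distribution parameterized by its variance $\sigma^2$, which gives the scale $\sqrt{2\sigma^2/(4-\pi)}$ and the mean $\sqrt{\pi\sigma^2/(4-\pi)}$. The function names are hypothetical, and the covariance matrix $\mathbf{R}_k$ of the one-ring model is not reproduced here.

```python
import math
import random


def total_power(num_users, snr_db):
    """P_tot = K * 10^(gamma_ave / 10)."""
    return num_users * 10 ** (snr_db / 10)


def antenna_positions(num_antennas, spacing):
    """u_n = n*d - (M + 1)/2 * d for n = 1, ..., M (array centered at zero)."""
    return [n * spacing - (num_antennas + 1) / 2 * spacing
            for n in range(1, num_antennas + 1)]


def rayleigh_scale(variance):
    """Scale s of a Rayleigh distribution whose variance is (4 - pi)/2 * s^2."""
    return math.sqrt(2 * variance / (4 - math.pi))


def draw_power_gains(num_users, variance, seed=0):
    """Rayleigh-distributed gains alpha_k via inverse-CDF sampling."""
    rng = random.Random(seed)
    s = rayleigh_scale(variance)
    return [s * math.sqrt(-2 * math.log(1 - rng.random())) for _ in range(num_users)]


wavelength = 0.125                       # 12.5 cm at 2.4 GHz
positions = antenna_positions(7, wavelength / 2)
p_tot = total_power(num_users=7, snr_db=10)
alphas = draw_power_gains(num_users=7, variance=1.0)
```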
To measure the likelihood of tightness of the sum rate upper bound problem for RSMA-IN-HS for a random channel realization, a Monte Carlo experiment with 200 channel realizations is conducted based on the one-ring scattering model with a channel power disparity of $\sigma^2 = 1$ in a 7-user system. Table II shows the sum rate upper bound tightness probability, i.e., the frequency of occurrence of Problem (12) being tight over the ensemble of channels, for 2-layer RS and hierarchical streams with more than 2 layers [4], [10]. Table II examines the effects of average received SNR and number of streams on tightness probability. As tightness probability increases, the average improvement by the UBA algorithm increases, and the upper bound problem's average sum rate is closer to the optimum. From Tables IIa and IIb, as SNR increases from 1 dB to 13 dB, the tightness probability of the upper bound problem decreases. As transmit power increases, the power constraint is easier to satisfy and the non-convex constraints in Problem (9) reduce the tightness probability. From [10], the number of streams for hierarchical streams with more than 2 layers is greater than that of 2-layer RS, and from Tables IIa and IIb, the tightness probability is lower, reflecting the increase in the number of non-convex constraints in (9b). From Table II, the sum rate upper bound problem is tight with high probability for 2-layer RS. The 1- and 2-layer RS are practical special cases of hierarchical streams with few SIC layers at the user end. Table IIa shows high tightness probability for 2-layer RS, and from Section III-B, 1-layer RS is an instance of optimal hierarchical streams with tightness probability 1. The sum rate upper bound obtained from Problem (12) is also provided as a comparison point. These results agree with those obtained for hierarchical streams in [22, Figure 1].
However, the results here are obtained with the MOSEK solver [31], which is more reliable for optimization problems containing logarithmic functions, while the results of [22, Figure 1] are obtained with the SeDuMi solver [32]. In addition, the effect of channel disparity on the sum rate performance of RSMA-IN-HS is considered here. As observed in Figs. 2a and 2b, sum rates are higher for $\sigma^2 = 4$. This is explained by RSMA's linear precoding performing better at low channel correlation and low channel disparity, while superposition coding with SIC (SC-SIC) performs better at high channel correlation and high channel disparity, as also found in [6]. Since channel correlation is relatively high for $\Delta = \frac{\pi}{12}$, SC-SIC is more effective, and an increase in channel disparity results in higher sum rate performance. As seen in Fig. 2a, the performances of the SIN-MaxSNR and UBA algorithms are close to the sum rate upper bound, i.e., to the optimal sum rate. In cases where the upper bound problem's sum rate is not optimal, SIN-MaxSNR and UBA-SIN-MaxSNR have similar performance. Otherwise, in the tight case, UBA-SIN-MaxSNR uses the optimal covariance matrices to find beamforming vectors and outperforms SIN-MaxSNR. Similarly, UBA-AWMSE's superiority over AWMSE is confirmed in Figs. 2a and 2b. In addition, for $\sigma^2 = 4$, Fig. 2b shows that UBA-AWMSE increases the average sum rate over AWMSE by 2.58 bits per channel use at 10 dB SNR.

Fig. 3. Average sum rate comparisons as in Fig. 2 but with optimal hierarchical streams.

Figures 3a and 3b show the sum rate of optimal hierarchical streams for $(M, K) = (7, 7)$ and $(\Delta, \sigma^2) = \left(\frac{\pi}{12}, 1\right)$, where the upper bound problem obtains the optimal sum rate, and, as observed in Fig. 3a, the performance of the SIN-MaxSNR algorithm is near optimal. Also, Figs. 3a and 3b show that UBA-SIN-MaxSNR and UBA-AWMSE can achieve the optimal sum rate. In this case, the UBA algorithm never switches to SIN-MaxSNR or AWMSE.
We remark that if the sum rate upper bound problem is tight, the UBA algorithm only solves a single convex optimization problem, and if it is not tight, it switches to SIN-MaxSNR or AWMSE, which are iterative algorithms. Hence, as discussed in Section IV-D, the UBA algorithm can reduce complexity in some cases. Figures 4a and 4b compare AWMSE and SIN-MaxSNR and examine the effect of imperfect SIC on sum rate performance. To model residual interference in SIC decoding, the coefficient $\eta \in [0, 1]$ is used in the SINR formula of the $i$th stream at user $u$ as in the following [33]:

which changes the sum rate to $R_{\mathrm{sum}} = \sum_{i=1}^{N} \log_2\left(1 + \min_{u \in \mathcal{I}_i} \tilde{\gamma}_i^u\right)$. Figures 4a and 4b have SIC coefficients $\eta = 0$ and $\eta = 0.2$, respectively, with $(M, K, \Delta, \sigma^2) = \left(7, 7, \frac{\pi}{12}, 1\right)$, and display hierarchical, 1-layer, and 2-layer RSMA schemes. Figure 4a also displays SDMA-ZF, where the transmitter utilizes ZFP to transmit streams and there is no RS. The numbers of streams, $N$, are such that $N_{\text{SDMA-ZF}} < N_{\text{1-layer}} \leq N_{\text{2-layer}} \leq N_{\text{Hierarchical}}$ [10], so that hierarchical streams have the highest sum rate and SDMA-ZF has the lowest one, which is in agreement with Fig. 4a without SIC error. The numbers of SIC layers are $L_{\text{1-layer}} = 1$, $L_{\text{2-layer}} = 2$, and $L_{\text{Hierarchical}} > 2$, and the imperfect SIC error grows in proportion to the number of SIC layers. Increasing the number of SIC layers also allows for more streams, which increases the sum rate, creating a tradeoff in the number of chosen SIC layers. As observed in Fig. 4b, increasing from 1 to 2 SIC layers increases the sum rate, but from 2-layer RS to hierarchical streams, the effect of imperfect SIC dominates over that of increased streams. From Fig. 4a ($\eta = 0$), we observe SIN-MaxSNR's superior sum rate over AWMSE, as expected, since the latter does not converge to optimal beamforming vectors in the sum rate maximization problem. Also, the performance of 2-layer RS using SIN-MaxSNR approaches that of hierarchical streams using AWMSE. Not shown in Fig.
4b is that MRT-initialized AWMSE has severe performance degradation from SIC errors due to nonconvergence to a local optimum. Figures 5a and 5b present the average sum rate of 2-layer RS with and without interference nulling using AWMSE for $(M, K) \in \{(7, 5), (7, 4)\}$ and $\Delta \in \{\pi/8, \pi/6\}$. In Fig. 5, path loss is unity for all users, i.e., $\alpha_k = 1$, $\forall k \in \mathcal{K}$. To apply interference nulling, the beamforming vector of stream $s_{\mathcal{I}_i}$, i.e., $\mathbf{q}_i$, cannot be in the span of $\mathbf{H}_{-\mathcal{I}_i}$, which is of dimension $K - N_i$. Therefore, $\mathbf{q}_i$ must be chosen from a space of dimension $M - K + N_i$. Larger values of $M - K$ increase the degrees of freedom in choosing $\mathbf{q}_i$ and reduce the interference nulling performance loss: as observed in Fig. 5, the performance loss for $(M, K) = (7, 4)$ is less than that for $(M, K) = (7, 5)$. Interference nulling performance loss also depends on inter-user correlation. Larger correlation between $\mathbf{H}_{\mathcal{I}_i}$ and $\mathbf{H}_{-\mathcal{I}_i}$ requires more power for interference nulling and causes greater performance loss, as indicated in Figs. 5a and 5b. Note that a larger angle spread, $\Delta$, implies lower correlation. Figures 6a and 6b present the average sum rate of RSMA-IN-HS for different CSIT conditions, with parameters $(M, K, \Delta, \sigma^2) = \left(7, 7, \frac{\pi}{12}, 1\right)$. The channel estimation model follows that in [16], where user $k$'s channel is modeled by $\mathbf{h}_k = \hat{\mathbf{h}}_k + \tilde{\mathbf{h}}_k$, $\forall k \in \mathcal{K}$, such that the estimated channel is $\hat{\mathbf{h}}_k \sim \mathcal{CN}\left(\mathbf{0}, (1 - \tau^2) \alpha_k \mathbf{R}_k\right)$, with uncorrelated estimation error $\tilde{\mathbf{h}}_k \sim \mathcal{CN}\left(\mathbf{0}, \tau^2 \alpha_k \mathbf{R}_k\right)$. The CSIT quality is determined by $\tau$.
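The dimension count behind the interference nulling loss discussed above can be verified numerically. The sketch below assumes that the rows of `H_others` are the conjugate-transposed channels of the $K - N_i$ users outside the group, so that nulling means `H_others @ q = 0`. The function name and the SVD-based construction are illustrative and are not necessarily how (13) is defined in the paper.

```python
import numpy as np


def nulling_basis(H_others, tol=1e-10):
    """Orthonormal basis of {q : H_others @ q = 0}.

    H_others: (K - N_i, M) matrix of channels that must receive no interference.
    Returns an (M, M - rank) matrix whose columns span the null space.
    """
    _, s, vh = np.linalg.svd(H_others, full_matrices=True)
    rank = int(np.sum(s > tol))
    # Rows of vh beyond the rank are orthogonal to every row of H_others.
    return vh[rank:].conj().T


rng = np.random.default_rng(0)
M, N_i = 7, 2                              # antennas and users in the group
for K in (5, 4):
    H_others = (rng.standard_normal((K - N_i, M))
                + 1j * rng.standard_normal((K - N_i, M)))
    F = nulling_basis(H_others)
    # For generic channels, F has M - K + N_i columns: 4 for K = 5, 5 for K = 4.
    print(K, F.shape[1])
```

With $M = 7$ fixed, moving from $K = 5$ to $K = 4$ adds one dimension to the feasible subspace, consistent with the smaller nulling loss reported for $(M, K) = (7, 4)$.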

VI. CONCLUSION
Low-complexity RSMA algorithms using hierarchical streams are proposed and investigated under an interference nulling condition. A new convex upper bound problem formulation is shown to enhance existing precoding designs. Simulations show that the sum rates of the proposed UBA and SIN-MaxSNR algorithms are close to the upper bound and that the sum rate upper bound is frequently tight. The upper bound problem for 1-layer RS is always tight, and for 2-layer RS the tightness probability exceeds 0.75. The proposed SIN-MaxSNR precoding combines interference nulling and SNR maximization and reduces complexity by decoupling optimization problems. The proposed UBA enhances the performance of existing precoding algorithms. When the upper bound problem is tight, UBA obtains optimal covariance matrices and higher sum rate performance. Simulation results show significant performance improvement of UBA-AWMSE over AWMSE, e.g., the average sum rate is increased by 2.58 bits per channel use at 10 dB SNR and a channel disparity of 4.

APPENDIX A PROOF OF PROPOSITION 1
User $u$, $\forall u \in \mathcal{K}$, only decodes streams containing parts of its message. In the decoding procedure, multiple choices of decoding order only arise when two streams of the same order contain parts of user $u$'s message. From Definition 1, the intersection of any two superusers is empty or equal to one of the superusers, implying that the intersection of two distinct superusers of the same order is empty. Hence, there are no two superusers of the same order intended for user $u$.
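The property used in this proof can be checked mechanically for a given user grouping. The sketch below assumes that superusers are represented as sets of user indices and that the order of a superuser is its number of users. Both are representational assumptions, and the function names are hypothetical.

```python
from itertools import combinations


def is_hierarchical(groups):
    """True if every pairwise intersection is empty or equals one of the two sets."""
    sets = [frozenset(g) for g in groups]
    return all(not (a & b) or (a & b) in (a, b) for a, b in combinations(sets, 2))


def same_order_conflicts(groups, user):
    """Pairs of distinct equal-size groups that both contain `user`."""
    sets = sorted({frozenset(g) for g in groups}, key=sorted)
    return [(a, b) for a, b in combinations(sets, 2)
            if len(a) == len(b) and user in a and user in b]


hierarchical = [{1}, {2}, {3}, {1, 2}, {1, 2, 3}]
overlapping = [{1, 2}, {2, 3}]

ok = is_hierarchical(hierarchical)             # nested or disjoint groups
bad = is_hierarchical(overlapping)             # {1, 2} and {2, 3} partially overlap
conflicts = same_order_conflicts(hierarchical, user=1)
```

For a grouping that passes the hierarchical check, no user belongs to two distinct groups of the same size, so its decoding order is unambiguous.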

APPENDIX B PROOF OF THEOREM 2
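By definition of hierarchical streams, for any two superusers $\mathcal{I}, \mathcal{J} \in \mathcal{G}$, where $|\mathcal{J}| < |\mathcal{I}|$, we have $\mathcal{I} \cap \mathcal{J} = \emptyset$ or $\mathcal{J}$. Therefore, three cases may occur for the weakest users of $\mathcal{I}$ and $\mathcal{J}$: (1) $m_{\mathcal{J}} = m_{\mathcal{I}}$, (2) $m_{\mathcal{J}} \neq m_{\mathcal{I}}$ and $m_{\mathcal{I}} \in \mathcal{J}$, and (3) $m_{\mathcal{J}} \neq m_{\mathcal{I}}$ and $m_{\mathcal{I}} \in \mathcal{I} \setminus \mathcal{J}$. From Theorem 1, for $R_{\mathrm{sum}}$ to be a concave function of the covariance matrices, all interfering streams in decoding $s_{\mathcal{I}}$, $\forall \mathcal{I} \in \mathcal{G}$, for its weakest user $m_{\mathcal{I}}$ must have the same weakest user as $s_{\mathcal{I}}$. Case (1) satisfies Theorem 1's conditions. Under RSMA-IN's decoding rules, Theorem 1 is also applicable to case (3): since $m_{\mathcal{I}} \notin \mathcal{J}$, $s_{\mathcal{J}}$ does not interfere with the decoding of $s_{\mathcal{I}}$ at $m_{\mathcal{I}}$. In case (2), however, $m_{\mathcal{I}} \in \mathcal{J}$, so $s_{\mathcal{J}}$ interferes with the decoding of $s_{\mathcal{I}}$ at $m_{\mathcal{I}}$ while $m_{\mathcal{J}} \neq m_{\mathcal{I}}$, which violates Theorem 1's conditions. Hence, to avoid violating Theorem 1, only cases (1) and (3) are permissible.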

APPENDIX C PROOF OF COROLLARY 2
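From the proof of Theorem 2, in RSMA-IN-HS, for any two superusers $\mathcal{I}, \mathcal{J} \in \mathcal{G}$, where $|\mathcal{J}| < |\mathcal{I}|$, three cases may occur for the weakest users of $\mathcal{I}$ and $\mathcal{J}$: i) $m_{\mathcal{I}} = m_{\mathcal{J}}$, ii) $m_{\mathcal{J}} \neq m_{\mathcal{I}}$ and $m_{\mathcal{I}} \in \mathcal{J}$, and iii) $m_{\mathcal{J}} \neq m_{\mathcal{I}}$ and $m_{\mathcal{I}} \in \mathcal{I} \setminus \mathcal{J}$. From Theorem 2, the first and third cases permit $R_{\mathrm{sum}}$ to be a concave function of $\mathbf{Q}_{\mathcal{I}}$, $\forall \mathcal{I} \in \mathcal{G}$. We show that the second case cannot occur. From Definition 1, if $\mathcal{J}$ and $\mathcal{I}$ have nonempty intersection, then $\mathcal{J} \cap \mathcal{I} = \mathcal{J}$, and from Definition 2, $|\mathcal{J}| = 1$. Since $\mathcal{J}$ then contains only its weakest user $m_{\mathcal{J}}$, $m_{\mathcal{I}} \in \mathcal{J}$ implies $m_{\mathcal{I}} = m_{\mathcal{J}}$. Hence, $m_{\mathcal{J}} \neq m_{\mathcal{I}}$ and $m_{\mathcal{I}} \in \mathcal{J}$ are mutually exclusive.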