Rate-Splitting Multiple Access and Dynamic User Clustering for Sum-Rate Maximization in Multiple RISs-Aided Uplink mmWave System

In this paper, a reconfigurable intelligent surfaces (RISs)-aided millimeter wave (mmWave) uplink (UL) rate-splitting multiple access (RSMA) system is investigated, which aims to achieve better rate performance and enhanced coverage capability for multiple users. The considered UL RSMA model splits the rate of each user by dividing its message into multiple parts and hence exploits all the necessary degrees of freedom to achieve the maximum capacity region and high user fairness. In particular, we focus on sum-rate maximization for the considered UL RSMA system through joint optimization of the power allocation to the UL users and the beamforming design, i.e., active receive beamforming at the base station (BS) and passive beamforming at multiple RISs. To efficiently mitigate high inter-node interference in the multi-user scenario, we first provide a low-complexity user pairing scheme based on k-means clustering and then develop an effective low-cost alternating optimization framework that solves the joint optimization problem sub-optimally by decoupling it into sub-problems of power allocation and beamforming design. Specifically, these sub-problems are solved using successive convex approximation, Riemannian manifold and fractional programming techniques. Then, a unified solution based on the block coordinate descent (BCD) algorithm is proposed. Extensive numerical simulations validate that the user clustering significantly improves the performance gain and that the considered RSMA system outperforms conventional multiple access schemes in terms of rate and user fairness. Also, exploiting the spatial correlation among the elements of each RIS, i.e., non-diagonal phase matrices at each RIS, achieves better performance than the conventional diagonal phase-matrix setting.


I. INTRODUCTION
RECENTLY, reconfigurable intelligent surfaces (RISs), also known as intelligent reflecting surfaces (IRSs), have been considered a promising solution to improve network coverage and to resolve the blockage issue in millimeter wave (mmWave) communications [1], [2], [3]. Despite its potential to provide gigabits-per-second communication, mmWave communication suffers from severe path loss and high directivity, which make it vulnerable to attenuation and blockages that can be frequent in dense urban environments [4]. An RIS is a planar array consisting of multiple low-cost reflecting elements which steer the incident signal by adjusting its phase shift and amplitude [5]. Owing to this ability, RISs can reconfigure the wireless channel to facilitate information transmission, and the performance of mmWave communications can be significantly enhanced by deploying RISs on the exterior walls of buildings [6].
Besides, the ever-rising demand for high data rates and dense wireless networks in recent years has led to an inevitable search for highly spectrum- and power-efficient technologies [7], [8]. As a candidate radio access technique for future wireless networks, rate-splitting multiple access (RSMA) has recently been recognized as a promising technology due to its superior spectral efficiency and effective resource sharing [8], [9], [10], [11], [12], [13], [14]. In particular, RSMA allows multiple users to access the same orthogonal resource block and splits the rate of each user into various sub-messages [13], [14], [15]. The rate splitting of the users enables efficient resource allocation and hence secures enhanced spectral efficiency as compared to existing multiple access schemes such as orthogonal multiple access (OMA), non-orthogonal multiple access (NOMA) and space-division multiple access (SDMA) [11], [14], [15], [16]. Specifically, the rate splitting in RSMA exploits all the necessary degrees of freedom in power control to achieve the entire capacity region and the optimal user fairness in the multiple access channel, even with sub-optimal user pairing [17].
coverage capability for each user [18], [19]. Meanwhile, the key feature of the RSMA technique is to split the user messages into multiple sub-messages so as to achieve more flexible interference management, and thus to provide better rate throughput while ensuring delay-limited transmission, optimal user fairness, massive connectivity and spectrally efficient transmission as compared to conventional OMA and NOMA [17], [20]. The interplay between RIS and RSMA techniques can bring out a new paradigm for wireless communication architectures which effectively utilizes their individual merits of enhanced spectral efficiency and coverage capability and renders a competent solution for better system performance in next-generation wireless networks. The underlying merits of the RIS-aided UL RSMA system are detailed as follows:
1) Enhanced spectral efficiency and coverage capability: The integration of RIS and RSMA combines their individual merits in terms of high spectral efficiency and enhanced coverage for the users on both the transmission and reflection sides. Mainly, the optimal phase-shift design overcomes the random effects of the channel and provides high spectral efficiency, especially for RIS-aided networks, which further enhances the system performance of power-domain multiple access schemes, i.e., RSMA and NOMA [21].
2) Smart RSMA design: The passive array gain that comes with a high number of STAR-RIS elements relaxes the need for massive antenna systems in the RSMA system, and it also significantly improves the overall performance, even with limited spectral resources, when compared to the NOMA system [11]. Moreover, the channel diversity and flexible interference management facilitated by RIS and RSMA, respectively, avoid the high power (energy) consumption otherwise required for dead-zone users to maintain a high quality of service (QoS) via the direct link (without RIS) [22].
3) Robustness to imperfect SIC: RSMA is robust to imperfect SIC, which makes it a competent solution for RIS-aided networks with better system performance. Nevertheless, the performance of a multi-layer RSMA system is still prone to imperfect SIC decoding; the passive array gain provided by the RIS compensates for the deterioration caused by imperfect SIC decoding to some extent [23].
4) Low hardware overhead: The channel diversity provided by the RIS enables single-layer/two-layer RSMA systems to attain performance equivalent to that of higher-layer RSMA systems (without RIS). Moreover, the performance gain achieved by STAR-RIS aided networks can significantly relax the need for a sophisticated SIC receiver design in the RSMA system.

B. Related Works
Many efforts have been dedicated to improving the system performance of RIS-assisted mmWave communication systems with respect to spectral efficiency, energy efficiency, outage probability and network coverage capability [18], [24], [25], [26], [27], [28]. For instance, the authors in [24] studied the joint optimization of power allocation and phase shifters in RIS-aided mmWave systems. In [26], an RIS-aided massive multiple-input multiple-output (MIMO) architecture for mmWave communications was investigated, in which two efficient precoders were designed by exploiting the sparsity of mmWave channels. Generally, the use of multiple RISs can resolve the blockage issue for mmWave communication by providing a smart radio propagation environment and more flexibility than a conventional single-RIS system. Even when the user is located in the reflection half-space of a single RIS, the single-RIS deployment may not guarantee an effective blockage-free transmission path between the user and the base station (BS), especially in dynamic and dense urban environments [29]. Due to the practical limitation on the number of elements on an RIS module [30], a single IRS may render limited passive beamforming gain and thus prevent high-gain performance. The authors in [19] demonstrated that the received signal power increases quadratically with the number of reflecting elements for both the single-IRS and multi-IRS cases. Moreover, they also demonstrated that the joint optimization of active beamforming at the BS and passive beamforming at each RIS can significantly improve the rate performance. Notably, a few works [31], [32], [33] in the literature showed that spatial correlation among channels or among the phase shifters of each RIS has a positive impact on the system performance of RIS-aided communication networks.
Further, the resource allocation problem for multi-channel RIS-aided NOMA and RSMA systems has attracted significant attention in the past years. The joint power allocation and phase shift optimization problems for sum-rate maximization and power minimization were studied for downlink RIS-NOMA multi-user systems in [34], [35], [36], [37], [38], [39], [40], and [41], which validated the superior performance of NOMA over the conventional OMA scheme. The authors in [34], [42], [43], and [44] proposed algorithms to improve the spectrum and energy-efficiency fairness of a multi-user RIS-aided network. In [45] and [46], the authors concluded that a high SINR and a high number of RIS elements improve the outage probability, ergodic capacity and rate throughput. Overall, as compared to the conventional OMA scheme, the NOMA schemes for RIS-aided networks provide better spectral efficiency and coverage capability, especially when the direct link between communicating nodes is absent [47], [48], [49]. Nevertheless, recent studies reveal that the system performance can be further improved using RSMA, which splits the rate of each user into various sub-messages [11], [12], [13], [14], [16], [43], [50], [51], [52], [53]. Closed-form expressions for the outage probability of cell-edge users and near users were derived in [51] for RIS-assisted downlink communication with the aim of optimal resource allocation. It was shown that RSMA performs better than NOMA under various system parameters such as the number of RIS reflecting elements and the node density. The authors in [52] considered the maximization of the minimum transmission rate among multiple single-antenna users for an RIS-aided multi-user MISO RSMA downlink system by optimizing the transmit beamformers at the transmitter and the phase shifters at the RIS. In [53], the problem of energy efficiency maximization was investigated by controlling the active and passive beamforming at the BS and at the RIS, respectively.
The works in [51], [52], and [53] combine RIS and RSMA technologies to achieve optimal network performance in terms of energy efficiency, rate maximization and outage analysis.

C. Motivations and Contributions
The multi-RIS deployment provides effective LOS links and thus renders robust performance against blockages in mmWave communication [19]. In addition, power control at the low-cost users and efficient spectrum sharing in the RSMA design become critical aspects in resource-limited networks. Overall, the integration of multi-RIS deployment and RSMA can be beneficial in terms of better spectral efficiency, improved flexibility and enhanced coverage, especially in mmWave communication. However, deep insights and a solid theoretical analysis of the integration of multi-RIS and RSMA are largely missing in previous works. The theoretical and practical investigation of the system performance of the multi-RIS-aided RSMA system is an active research topic and thus is the prime motivation of this work. Nevertheless, the selection of optimal power-splitting, decoding order and beamforming design schemes becomes pivotal for the multi-RIS aided RSMA system, which brings new optimization challenges for effective resource allocation design.
Despite the potential of both multi-RIS deployment and RSMA schemes, the research on the integrated multi-RIS aided UL RSMA system is still in its infancy. As far as the authors are aware, UL-RSMA has not yet been investigated with multi-RIS in the literature from either a theoretical or a resource allocation perspective. Motivated by this background, we investigate a multi-RIS-aided UL-RSMA framework where the UL users operate in RSMA fashion and communicate with the BS via multiple RISs. The important contributions of this work are as follows:
1) We theoretically analyze the system performance of the multi-RIS aided RSMA system in terms of spectral efficiency and coverage capability of uplink users under mmWave communication. In the considered RSMA system, the message of each user is split into several sub-messages, each part contributes to the rate of that user, and the BS decodes them using an appropriate decoding order. To the extent of our knowledge, this is the first paper looking at spectrally efficient resource allocation design for the multi-RIS aided UL RSMA system by exploiting the mmWave channel characteristics.
2) To effectively mitigate the inter-user interference (IUI) in the considered multi-RIS aided multi-user UL RSMA system, we first perform an unconventional dynamic user clustering based on low-cost k-means clustering. Then, a resource allocation design problem is formulated which aims to maximize the overall rate throughput of the system through the joint optimization of power allocation, decoding order and beamforming (including active receive beamforming at the BS and passive beamforming at each RIS) under the consideration of discrete phase-shift models and spatially correlated phase-shift matrices at each RIS.
3) To solve the formulated resource allocation design problem for the considered multi-RIS aided RSMA design, we decouple the transformed problem into separate sub-problems of power allocation, decoding order and beamforming design. The sub-problems of power allocation and beamforming design are solved independently using general convex approximation and fractional programming approaches, and a unified solution based on an alternating optimization algorithm is then presented. Further, we select a SIC decoding order strategy for the considered UL RSMA which ensures an optimal user-fairness index.
4) Deep theoretical insights and extensive numerical simulations are provided based on the stochastic geometry of the considered model under correlated and uncorrelated mmWave channels. The simulation results demonstrate that a) the proposed solution attains fast convergence and outperforms the considered conventional schemes in terms of overall rate throughput and user fairness, b) the proposed user clustering significantly improves the rate performance, c) the considered RSMA system significantly outperforms conventional multiple access schemes, i.e., NOMA and OMA, and d) the spatial correlation among the elements of each RIS, i.e., non-diagonal phase matrices at each RIS, achieves better performance than conventional diagonal phase matrices.
Notations: Throughout the paper, scalars, vectors, matrices and sets are represented by regular, bold lowercase, bold uppercase and script letters, respectively. $\|\mathbf{l}\|$ indicates the $\ell_2$ norm of the vector $\mathbf{l}$ and $|l|$ indicates the magnitude of $l$. $p^{(a)}$ denotes the value of the parameter $p$ at the $a$-th iteration, whereas $\{p_i\}$ indicates the collection of all variables $p_i$, $\forall i$. $\mathbf{p}[a]$ denotes the $a$-th element of the vector $\mathbf{p}$. $n \sim \mathcal{CN}(\mu, \sigma^2)$ denotes that $n$ is a circularly symmetric complex Gaussian random variable with mean $\mu$ and variance $\sigma^2$. $\mathrm{diag}(\mathbf{l})$ represents the diagonal matrix whose diagonal elements are given by $\mathbf{l}$. $\Re(z)$ denotes the real part of the complex variable $z$.

II. SYSTEM MODEL: MULTI-RIS AIDED UL RSMA SYSTEM
Let us consider a multiple RIS-aided mmWave UL RSMA system [footnote 1] as shown in Fig. 1, where M RISs are deployed to simultaneously assist the data transmission from K single-antenna UL users to a multi-antenna BS. A set of multiple RISs is deployed on the surfaces of buildings to assist the severely blocked users in communication. Without loss of generality, we ignore [footnote 2] the multi-reflection signals which are reflected by the RISs two or more times, due to the high path loss of mmWave transmissions [19]. The BS is equipped with $N_r$ receive antennas and each RIS has $N$ reflecting elements.
Footnote 1: Note that the scope of the considered RIS-aided UL RSMA is not restricted to mmWave systems. The theoretical analysis proposed in this work can also be extended to sub-6 GHz band communication applications.
Footnote 2: Multi-reflection links are generally encountered in multi-RIS configurations, where the signal transmitted by a user reaches the BS via direct reflection as well as via inter-RIS reflections from multiple RISs. We consider that the RISs are sparsely deployed at the network periphery such that the inter-RIS links are generally weak. However, the diverse reflection paths, including inter-RIS reflections, available in the multi-IRS-aided system provide extra DoFs and flexibility in system performance, on the condition that the multiple RISs are optimally placed [28]. Nevertheless, the joint resource allocation and optimal placement of RISs to maximize the overall system performance of the considered mmWave communication is out of the scope of this paper and will be considered as future work.
In the considered UL RSMA system, each user splits its own message into $J$ parts (sub-messages) and transmits them to the BS simultaneously within the same time and frequency slot [12]. $\mathcal{J} = \{1, \ldots, J\}$ denotes the set of sub-messages for each user.

A. Channel and Signal Model
Let $x_k$ represent the transmitted signal from the $k$-th user such that
$x_k = \sum_{j \in \mathcal{J}} \sqrt{p_{kj}}\, s_{kj}$,
where $s_{kj}$ is the $j$-th sub-message of the $k$-th user with $\mathbb{E}\{|s_{kj}|^2\} = 1$ and $p_{kj}$ is the corresponding transmit power allocated to the sub-message $s_{kj}$. The total transmit power of each user is limited to $P_{\max}$, i.e., $p_k = \sum_{j \in \mathcal{J}} p_{kj} \le P_{\max}$, $\forall k \in \mathcal{K}$. The BS receives the transmitted signal from each user through different communication channels, i.e., the RIS-aided channels (user-RIS) and the direct channels (user-BS), as shown in Fig. 1. Specifically, each reflecting element on the RIS behaves like a single physical point which combines all the received signals and then re-scatters the combined signal with a certain phase shift and unity amplitude gain. At the BS, the multi-antenna array receives the signal from the user and performs receive beamforming. It is assumed that the channel state information (CSI) of all channels involved is perfectly known [footnote 3] at the BS and the IRS.
Footnote 3: The results in this paper serve as theoretical performance upper bounds for the considered UL mmWave system, which can provide a benchmark for system design under imperfect CSI. Some related works on CSI estimation can be found in [54] and [55]. Nevertheless, for the multi-RIS aided system involving single- and multi-reflection coupled channel coefficients, the accurate and efficient estimation of the complete CSI is out of the scope of this paper and can be considered an open research problem for future works.
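To make the uplink rate-splitting step concrete, the following minimal Python sketch builds the transmit signal of one user under the usual superposition assumption, i.e., each sub-message is scaled by the square root of its power and the results are added. The split weights, the QPSK symbol alphabet and all function names are hypothetical choices made only for illustration.

```python
import math
import random


def split_power(p_max, weights):
    """Share a power budget p_max across J sub-messages.

    `weights` are hypothetical non-negative split ratios; they are
    normalised so that the sub-message powers sum to p_max.
    """
    total = sum(weights)
    return [p_max * w / total for w in weights]


def unit_power_symbol(rng):
    """Draw one QPSK symbol, so that |s|^2 = 1 holds for every draw."""
    return complex(rng.choice([-1, 1]), rng.choice([-1, 1])) / math.sqrt(2)


def transmit_signal(powers, symbols):
    """Superpose the sub-messages: x_k = sum_j sqrt(p_kj) * s_kj."""
    return sum(math.sqrt(p) * s for p, s in zip(powers, symbols))


rng = random.Random(0)
P_MAX = 1.0                               # assumed per-user budget (linear scale)
powers = split_power(P_MAX, [0.7, 0.3])   # J = 2 sub-messages
symbols = [unit_power_symbol(rng) for _ in powers]
x_k = transmit_signal(powers, symbols)
```

In this sketch the power constraint is met with equality; an optimizer would instead treat the individual powers as variables subject to the same budget.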
Denote $\mathbf{h}^b_k \in \mathbb{C}^{N_r \times 1}$, $\mathbf{h}^r_{km} \in \mathbb{C}^{N \times 1}$ and $\mathbf{H}_{bm} \in \mathbb{C}^{N \times N_r}$ as the mmWave channels between the $k$-th user and the BS, the $k$-th user and the $m$-th RIS, and the $m$-th RIS and the BS, respectively. The mmWave channels are assumed to be sparsely scattered into multiple paths. The BS employs a uniform linear array (ULA) with $N_r$ antennas, and each RIS consists of a uniform rectangular array (URA). The channel between the $m$-th RIS and the BS is modeled using the Saleh-Valenzuela (SV) channel model [60], where $L_{bm}$ is the number of scattering paths for the $m$-th RIS-BS link, $\mathbf{a}^t_{m,l}(\phi^t_{m,l})$ and $\mathbf{a}^r_{m,l}(\phi^r_{m,l})$ are the normalized transmit and receive array response vectors, and $\phi^t_{km,l}$, $\phi^r_{km,l}$ and $\phi^t_{mb,l}$ are the angles of arrival and departure for the $l$-th path. $\zeta^b_{m,l} \sim \mathcal{CN}(0, 10^{-0.1\kappa_0})$ and $g_{mb,l} \sim \mathcal{CN}(0, 10^{-0.1\kappa_l})$ are the complex channel gains corresponding to the $l$-th path, where $\kappa_l$ is the sum of $a_l + 10\gamma_l \log_{10} d$ and a zero-mean Gaussian term distributed as $\mathcal{N}(0, \sigma^2)$. Note that $l = 0$ corresponds to the LOS path.
Proposition 1: Each RIS-BS mmWave channel can be approximated as a rank-one matrix when the LOS path dominates [19] such that $\mathbf{H}_{bm} \approx \zeta_{bm}\, \mathbf{a}_{tm} \mathbf{b}_{tm}^H$, $\forall m \in \mathcal{M}$, where $\mathcal{M} \triangleq \{1, \ldots, M\}$, $\zeta_{bm}$ is the path-loss-dependent complex gain, and $\mathbf{a}_{tm}$ and $\mathbf{b}_{tm}$ are the normalized array response vectors associated with the LOS path of the $m$-th RIS and the BS, respectively.
Proof: Besides the sparse scattering characteristics, a few works [61], [62] in the literature have shown that the received power of the LOS path in mmWave communications is nearly 13 dB higher than the sum of the powers of the NLOS paths when there exists a direct LOS link between the BS and each RIS. Hence, the consideration of only the LOS link is straightforward.
Simulation results validate that the rank-one approximation with only the LOS path achieves performance nearly equivalent to the case in which all paths are taken into account.
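The effect of keeping only the LOS term can be illustrated numerically. The sketch below is not the simulation setup of this paper: it assumes half-wavelength ULA responses at both ends (the RIS is a URA in the system model), a unit-gain LOS path, and three NLOS paths with equal magnitudes and random phases whose total power is 13 dB below the LOS power. All names and parameter values are hypothetical.

```python
import cmath
import math
import random


def ula_response(n, angle):
    """Normalised half-wavelength ULA response vector (unit Euclidean norm)."""
    return [cmath.exp(1j * math.pi * i * math.sin(angle)) / math.sqrt(n)
            for i in range(n)]


def path_matrix(gain, a_ris, a_bs):
    """One propagation path: gain * a_ris * a_bs^H, of size N x N_r."""
    return [[gain * x * y.conjugate() for y in a_bs] for x in a_ris]


def fro_norm(mat):
    """Frobenius norm of a nested-list matrix."""
    return math.sqrt(sum(abs(x) ** 2 for row in mat for x in row))


rng = random.Random(1)
N, N_R, N_NLOS = 16, 8, 3

# LoS path with unit gain; this is the rank-one term kept by the approximation.
h_los = path_matrix(1.0, ula_response(N, 0.3), ula_response(N_R, -0.5))

# NLoS paths whose total power is 13 dB below the LoS power, random phases.
nlos_mag = math.sqrt(10 ** (-13 / 10) / N_NLOS)
h_full = [row[:] for row in h_los]
for _ in range(N_NLOS):
    gain = cmath.rect(nlos_mag, rng.uniform(0.0, 2.0 * math.pi))
    path = path_matrix(gain,
                       ula_response(N, rng.uniform(-1.5, 1.5)),
                       ula_response(N_R, rng.uniform(-1.5, 1.5)))
    h_full = [[f + p for f, p in zip(rf, rp)] for rf, rp in zip(h_full, path)]

residual = [[f - l for f, l in zip(rf, rl)] for rf, rl in zip(h_full, h_los)]
rel_err = fro_norm(residual) / fro_norm(h_full)
print(f"relative error of the LoS-only approximation: {rel_err:.3f}")
```

The printed relative Frobenius error measures how much of the channel the rank-one term misses in this toy setting; it says nothing about the rate loss, which depends on the beamforming design.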
phase-shift matrix of the $m$-th RIS and the receive beamforming vector of the BS, respectively, such that $\boldsymbol{\theta}_m = [\exp(j\varphi_{rm,1}), \ldots, \exp(j\varphi_{rm,N})]$, where $\varphi_{rm,n}$ and $\varphi_{bn}$ are the phase shifts associated with the $n$-th passive element of the $m$-th RIS and the $n$-th antenna of the BS, respectively. The received signal at the BS is denoted by $y$.

B. SIC Decoding and Rate-Throughput
The considered RSMA system model serves all the users in the same time and frequency slots, and the signals of the users are decoded successively in ascending order of their decoding order, such that the users with a low decoding order (termed strong users) are decoded first and the users with a high decoding order (termed weak users) are decoded later. After SIC, the rate throughput of strong users is not subject to IUI; instead, it depends on their power allocation and channel gain. However, the rate throughput of weak users depends more on the IUI from the strong users, and it therefore becomes very critical for low order users, especially when there exists a large number of users.
The BS utilizes SIC to decode all the sub-messages of the users from the received signal $y$. There exist a total of $S = K \times J$ sub-messages from $K$ users. The decoding order of the sub-messages at the BS is denoted by the set $\pi = \{\pi_{kj} : k \in \mathcal{K}, j \in \mathcal{J}\}$, in which the first element is decoded first, the second element is decoded second, and so on. $\pi_{kj} \in \mathcal{S} \triangleq \{1, \ldots, S\}$ denotes the decoding order of the sub-message $s_{kj}$. The permutation $\pi$ belongs to the set $\Pi$, which is the set of all possible decoding orders of the sub-messages. In particular, for the sub-message $s_{kj}$, the BS successfully decodes and eliminates all the sub-messages which have a lower decoding order than $s_{kj}$ and treats the remaining sub-messages (other than $s_{kj}$) as interference. The signal-to-interference-plus-noise ratio (SINR) of the sub-message $s_{kj}$ in the RSMA system, $\forall k \in \mathcal{K}$, $\forall j \in \mathcal{J}$, is therefore determined by $\mathcal{Q}_{kj}$, the set of all the sub-messages which have a greater decoding order than $s_{kj}$, i.e., $\mathcal{Q}_{kj} = \{(u, v) : \pi_{uv} > \pi_{kj}\}$.
Remark 1: Overall, the interference in the received signal becomes dominant over the desired signal, which results in a very poor SINR for weak users, so that optimal system performance is not guaranteed. Moreover, it also becomes impractical to implement a single centralized RSMA model for a large-scale network due to the high complexity of the SIC design [16]. Therefore, in this paper, we investigate a dynamic user clustering algorithm for the RSMA system which effectively mitigates the IUI among the users.
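The role of the decoding order can be illustrated with a small Python sketch. It assumes a scalar model in which each user has a single effective channel power gain after receive beamforming (a hypothetical stand-in for the beamformed cascaded channel), and it treats every sub-message decoded later than the current one as interference, as described above. The dictionaries and numerical values are illustrative only.

```python
import math


def sinr_per_submessage(powers, gains, order, noise):
    """SINR of every sub-message under SIC.

    powers[(k, j)]: transmit power of sub-message j of user k.
    gains[k]: assumed effective channel power gain of user k.
    order[(k, j)]: decoding position; smaller values are decoded earlier.
    Sub-messages decoded later than (k, j) are treated as interference.
    """
    sinr = {}
    for (k, j), pos in order.items():
        interference = sum(
            powers[(u, v)] * gains[u]
            for (u, v), other in order.items()
            if other > pos
        )
        sinr[(k, j)] = powers[(k, j)] * gains[k] / (interference + noise)
    return sinr


def user_rates(sinr):
    """Rate of each user: sum of log2(1 + SINR) over its sub-messages."""
    rates = {}
    for (k, _), value in sinr.items():
        rates[k] = rates.get(k, 0.0) + math.log2(1.0 + value)
    return rates


# K = 2 users, J = 2 sub-messages each (hypothetical numbers).
powers = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.4}
gains = {0: 4.0, 1: 1.0}
order = {(0, 0): 1, (1, 0): 2, (0, 1): 3, (1, 1): 4}
NOISE = 0.1

sinr = sinr_per_submessage(powers, gains, order, NOISE)
rates = user_rates(sinr)
```

In this scalar model the sum rate equals log2(1 + total received power / noise) for any decoding order, because the SINR terms telescope; the order only changes how that sum is shared among the users, which is why it matters for fairness and minimum-rate constraints.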
We adopt a hybrid multiplexing scheme such that the inter-cluster users operate in OMA fashion and the intra-cluster users are served using RSMA, which significantly mitigates the IUI among the users. Note that all the RISs are allowed to operate over the whole bandwidth, and they assist every user in the UL transmission irrespective of the cluster. Importantly, we allocate dynamic time/frequency resources to the users based on their cluster density, such that any $t$-th cluster, $\phi_t$, is assigned a portion $\tau_t \triangleq |\phi_t|/K$ of the total bandwidth allocated to the system. Hence, a cluster with high user density is assigned a higher bandwidth and a low-density cluster a lower bandwidth. This ensures a fair bandwidth distribution among all the users.
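The density-proportional bandwidth rule can be written in a few lines. The sketch below only evaluates the share of each cluster as its size divided by the total number of users; the cluster contents are hypothetical.

```python
def bandwidth_shares(clusters):
    """Fraction of the total bandwidth given to each cluster: |cluster| / K."""
    total_users = sum(len(cluster) for cluster in clusters)
    return [len(cluster) / total_users for cluster in clusters]


# Hypothetical clustering of K = 6 users into T = 3 clusters.
clusters = [[0, 1, 2], [3, 4], [5]]
shares = bandwidth_shares(clusters)
```

Since the shares sum to one, every user effectively receives the same 1/K portion of the bandwidth on average, regardless of the cluster it belongs to.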
Assume that all the UL users are grouped into $T$ clusters ($T \le K$) such that the size of each cluster $S_t$ is less than the maximum allowable cluster size $T_s$ and, moreover, each cluster set belongs to the set of all possible clusters $\mathcal{S}$, i.e., $S_t \in \mathcal{S}$, $t \in \mathcal{T} = \{1, \ldots, T\}$. After user clustering, the achievable rate of the $k$-th user belonging to the $t$-th cluster can be obtained using (5), where $\mathcal{Q}^t_{kj} = \{(u, v) : \pi^t_{uv} > \pi^t_{kj},\ u, k \in \phi_t\}$ is the set of sub-messages which contribute IUI to the $j$-th sub-message of the $k$-th user belonging to the $t$-th cluster and $\pi^t_{kj}$ represents its decoding order.

III. PROBLEM FORMULATION: RESOURCE ALLOCATION FOR SUM-RATE MAXIMIZATION
The prime motivation of the considered multi-layer RSMA system is to maximize the spectral efficiency of all the UL users using a multi-RIS aided network. In particular, we formulate the problem of sum-rate maximization under the joint consideration of user clustering, power allocation, decoding order and beamforming design (including receive beamforming at the BS and passive beamforming at each RIS). The original optimization problem for resource allocation design is formulated as (P0) in (7), where $\mathbf{p} = \{p_{kj}\}$ and $\mathbf{C} = \{c_{kt}\}$ are the collections of power allocation coefficients and user clustering indicators. The constraints (C1), (C2), (C3), (C4) and (C7) are the power allocation, minimum QoS requirement, decoding order, passive beamforming, receive beamforming and user-clustering constraints, respectively. Specifically, the constraint (C1) limits the maximum UL power allocated to the users, (C2) ensures that the resource allocation maintains the minimum rate requirement of each user, and (C3) and (C4) are the constant modulus beamforming constraints for the RIS elements and the BS antenna elements, respectively. The constraints (C5) and (C6) represent the selection of the decoding order $\pi$ and the user clustering from the set of all possible decoding orders, $\Pi$, and clustering sets, $\mathcal{S}$, respectively.
It can be observed that the formulated resource allocation design (P0) in (7) is an NP-hard mixed-integer non-linear programming (MINLP) problem due to the non-convex nature of its constraints and the strong coupling of its variables. In particular, the decoding order and clustering selections in the constraints (C5) and (C6) are integer programming problems.
Proposition 2: The problem (P0) in (7) is non-deterministic polynomial-time (NP)-hard even when the decoding order and user clustering are optimally or sub-optimally designed.
Proof: Firstly, the constraint (C2) is neither convex nor concave with respect to the optimization variables, and the other constraints for beamforming design, (C4) and (C5), are unity modulus, highly coupled or binary expressions. Consequently, the formulated resource allocation problem in (7) is nearly intractable even after the decoding order and user clustering have been designed.
Remark 2: Although an exhaustive search may solve the problem, the implementation of generic exhaustive search algorithms is not practically feasible, as their computational complexity grows exponentially with the total number of variables.
Remark 3: Therefore, it is necessary to transform (P0) into tractable sub-problems that can be solved separately and alternately over multiple iterations. Many research works [34], [52], [53], [57] have shown that the effective use of general convex approximations, fractional programming and alternating optimization can solve complex resource allocation design problems with ultra-low complexity. The aggregation of general convex relaxations to solve such a complex resource allocation design for the multi-RIS aided UL RSMA can be considered a meaningful and reasonable contribution.
Overall, the problem (P0) in (7) is nearly intractable and, moreover, there exist no standard methods to solve such complex problems. Due to the intractability of (P0), it is necessary to transform it into tractable sub-problems that can be solved separately and alternately over multiple iterations.

IV. PROPOSED UNIFIED SOLUTION
In this section, we provide an efficient and low-complexity sub-optimal solution framework to solve the MINLP problem (P0) in (7). We first perform user clustering to effectively mitigate IUI and then decouple the transformed problem into sub-problems of decoding order selection, power allocation and beamforming design, which are solved using general convex approximation techniques. Specifically, for a given decoding order and power allocation, we design the active and passive beamforming matrices using alternating optimization by the Riemannian manifold algorithm and sequential fractional programming. Then, for fixed decoding order and beamforming matrices, we solve the power allocation problem using the difference of two convex functions method and iterative successive convex approximation. The convergence of the proposed unified solution is clarified.

Algorithm 1 Dynamic User Clustering for RSMA System
2: Randomly select t cluster centers $c_i$ and set empty cluster sets $\phi_i$
3: Calculate the distance $d_{ki}$ of the $k$-th user from each cluster center
4: Assign each user to the nearest cluster center, $i = \arg\min_i \{d_{ki}\}$, and set $\phi_i = \phi_i \cup \{k\}$
5: Recalculate each cluster center by finding the centroid of the users assigned to that cluster
6: Repeat steps 3-6 until convergence
7: Check the cluster size such that $|\phi_i| \le T_s$; otherwise repeat steps 2-8

A. User Clustering
Under given decoding order, power allocation and beamforming design, the problem of user clustering can be formulated as problem (P1). Even after decoupling the original problem into sub-problems, the user-clustering problem (P1) is still non-convex due to the integer programming constraint (C6) and is thus hard to solve.
To resolve this problem, we adopt a low-complexity k-means clustering algorithm to divide the users dynamically into clusters. The proposed user clustering scheme is described in detail in Algorithm 1. In particular, we run the k-means algorithm for increasing values of $T$. For each value of $T$, the mean square distance (MSD) of the user nodes (UNs) from their respective cluster centroids is calculated and the least value of $T$ is selected, with t defined as the convergence parameter of the MSD. Finally, $T$ sets of clusters, i.e., $\phi_t$, $t \in \mathcal{T} = \{1, \ldots, T\}$, are obtained, each containing the users associated with it. Although the proposed k-means algorithm does not ensure channel disparity (near-far strategy) among users, it primarily focuses on maintaining a strong effective channel gain for each user with a low-complexity user pairing scheme [63]. Moreover, it also avoids the need for a complicated user-pairing scheme design. Further, the convergence characteristics of the proposed user-clustering scheme in Algorithm 1 are discussed in the following proposition.
Proposition 3: The proposed user clustering based on k-means in Algorithm 1 always converges in a finite number of iterations, but not necessarily to a globally optimal solution.
Proof: An equivalent convex optimization problem can be constructed for the considered k-means clustering method [64], whose sub-optimal solution can be obtained using the Karush-Kuhn-Tucker (KKT) conditions. Algorithm 1 converges to a sub-optimal solution of problem (P1) in a finite number of iterations [64, see Theorem 5].
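The clustering procedure can be sketched in Python as follows. This is one possible reading of Algorithm 1 rather than a definitive implementation: it assumes that users are clustered by their two-dimensional positions with the Euclidean distance, and that the number of clusters is increased until every cluster respects the maximum size. The function names, the seed handling and the example coordinates are hypothetical.

```python
import math
import random


def kmeans(points, t, rng, iters=100):
    """Plain Lloyd k-means; returns a list of t clusters of user indices."""
    centers = rng.sample(points, t)
    assign = None
    for _ in range(iters):
        new_assign = [
            min(range(t), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        if new_assign == assign:
            break  # assignments are stable, so the centres are too
        assign = new_assign
        for i in range(t):
            members = [points[k] for k, a in enumerate(assign) if a == i]
            if members:  # keep the old centre if a cluster empties
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return [[k for k, a in enumerate(assign) if a == i] for i in range(t)]


def cluster_users(points, max_size, seed=0):
    """Increase the number of clusters until each has at most max_size users."""
    rng = random.Random(seed)
    for t in range(1, len(points) + 1):
        clusters = [c for c in kmeans(points, t, rng) if c]
        if max(len(c) for c in clusters) <= max_size:
            return clusters
    return [[k] for k in range(len(points))]  # fallback: one user per cluster


# Hypothetical user positions (metres) forming two well-separated groups.
users = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = cluster_users(users, max_size=3)
```

Because Lloyd iterations only reach a local optimum, the result depends on the random initial centers; the size check in `cluster_users` holds for any outcome, whereas the particular grouping is not guaranteed.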

B. Active and Passive Beamforming Design
Under given clustering, power allocation and decoding order, the joint problem of active and passive beamforming can be formulated as problem (P2). The problem (P2) is non-convex due to the non-concave objective function and the constant modulus constraints (C4) and (C5) under continuous phase shifts. To solve this problem, we propose two alternating optimization methods based on manifold optimization and sequential fractional programming, which are discussed below.
1) Alternate Riemannian Manifold Optimization: The constraints (C4) and (C5) can be viewed as two different Riemannian manifolds [4], $\mathcal{M}_\theta$ and $\mathcal{M}_z$, such that
$\mathcal{M}_\theta = \{\boldsymbol{\theta} \in \mathbb{C}^{NM \times 1} : |\theta_{mn}| = 1,\ m \in \mathcal{M},\ n \in \mathcal{N}\}$. (10)
The Riemannian gradient $g_\theta$ is obtained in (12) through the projection function $P$, which is defined for any $X$ and $x$. Similarly, for $z$, the Riemannian gradient $g_z$ can be determined. After obtaining the Riemannian gradients $g_\theta$ and $g_z$, we exploit an alternating optimization approach based on the conjugate gradient method to tackle the manifold optimization problem. In particular, we employ an alternating conjugate gradient algorithm which alternately optimizes $\theta$ and $z$ to obtain a local maximizer of the objective function $f_{z,\theta}$. The conjugate gradient method iteratively finds the locally optimal points for $\theta$ and $z$ on the respective manifolds. To restrict the search to the manifold, we apply the concept of retraction [65] along the given search direction to obtain $\theta^{(a+1)}$ and $z^{(a+1)}$ with step size $\alpha$. The updated conjugate direction is used to search for a maximum of the function w.r.t. $\theta$ and $z$, and the search direction for the $(a+1)$-th iteration is updated using $g_\theta$ and $g_z$, which denote the Riemannian gradients for $\theta$ and $z$, respectively.
Algorithm 2 (recoverable steps): compute the Riemannian gradient using (12); choose the Armijo backtracking line search step size $\alpha_z^{(a)}$ and update $z^{(a+1)}$ using (15); compute the Riemannian gradient.
The updated points for $\theta$ and $z$ are given by the retraction operations at each iteration. The alternating maximization algorithm over the two manifolds is summarized in Algorithm 2.

Proposition 4: The proposed beamforming design based on optimization over two different Riemannian manifolds, summarized in Algorithm 2, converges to a stationary point.
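For unit-modulus constraints such as (C4), the two geometric operations used in the manifold method have a well-known element-wise form on the complex circle manifold: the Riemannian gradient is the Euclidean gradient with its radial component removed, and the retraction renormalizes each entry to unit modulus. The sketch below assumes this standard form. The function names and toy vectors are hypothetical, and the conjugate-gradient direction update and Armijo line search of Algorithm 2 are not reproduced (a single fixed-step move is shown instead).

```python
import cmath


def riemannian_grad(theta, egrad):
    """Project a Euclidean gradient onto the tangent space at theta.

    Element-wise: g - Re(g * conj(t)) * t, valid when |t| = 1.
    """
    return [g - (g * t.conjugate()).real * t for t, g in zip(theta, egrad)]


def retract(theta, direction, step):
    """Step along `direction`, then renormalize entries to unit modulus."""
    moved = [t + step * d for t, d in zip(theta, direction)]
    return [v / abs(v) for v in moved]  # assumes no entry lands exactly on 0


# Toy example with hypothetical values: three unit-modulus phase shifts.
theta = [cmath.exp(1j * a) for a in (0.0, 1.0, 2.5)]
egrad = [0.3 + 0.4j, -1.0 + 0.2j, 0.5 - 0.7j]  # stand-in Euclidean gradient

g = riemannian_grad(theta, egrad)
theta_next = retract(theta, g, step=0.1)  # one fixed-step ascent move
```

Every entry of `g` is orthogonal to the corresponding entry of `theta` in the real inner product, and every entry of `theta_next` has unit modulus, so the iterate never leaves the feasible set.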

2) Sequential Fractional Programming (SFP):
Besides the RCG method, problem (P2) can be solved using the fractional programming (FP) approach [66]. Conceptually, FP relies on two key steps, the Lagrangian dual transform and the quadratic transform. The Lagrangian dual transform handles the logarithmic function by introducing an auxiliary variable, while the quadratic transform decouples the numerator and denominator of each ratio in the resulting sum-of-ratios problem. Using the FP approach, the objective function $f_{z,\theta}$ of problem (P2) can be approximated as in (23). To solve (23), we employ an iterative approach that updates $\zeta$, $\beta$, $z$ and $\theta$ sequentially. First, $\zeta$ and $\beta$ at the $(a+1)$th iteration are obtained following [66], using the quantities of the $a$th iteration.

Update θ: After dropping irrelevant constant terms, the approximated maximization problem with respect to $\theta$ at the $(a+1)$th iteration of the FP approach simplifies to problem (25), where $\phi = [\phi_1, \ldots, \phi_{MN}]^T$ and $U^{(a)}_1$ and $v^{(a)}_1$ are computed from the $a$th iterates. However, problem (25) is intractable because its objective function $f_\theta(\phi)$ is non-convex. Importantly, $f_\theta(\phi)$ can be replaced by a convex surrogate function at the $(a+1)$th iteration, with a step size $\kappa_\phi$ that can be determined using the Armijo rule with $0 < \nu < 0.5$. Problem (25) can therefore be solved using the SCA technique, which ultimately converges to a stationary solution.

Update z: Similar to $\theta$, $z = \frac{1}{\sqrt{N_r}} \exp(j\omega)$ can be updated, where $\omega = [\exp(j\phi_{b1}), \ldots, \exp(j\phi_{bN_r})]^T$ is the phase-shift vector at the BS. Note that $\kappa_\omega$ is set using the Armijo rule.

Proposition 5: The proposed beamforming design based on SFP optimization, summarized in Algorithm 3, converges to a stationary point [66].
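The two FP transforms can be checked numerically on scalars. The sketch below uses the standard single-ratio identities from the FP literature with hypothetical names (`quadratic_transform`, `lagrangian_dual`, `a`, `b`, `y`, `gamma`, `aux`); it illustrates the transforms and is not the paper's multi-user expression (23). For a ratio a/b with a ≥ 0 and b > 0, the surrogate 2y√a − y²b is maximized at y = √a/b, where it equals a/b; the surrogate of log(1 + γ) is maximized when the auxiliary variable equals γ.

```python
import math


def quadratic_transform(a, b, y):
    """Surrogate 2*y*sqrt(a) - y^2*b, a lower bound on a/b for any y."""
    return 2.0 * y * math.sqrt(a) - y * y * b


def lagrangian_dual(gamma, aux):
    """Surrogate log(1+aux) - aux + (1+aux)*gamma/(1+gamma) for log(1+gamma)."""
    return math.log(1.0 + aux) - aux + (1.0 + aux) * gamma / (1.0 + gamma)


a, b = 4.0, 2.0                  # hypothetical signal and interference-plus-noise
y_opt = math.sqrt(a) / b         # closed-form maximizer of the quadratic transform
ratio = quadratic_transform(a, b, y_opt)

gamma = a / b                    # hypothetical SINR
tight = lagrangian_dual(gamma, gamma)  # auxiliary variable at its optimum
loose = lagrangian_dual(gamma, 0.5)    # any other value gives a smaller surrogate
```

Because both surrogates are tight at their closed-form maximizers, alternating between the auxiliary variables and the beamformers cannot decrease the original objective.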
Remark 4: Discrete Phase-Shift Model at RIS: In the previous discussion, we considered a continuous phase-shift setting (infinite phase-shift resolution) at each RIS. However, practical phase shifters may not support a continuous phase-shift model. To reflect this practical aspect, we impose a finite-resolution constraint on each RIS element such that each phase shifter takes its value from a finite set P.
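A common way to realize such a finite-resolution constraint is to map each continuous phase shift to the nearest level of a uniform grid. The sketch below assumes a uniform set of 2^b phases over [0, 2π) with b resolution bits; this set, the function name `quantize_phase` and the sample values are assumptions for illustration, since the exact definition of the set is not reproduced here.

```python
import cmath
import math


def quantize_phase(theta, bits):
    """Map a unit-modulus coefficient to the nearest of 2**bits uniform phases."""
    levels = 2 ** bits
    step = 2.0 * math.pi / levels
    phase = cmath.phase(theta) % (2.0 * math.pi)  # wrap into [0, 2*pi)
    index = round(phase / step) % levels          # nearest level, 2*pi wraps to 0
    return cmath.exp(1j * index * step)


# Hypothetical continuous phase shifts for four RIS elements.
continuous = [cmath.exp(1j * a) for a in (0.1, 1.3, 2.9, 5.5)]
coarse = [quantize_phase(t, bits=1) for t in continuous]  # phases in {0, pi}
fine = [quantize_phase(t, bits=6) for t in continuous]    # 64 levels
```

The phase error of each quantized element is at most half a quantization step, so the loss relative to the continuous design shrinks as the number of bits grows.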

C. Power Allocation
Under given beamforming matrices $\theta$ and $z$ and a given decoding order $\pi$, the sum-rate maximization problem in (7) can be recast as problem (P3). Problem (P3) is non-convex, so we use the epigraph method and introduce an auxiliary variable $r = [\ldots, r^t_{kj}, \ldots]^T$, which yields the problem in (35). Note that the problem in (35) is still non-convex due to the constraint (C7). In particular, the left side of (C7) can be written as a difference of convex functions involving the logarithmic term $\xi_{kj,t}$. We adopt the SCA method to convexify (C7) by iteratively approximating $\xi_{kj,t}$ by a convex form [68] using the power allocation from the previous iteration. At the $(b+1)$th iteration, $\xi_{kj,t}$ is approximated using a first-order Taylor series expansion, and (C7) is approximated accordingly. Ultimately, the problem in (35) can be written in the approximate convex form (38). Given initial power allocation values $\{p_{kj,t}\}$, problem (38) is solved iteratively for at most $B_{max}$ iterations or until convergence, where $\epsilon_p$ denotes the convergence factor. The proposed SCA method for obtaining the power allocation under given $\theta$, $z$ and $\pi$ is summarized in Algorithm 4.
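The first-order Taylor step used by SCA can be illustrated on a scalar concave logarithm. The sketch below is generic and is not the paper's exact expression for ξ_{kj,t}: the function names and the scalar variable `x` are assumptions. Because log2(1 + x) is concave, its tangent at the previous iterate is a global upper bound that is tight at that iterate, which is what allows each SCA iteration to replace a non-convex term with an affine one.

```python
import math


def log_term(x):
    """Concave term log2(1 + x)."""
    return math.log2(1.0 + x)


def taylor_upper_bound(x, x0):
    """First-order expansion of log2(1 + x) around x0 (affine in x)."""
    slope = 1.0 / ((1.0 + x0) * math.log(2.0))
    return log_term(x0) + slope * (x - x0)


x_prev = 1.5  # hypothetical interference power from the previous iteration
gaps = [taylor_upper_bound(x, x_prev) - log_term(x) for x in (0.0, 0.5, 1.5, 4.0)]
```

All entries of `gaps` are non-negative, and the gap vanishes at `x_prev`, so the convexified problem coincides with the original one at the current iterate.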
Proposition 6: Given an initial feasible solution, the proposed SCA-based power allocation method in Algorithm 4 converges to a stationary point satisfying the KKT conditions of the original power allocation problem (P3).
Proof: Let $p^{u(b)}_{kj,t}$ be the power allocation of the $j$th sub-message of the $k$th user obtained after the $b$th SCA iteration, which is then used in the next, i.e., $(b+1)$th, iteration. A bound on $\xi_{kj,t}$ across consecutive iterations can be shown, which implies that the total sum-rate function $R_k$ increases monotonically as $b$ increases. Due to the power constraints, $R_k$ is bounded, and hence Algorithm 4 converges.

Algorithm 5 Block Coordinate Descent (BCD) for Unified Solution
1: Input: $p$, $\theta$, $z$ and $C_{max}$
2: Initialize: $p^{(0)} = p$, $\theta^{(0)} = \theta$, $z^{(0)} = z$, $c = 0$, $J = 0$, $R_f = 0$ and $R^{(-1)} = 0$
3: Perform user clustering using Algorithm 1
4: for $\pi = \pi_1 : \pi_{S_f}$ do
5: while the stopping criterion is not met do
6: Using $p^{(c)}$, design the beamforming matrices using Algorithm 2 or Algorithm 3
7: Using $\theta^{(c)}$ and $z^{(c)}$, allocate power using Algorithm 4 and calculate the overall rate-throughput

D. Decoding Order and Unified Solution
Algorithm 5 summarizes the proposed unified solution. In particular, the sub-problems of power allocation and beamforming design are solved alternately in an iterative manner for all possible decoding orders. Given a decoding order and an initial feasible power allocation, we first design the beamforming matrices using either Algorithm 2 or Algorithm 3. The obtained beamforming matrices are then used for the power allocation in Algorithm 4. In the next iteration, the power values obtained in the previous iteration are used for the beamforming design, and this process continues until convergence (within the factor $\epsilon_c$) is achieved. The output of each optimization step in the current iteration thus serves as an input to the next iteration. Importantly, the optimal decoding order in UL RSMA ensures high user fairness in its capacity region [17]. We therefore use Jain's fairness index as the performance indicator of user fairness over all possible decoding orders. Jain's fairness index $J_i$ for the decoding order $\pi_i$ is given as

$J_i = \dfrac{\left(\sum_{k=1}^{K} r^{(i)}_k\right)^2}{K \sum_{k=1}^{K} \left(r^{(i)}_k\right)^2}$,

where $r^{(i)}_k$ is the rate of the $k$th user for the $i$th decoding order. Note that $r^{(i)}_k$ is the rate obtained after executing the unified solution of power allocation and beamforming design. The decoding order that provides the highest user fairness $J$ is selected.
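The fairness-based selection of the decoding order can be sketched with the standard form of Jain's index, (Σ r)² / (K Σ r²). The function names, the dictionary layout and the sample rates below are hypothetical; in the unified solution the rates would come from running the beamforming and power allocation steps for each candidate order.

```python
def jain_index(rates):
    """Jain's fairness index: (sum r)^2 / (K * sum r^2).

    For non-negative rates the value lies between 1/K and 1.
    """
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if squares == 0:
        return 0.0  # convention chosen here for the all-zero case
    return total * total / (len(rates) * squares)


def fairest_order(rates_by_order):
    """Return the decoding-order label whose user rates maximize Jain's index."""
    return max(rates_by_order, key=lambda order: jain_index(rates_by_order[order]))


# Hypothetical per-user rates (bps/Hz) for two candidate decoding orders.
candidates = {
    "pi_1": [2.0, 2.0, 2.0, 2.0],
    "pi_2": [5.0, 1.0, 1.0, 1.0],
}
best = fairest_order(candidates)
```

Both candidates have the same sum-rate of 8 bps/Hz, but the first spreads it evenly and therefore attains the higher index.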
Proposition 7: The proposed unified solution in Algorithm 5 for the resource allocation design, which involves user clustering, power allocation, and receive and passive beamforming design at the BS and RISs, respectively, is guaranteed to converge to a locally optimal stationary point.
Proof: Since the feasible solution set of (P0) is compact and its objective value is non-decreasing over the iterations that solve the sub-problems (P1), (P2) and (P3), the proposed unified solution in Algorithm 5 is guaranteed to converge [69]. Since we adopt k-means, SCA and SFP techniques within an alternating optimization algorithm, the obtained solution may be sub-optimal [69], [70], [71] for the original problem (P0).
The optimality and convergence of the proposed unified method are further examined in the simulation section through a performance comparison with the optimal brute-force search algorithm.

E. Computational Complexity
Now, we describe the computational complexity of the proposed unified solution in detail. The SCA-based power allocation problem in (38) has 2S variables and 4S + K constraints, where S = KJ. Considering the worst case, let the SCA algorithm converge at its maximum number of iterations $B_{max}$. The worst-case computational complexity of the SCA-based power allocation algorithm is then $\mathcal{O}\left(B_{max}(KJ + K)^2(4KJ + K)\right)$. Further, the RCG and SFP algorithms for beamforming design in Algorithm 2 and Algorithm 3 entail worst-case complexities that grow with their respective maximum numbers of iterations for convergence [66]. Note that the SFP-based beamforming design has slightly higher computational complexity than RCG; however, it is shown in the next section that the proposed solution with SFP-based beamforming provides better system performance than the RCG algorithm. Let the unified solution converge in at most $C_{max}$ iterations; the overall complexity of the proposed solution with RCG-based and SFP-based beamforming is then obtained by accumulating the above costs over the $C_{max}$ iterations. We observe that the overall complexity mostly depends on the number of users, the number of sub-messages, the number of RISs and the number of RIS elements. Another important factor is the convergence rate of the SCA, RCG (or SFP) and proposed BCD algorithms. The fast convergence of the proposed algorithm is validated in the simulation section.

V. EXTENSION TO NON-DIAGONAL PHASE-MATRICES AT RIS
Here, we provide a tractable asymptotic performance analysis for the considered multi-RIS aided UL RSMA system under mmWave communication, based on the stochastic geometry of the channels, while accounting for the spatial correlation among the RIS elements, i.e., for non-diagonal phase-matrices at each RIS. For a rectangular phase-shift array with $N = N_v N_h$ elements at each RIS, the spatial correlation at each RIS under isotropic Rayleigh fading can be described by the correlation matrix $R$, where

$[R]_{n,n'} = r_{n,n'} = \mathrm{sinc}\left(2u_{n,n'}/\lambda\right)$, (42)

$u_{n,n'}$ is the distance between the $n$th and $n'$th elements of the RIS, $d_h \times d_h$ is the dimension of each RIS element, and $N_v$ and $N_h$ are the number of elements per column and per row, respectively [33].
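The correlation model in (42) can be evaluated directly for a small array. The sketch below assumes the normalized convention sinc(x) = sin(πx)/(πx) and a uniform rectangular grid with equal horizontal and vertical spacing; the function names, the 3 × 2 array and the wavelength are hypothetical values chosen for illustration.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi*x)/(pi*x), with sinc(0) = 1 (assumed convention)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def ris_correlation(n_h, n_v, spacing, wavelength):
    """N x N matrix with entries sinc(2*u/lambda), N = n_h * n_v.

    Elements are assumed to sit on a uniform grid with the given spacing.
    """
    positions = [(col * spacing, row * spacing)
                 for row in range(n_v) for col in range(n_h)]
    return [
        [sinc(2.0 * math.hypot(p[0] - q[0], p[1] - q[1]) / wavelength)
         for q in positions]
        for p in positions
    ]


wavelength = 0.01  # hypothetical 1 cm wavelength (about 30 GHz)
R = ris_correlation(n_h=3, n_v=2, spacing=wavelength / 4, wavelength=wavelength)
```

With quarter-wavelength spacing, neighbouring elements are clearly correlated, whereas under this model elements separated by exactly half a wavelength have zero correlation because sinc(1) = 0.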
Let $h^{b}_{k} \sim \mathcal{CN}(0, \zeta^2_{b_k} I_{N_r})$ and $h^{r}_{k,m} \sim \mathcal{CN}(0, \zeta^2_{r_{k,m}} I_N)$ be the complex channel gains of the $k$th user-BS link and the $k$th user-$m$th RIS link, respectively, where $\zeta_{b_k}$ and $\zeta_{r_{k,m}}$ are the corresponding distance-dependent path losses. We use the rank-1 approximation of the RIS-BS mmWave channel, where the normalized array response vectors associated with the LOS path are modeled as complex Gaussian, $a_{tm} \sim \mathcal{CN}(0, I_N)$ and $a_{rm} \sim \mathcal{CN}(0, I_{N_r})$.
Before discussing spatial correlation further, we derive a closed-form expression for the overall rate-throughput of the multi-RIS aided UL RSMA system.
Lemma 1: The overall rate-throughput of the UL RSMA system, irrespective of the decoding order, admits the closed-form expression (43), which is written in terms of the effective SINR of the users in the $t$th cluster of the network.
Proof: See Appendix A.
It can be shown that the effective SINR of each cluster depends on the phase-shift setting at each RIS.
Theorem 1: The average effective SINR at the BS for the considered multi-RIS aided UL RSMA system can be upper bounded as in (44), where $\rho_m$ and $\bar{\rho}_m$ are the factors associated with the spatial correlation among the passive elements at the $m$th RIS.
Proof: See Appendix B.
From Theorem 1, we observe that the factors $\rho_m$ and $\bar{\rho}_m$ associated with the phase-matrices at each RIS considerably affect the asymptotic performance of the considered system model. In particular, they can be regarded as the correlation factors at each RIS. For uncorrelated phase-shift matrices, i.e., diagonal phase-matrices, we obtain $\rho_m = N$ and $\bar{\rho}_m = N^2$.
Remark 5: An appropriate selection of phase shifters for the non-diagonal phase-matrices $[\Theta_m]_{n,n'} = \theta^m_{n,n'}$, $\forall m$, $\forall n$, $\forall n'$, can provide $\rho_m \geq N$ and $\bar{\rho}_m \geq N^2$. Conversely, an inappropriate or random selection of non-diagonal phase-matrices may lead to performance even worse than that of diagonal phase shifters.
Hence, accounting for the spatial correlation at each RIS can provide better performance than uncorrelated RIS phase-matrices, as validated in [31]. Correlated phase-matrices at the RISs correspond to non-diagonal phase-matrices. Therefore, we focus on the optimal (or sub-optimal) design of the non-diagonal phase-matrices to improve the rate performance of the considered multi-layer UL RSMA system. Using (43), we formulate the passive beamforming design problem for non-diagonal phase-matrices at the RISs, under given active receive beamforming and power allocation, as problem (P4) in (46).
Proposition 8: The optimization problem (P4) in (46) can be equivalently transformed into the problem in (47).
Proof: The objective function in problem (46) is a logarithmic function of the optimization variables that increases monotonically with $\{\Theta_m\}$. Maximizing it over $\{\Theta_m\}$ is therefore equivalent to maximizing its argument $f_\Theta$ over $\{\Theta_m\}$.
The transformed passive beamforming problem is still non-convex due to its non-concave objective function. However, we can construct a surrogate function for the objective function $f_\Theta$ and solve the optimization problem in (47). At the $a$th iteration, $f_\Theta$ can be upper bounded around the current iterate of $\Theta$ as in (48), where $\bar{p}_{kj} = p_{kj}/(\tau_t \sigma^2)$. The optimization problem in (47) can then be solved iteratively using SCA, as in (49). Since the objective function is now affine in $\Theta$ and the constraint (C8) is convex, the problem in (49) is convex and can be solved until convergence.

Proposition 9: Given an initial feasible point $\Theta$, the proposed SCA-based passive beamforming design for non-diagonal phase-matrices converges to a stationary point.

The number of phase shifters at each RIS and at the BS are set to $N = 60$ and $N_r = 15$, respectively. The signal from each user is split into two sub-messages, i.e., $J = 2$. The convergence factors $\epsilon_\theta$, $\epsilon_p$ and $\epsilon_c$ are set to $10^{-4}$, and the maximum number of iterations for the SCA and BCD algorithms is set to 10. The UL power budget is set to $p_{max} = 100$ mW. The beamforming matrices are initialized with random values, and the initial transmit power is set to $p_{kj} = p_{max}/J$, $\forall j$, $\forall k$. The mmWave channel parameters are set as $L_{bm} = 5$, $a_0 = 8.4$, $\gamma_0 = 2.2$, $a_1 = 40$, $\gamma_1 = 3.1$, $\sigma^2 = 3.6$. We refer to our proposed unified solution in Algorithm 5 with random beamforming matrices, RCG-based beamforming and SFP-based beamforming as RIS-RSMA-RND, RIS-RSMA-RCG and RIS-RSMA-SFP, respectively. As performance benchmarks, we solve the equivalent sum-rate maximization problem with RCG- and SFP-based beamforming for the counterpart NOMA and OMA schemes, which are referred to as RIS-NOMA and RIS-OMA, respectively. Moreover, we examine the performance of the proposed algorithms without user clustering, which are referred to as RIS-RSMA-RCG-WC and RIS-RSMA-SFP-WC.
Further, we also analyze the performance of the considered RIS-RSMA system under spatial correlation and discrete phase-shift design at each RIS.

A. Convergence and Optimality Analysis
First, we examine the convergence and optimality of the proposed unified solution, power allocation and beamforming design algorithms w.r.t. the number of iterations. Fig. 2a validates Proposition 4 and Proposition 5 for the RCG-based Algorithm 2 and the SFP-based Algorithm 3. We use the objective function $f_{z,\theta}$ of (P2) to evaluate the convergence characteristics. Fig. 2a shows that the RCG algorithm converges faster than the SFP algorithm, while the SFP algorithm provides better performance than RCG. In the SFP algorithm, the active receive and passive beamforming matrices are updated sequentially within the current iteration, whereas Algorithm 2 first updates the passive beamforming matrix under a fixed receive beamforming matrix and then updates the active receive beamforming under a fixed passive beamforming matrix. In other words, Algorithm 2 solves the beamforming design block by block, while Algorithm 3 executes it sequentially. As a result, Algorithm 3 converges more slowly but attains better performance than Algorithm 2. However, the computational complexity of the RCG algorithm is lower than that of the SFP algorithm, as discussed in Section IV-E.
Next, we examine the convergence of the SCA-based power allocation algorithm and the unified solution, i.e., Algorithm 4 and Algorithm 5, w.r.t. the number of iterations, as shown in Fig. 2b and Fig. 2c, respectively. The proposed power allocation algorithm converges within 5 to 6 iterations, and the unified solution with SFP- and RCG-based beamforming design in Algorithm 5 converges within 3 to 4 iterations, which corroborates Proposition 6 and Proposition 7. Simulation results also reveal that the proposed unified solution attains sub-optimal performance compared with the computationally complex exhaustive search method. Nevertheless, exhaustive search is not practically feasible, as its computational complexity grows exponentially with the total number of variables. Hence, the proposed low-complexity unified solution based on alternating optimization is desirable for solving the complex resource allocation design of the considered system effectively.

Fig. 3 and Fig. 4 show the performance of the proposed RSMA system for different numbers of RIS elements and receive antennas, respectively. In each scenario, we fix the other parameter. As the number of phase shifters increases, the effective channel gain between the users and the BS improves and hence the achievable sum-rate increases. In other words, the extra phase shifters reflect more of the signal power received from the UL users towards the BS, which leads to an increased power gain. Moreover, they provide higher flexibility in resource allocation and a stronger beamforming gain, which improve the sum-rate throughput. Overall, optimized phase shifts achieve a better sum-rate than random phase shifts as the number of phase shifters increases.

C. Impact of Maximum Transmit Power and Number of Sub-Messages
Fig. 5 shows the performance of the proposed RSMA solution with SFP-based beamforming design in contrast with the equivalent NOMA and OMA schemes. As expected, increasing the maximum transmit power of each user increases the rate-throughput. However, the RSMA system achieves better performance than the equivalent NOMA and OMA schemes owing to better resource allocation and power distribution across the sub-messages. In UL RSMA, each user transmits multiple sub-messages and distributes its transmit power effectively among them, which maintains flexible inter-message interference under the optimal SIC and decoding order. In the UL NOMA system [48], by contrast, the transmit power is allocated to the single message of each user such that the IUI in the desired UL signal is minimum (or optimal) with minimum power utilization (see footnote 6). Besides, the proposed user clustering scheme effectively mitigates the IUI by grouping, and hence the proposed solution (RIS-RSMA-SFP) performs better than the solution without clustering (RIS-RSMA-SFP-WC). Moreover, the proposed RIS-aided RSMA system outperforms the system without RIS.

Footnote 6: NOMA with time sharing attains a rate capacity region equivalent to that of RSMA for the multiple-access Gaussian channel, at the cost of increased time-resource utilization, high implementation complexity and time-synchronization issues [12], [16]. In particular, NOMA with time sharing performs all possible user-pairing schemes and carries out NOMA communication in different time slots, which yields higher user fairness than conventional NOMA schemes. To provide a fair comparison in terms of resource utilization, we compare the proposed RSMA system only with NOMA without time sharing throughout this paper.

We also analyze the performance of the proposed solution under different decoding orders, namely ascending and descending order of effective channel gain, i.e., RIS-RSMA-SFP (ASC) and RIS-RSMA-SFP (DSC), and exhaustive search, RIS-RSMA-SFP (EXH), with only 4 users and a single cluster, as shown in Fig. 6. RSMA effectively allocates the transmit power to the various sub-messages, which achieves the maximal achievable rate region, i.e., the capacity region, and the rate increases with the number of sub-messages [12], as shown in Fig. 6. However, increasing the number of sub-messages makes the SIC design at the receiver more challenging [16]. Besides, the decoding order with descending effective channel gain provides better performance (equivalent to exhaustive search) than the ascending order, as illustrated in Fig. 6. The ascending decoding order of users based on channel gain provides the optimal performance when the user channels are sufficiently aligned and there exists a large disparity in their channel strengths, whereas the descending order achieves better overall system performance especially when there is no disparity among the channel gains [48]. Generally, in an RIS-aided system, the channels are strong and there is little disparity among the channel gains of the users owing to the multiple RIS reflectors.

D. Impact of Number of Users and Cluster Size
We illustrate the effect of the proposed user clustering on the performance of the proposed RSMA solution in comparison with the equivalent NOMA and OMA schemes. Fig. 7 shows the effect of varying the number of UL users in the network. As the number of UL users increases, the overall sum-rate throughput of the system increases. The power allocation of the UL users is constrained individually; moreover, as the user density increases, each user operates at considerably high power to mitigate the IUI. Therefore, the overall rate for UL transmission increases with the number of UL users. Besides, we also compare with the performance of the proposed solution without user clustering (RIS-NOMA-SFP-WC). User clustering plays a vital role in mitigating IUI when the user density is considerably high, as shown in Fig. 7. Importantly, the proposed algorithms with k-means-based user clustering outperform the case without clustering. Note that user clustering based on the k-means algorithm scales easily to large networks.

Fig. 8 depicts the rate performance of the proposed k-means user clustering with respect to the maximum allowed cluster-density $T_s$, i.e., the number of users per cluster. We fix the resource bandwidth for a given total number of users and vary the maximum number of users allowed in each cluster. Fig. 8 shows that the rate performance first increases with $T_s$ and then decreases beyond a particular value ($T_s = 3$). This is because a low value of $\tau_t$ dominates for low $T_s$, while a high value of IUI dominates for high $T_s$. Increasing the cluster-density increases both the IUI and the bandwidth resource allocation factor $\tau_t$. There always exists a trade-off between the IUI and the resource allocation factor in hybrid orthogonal and non-orthogonal schemes, and a heuristic selection of the cluster-density can capture a better system performance gain [63], [72].
For our considered hybrid RSMA-OMA system, $T_s = 3$ gives better performance than the other values. Besides, the proposed k-means-based clustering significantly outperforms the near-far user clustering scheme (RIS-RSMA-SFP-NF). This is because effective rate-splitting in RSMA with optimal beamforming design at the RISs captures a high performance gain through smart interference management, even with a higher number of users and low channel disparity. Moreover, without clustering, UL RSMA suffers a significant loss in the overall rate-throughput. Overall, the proposed user clustering removes the need for the sophisticated user-pair selection that near-far user clustering schemes require to guarantee high channel disparity.

Fig. 9 illustrates the impact of noise power on the performance of both RSMA and NOMA. Intuitively, the performance of both RSMA and NOMA degrades as the noise power increases. However, the performance of RSMA degrades much faster than that of NOMA with an increase in imperfection, because the high noise power reduces the SINR of both sub-messages. The channel noise power then dominates over the inter-node interference in both RSMA and NOMA schemes. In a high-noise environment, the performance of RSMA therefore approaches that of NOMA, as illustrated in Fig. 9.

E. Impact of QoS Requirement and Noise Power
We also analyze the performance of the considered RSMA system w.r.t. a varying minimum QoS requirement, as shown in Fig. 10. For the UL RSMA (or NOMA) system, the decoding of any sub-message $s_{kj}$ (or message $s_k$) depends on the successful SIC of the previously decoded sub-messages (messages). If a sub-message $s_{kj}$ does not meet the QoS constraint, then $s_{kj}$ and all sub-messages decoded after it cannot be decoded. In other words, if $\gamma_{kj} < \gamma_{min}$, then $\gamma_{uv} = 0$ for all $\pi_{uv} \geq \pi_{kj}$. Consequently, an increase in the rate requirement decreases the probability of successfully decoding all the sub-messages, which degrades the rate performance. However, RSMA still outperforms NOMA due to rate-splitting and effective interference management. Fig. 9 and Fig. 10 show that the performance of multi-layer RSMA comes close to that of NOMA in some regimes. In other words, the gain of RSMA over NOMA is marginal at high noise power and for a low QoS requirement ($r_k \leq 0.1$ bps/Hz). Moreover, NOMA has lower computational and implementation costs and hence may be preferred over RSMA in such special cases.
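The SIC decoding rule described above can be written as a short sketch: once a sub-message falls below the minimum SINR, it and every sub-message decoded after it are counted as undecodable. The function name, the list-based interface and the sample values are hypothetical.

```python
def apply_sic_qos(sinrs_in_decoding_order, sinr_min):
    """Zero the first sub-message below sinr_min and every one decoded after it."""
    effective, sic_failed = [], False
    for sinr in sinrs_in_decoding_order:
        if sic_failed or sinr < sinr_min:
            sic_failed = True
            effective.append(0.0)
        else:
            effective.append(sinr)
    return effective


# Hypothetical SINRs listed in decoding order; the third misses the threshold.
result = apply_sic_qos([5.0, 3.0, 0.5, 4.0], sinr_min=1.0)
```

In this example the fourth sub-message has a high SINR of its own but is still lost, because the interference of the third sub-message cannot be cancelled.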

F. Impact of Discrete Phase-Shift and Spatial Correlated Phase-Matrices at RISs
Finally, we discuss the effect of the spatial correlation at each RIS on the performance of the considered multi-RIS aided RSMA system. Here, we also consider the impact of discrete phase shifts (Disc. PS) at the RISs and of the rank-1 approximation in Proposition 1 for the mmWave channel, and compare them with continuous phase shifts (Cont. PS) and the conventional mmWave channel (without rank-1 approximation). Fig. 11 shows that exploiting the spatial-correlation characteristics at the RISs, i.e., non-diagonal phase-matrices, can significantly improve the system performance compared with diagonal phase-matrices, but only when the phase-matrices are designed optimally (or sub-optimally). This validates Theorem 1. The extra phase shifters corresponding to the non-diagonal elements provide extra channel gain diversity [31] and thus boost the overall rate performance of the considered RSMA system. Moreover, the rank-1 approximation, which neglects the NLOS paths and considers only the LOS path, has very little impact. Fig. 11 also validates that the discrete phase-shift model with high resolution (b = 6) achieves performance close to the ideal continuous phase-shift case, as it considerably reduces the performance loss due to discretization. Importantly, a low-resolution setting of the discrete phase shifters at the RISs can lead to the worst performance, especially when sub-optimal optimization techniques are used. Although a high-resolution setting incurs extra hardware overhead and implementation bottlenecks in practical systems, high-resolution discrete phase shifts are more advantageous than the low-resolution model in terms of system performance.

VII. CONCLUSION
In this paper, the problem of sum-rate maximization for an RISs-aided mmWave UL RSMA system was investigated. To effectively reduce the IUI among users, we performed dynamic user clustering and then solved the joint optimization problem by decoupling it into different sub-problems under a given decoding order. In particular, the sub-problem of power allocation was solved using successive convex approximation, and the beamforming design was solved using the RCG and SFP algorithms. A unified solution based on the BCD algorithm was then proposed. Simulation results validated that the RSMA system achieved a sum-rate throughput gain of up to approximately 40% and 80% over conventional NOMA and OMA schemes, respectively. The descending order of users based on channel conditions attained better rate-throughput than the ascending order due to the low channel gain disparity among the users. Moreover, the proposed k-means user clustering approach effectively mitigated the IUI and improved the sum-rate throughput by 50%. Besides, exploiting the spatial correlation among the RIS elements, i.e., a sub-optimal setting of non-diagonal phase-matrices at each RIS, achieved better performance than conventional diagonal phase-matrices.

APPENDIX A PROOF OF LEMMA 1
The overall rate-throughput can be written as in (A.1), where $S = KJ$, and $g_s$ and $p_s$ correspond to the channel gain and the power allocation of the $j$th sub-message of the $k$th user. The expression in (A.1) can be rewritten using a telescoping product [48]. Substituting $(kj)$ back for $s$ proves (43).

APPENDIX B PROOF OF THEOREM 1
The effective channel gain between the BS and the $k$th user can be upper bounded in terms of the entries $[\Theta_m]_{n,n'}$ of the phase-matrices. Note that we omit the complete proof for brevity. Combining (B.6) and (B.7) gives (44), which completes the proof.