Per-link Parallel and Distributed Hybrid Beamforming for Multi-Cell Massive MIMO Millimeter Wave Full Duplex

This article presents two novel hybrid beamforming (HYBF) designs for a multi-cell massive multiple-input-multiple-output (mMIMO) millimeter wave (mmWave) full duplex (FD) system under limited dynamic range (LDR). First, we present a novel centralized HYBF (C-HYBF) scheme based on alternating optimization. However, C-HYBF has several drawbacks: high computational complexity, massive communication overhead to transfer complete channel state information (CSI) to the central node every channel coherence time (CCT), and the need for expensive computational resources. To overcome these drawbacks, we present a very low complexity, per-link parallel and distributed HYBF (P&D-HYBF) scheme based on cooperation. Due to per-link decomposition, it enables each FD base station (BS) to solve its local sub-problems independently and in parallel on multiple processors, which leads to a significant reduction in the execution time. It requires each FD BS to cooperate by exchanging information about the beamformers with the neighbouring BSs, which allows each FD BS to adapt its beamformers correctly; consequently, P&D-HYBF exhibits negligible performance loss compared to C-HYBF. Moreover, its complexity scales only linearly with the network size and density, making it highly scalable. Simulation results show that both designs achieve similar performance and outperform the fully digital half duplex (HD) system with only a few radio-frequency (RF) chains.

are assumed to be equipped with non-ideal hardware, which is modelled by incorporating the LDR noise model [20], [28], leading to an impairment aware HYBF approach.
In general, centralized systems are superior to distributed systems and can serve as a benchmark to illustrate the performance loss due to distributed implementation. Therefore, we first present a novel C-HYBF scheme based on the minorization-maximization (MM) optimization technique by extending our work [20]. We then introduce P&D-HYBF for mmWave based on cooperation. Our design decomposes the multi-cell WSR maximization problem into per-link independent subproblems, which eliminates the problem of transferring full CSI to the central node every CCT. Due to the per-link decomposition, FD BSs can benefit from a multiprocessor capability and optimize many variables simultaneously. Being a cooperative design, P&D-HYBF requires information exchange about the beamformers and that each FD BS has access only to its local CSI. It is to be noted that per-link independent optimization for HYBF is non-trivial because the analog beamformers and analog combiners are common between the DL and UL users in the same cell, respectively. Moreover, the beamformers of the DL users are subject to a coupled total sum-power constraint imposed at the FD BSs, which is also affected by the analog beamformers. Our presented P&D-HYBF framework shows how to handle these coupling constraints in general and, therefore, can be adopted for future research on P&D-HYBF for both the FD and HD systems. Computational analysis shows that the complexity of C-HYBF and P&D-HYBF scales quadratically and only linearly as a function of both the network size and density, respectively, making the latter highly scalable and enabling the deployment of low-cost computational processors in each FD cell.
Simulation results show that the P&D-HYBF scheme exhibits negligible performance loss due to distributed implementation. Both designs achieve similar performance gains and outperform the conventional fully digital HD system with only a few RF chains and with very low phase-resolution at the analog stage of the FD BSs. The advantage of independent low-complexity computations for P&D-HYBF on different computational processors is investigated, and the results show that P&D-HYBF requires significantly less execution time than C-HYBF.
Paper Organization: The rest of the paper is organized as follows. First, we present the system model and problem formulation for the considered system in Section II. Then the MM optimization method and the C-HYBF design are presented in Sections III and IV, respectively. Section V presents the P&D-HYBF design based on cooperation. Finally, Sections VI and VII present the simulation results and conclusions, respectively.
Mathematical Notations: Boldface lower and upper case characters denote vectors and matrices, respectively. E{·}, Tr{·}, (·)^H, (·)^T, ⊗, I, and D denote expectation, trace, conjugate transpose, transpose, Kronecker product, identity matrix and the dominant generalized eigenvectors selection matrix, respectively. Vector of zeros of size is denoted as 0 ×1 , vec(X) stacks the columns of X into x, unvec(x) reshapes x into X, and ∠X returns the phasors of matrix X. Cov(·) and diag(·) denote the covariance and diagonal matrices, respectively, and (X) returns the singular value decomposition (SVD) of X. Element of matrix X at the m-th row and n-th column is denoted as X( , ). BS ∈ B is assumed to have and transmit and receive RF chains, respectively, and and transmit and receive antennas, respectively. We denote with V ∈ C × and U ∈ C × the digital beamformers for the white unitary variance data streams s ∈ C ×1 and s ∈ C ×1 transmitted for DL user ∈ D and from UL user ∈ U , respectively.
The users and the FD BSs are assumed to be suffering from the LDR noise due to nonideal hardware. It is denoted as c and e for the UL user ∈ U and DL user ∈ D , respectively, modelled as [28] where 1, 1, = Cov(r ) and r denotes the undistorted received signal for with 1, 1, = Cov(r ) and r denotes the undistorted received signal by FD BS ∈ B after the analog combiner F . The thermal noise for FD BS ∈ B and DL user ∈ D is denoted as n and n with variances 2 and 2 , respectively, and modelled as n ∼ CN (0 ×1 , 2 I), n ∼ CN (0 ×1 , 2 I). (3)
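As an illustration of the LDR model described above, the following minimal sketch builds a distortion covariance as a scaled diagonal of the undistorted signal covariance and draws one noise realization. The function names, the scaling constant `beta`, the -80 dB level and the matrix sizes are assumptions introduced here for illustration only.

```python
import numpy as np

def ldr_covariance(scale, signal_cov):
    """Covariance of the LDR distortion: scale * diag(signal covariance)."""
    return scale * np.diag(np.real(np.diag(signal_cov)))

def sample_ldr_noise(scale, signal_cov, rng):
    """Draw one circularly symmetric complex Gaussian distortion vector."""
    var = np.real(np.diag(ldr_covariance(scale, signal_cov)))
    n = var.size
    re = rng.standard_normal(n)
    im = rng.standard_normal(n)
    # Each complex entry has total variance var (var/2 per real dimension).
    return np.sqrt(var / 2.0) * (re + 1j * im)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
cov_r = A @ A.conj().T       # example covariance of the undistorted signal
beta = 10 ** (-80 / 10)      # assumed LDR level of -80 dB
cov_e = ldr_covariance(beta, cov_r)
e = sample_ldr_noise(beta, cov_r, rng)
```

Because the distortion power is proportional to the per-antenna signal power, stronger signals (for example the SI) generate proportionally stronger distortion.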

A. Channel Modelling
We assume perfect CSI and let H ∈ C × and H ∈ C × denote the direct channel responses between the DL user ∈ D and UL user ∈ U , respectively, and their serving FD BS ∈ B. Let H , ∈ C × and H , ∈ C × denote the in-cell UL CI channel response between the DL user ∈ D and UL user ∈ U and the out-cell UL CI channel response between the DL user ∈ D and UL user ∈ U , respectively, with ≠ . Let H , ∈ C × and H , ∈ C × denote the interference channel responses from FD BS ∈ B to DL user ∈ D and from UL user ∈ U to FD BS , respectively, with ≠ .
Let H , ∈ C × and H , ∈ C × denote the DL CI channel response from FD BS ∈ B to FD BS ∈ B, with ≠ , and the SI channel response for FD BS ∈ B, respectively.
In mmWave, channel response H can be modelled as [29] The matrices H and H denote the line-of-sight (LoS) and reflected components channel response of the SI channel, respectively. The scalars , , , and denote the Rician factor, the power normalization constant to assure E(||H ( , )|| 2 ) = [9], the distance between -th receive and -th transmit antenna and the wavelength, respectively. Note that the channel matrix H for the reflected components can also be modelled as (4).
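The Rician SI channel described above can be sketched as follows, assuming the commonly used near-field LoS form in which entry (m, n) equals (rho / r_mn) exp(-j 2 pi r_mn / wavelength). The array geometry, the numerical values and the i.i.d. Gaussian placeholder for the reflected component are assumptions made here for illustration; in the text, the reflected component follows the mmWave model (4).

```python
import numpy as np

def si_los_channel(tx_pos, rx_pos, wavelength, rho=1.0):
    """LoS SI response: (rho / r_mn) * exp(-j 2 pi r_mn / wavelength)."""
    # r[m, n] is the distance between receive antenna m and transmit antenna n.
    r = np.linalg.norm(rx_pos[:, None, :] - tx_pos[None, :, :], axis=2)
    return (rho / r) * np.exp(-2j * np.pi * r / wavelength)

def rician_si_channel(h_los, h_ref, kappa):
    """Combine LoS and reflected components with Rician factor kappa."""
    return (np.sqrt(kappa / (kappa + 1.0)) * h_los
            + np.sqrt(1.0 / (kappa + 1.0)) * h_ref)

wavelength = 0.01            # assumed wavelength in metres
n_tx, n_rx = 8, 4
# Assumed geometry: two parallel half-wavelength ULAs, 20 wavelengths apart.
tx_pos = np.stack([np.zeros(n_tx), np.arange(n_tx) * wavelength / 2], axis=1)
rx_pos = np.stack([np.full(n_rx, 20 * wavelength),
                   np.arange(n_rx) * wavelength / 2], axis=1)

rng = np.random.default_rng(1)
# Placeholder for the reflected component (not the geometric model of the text).
h_ref = (rng.standard_normal((n_rx, n_tx))
         + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
h_los = si_los_channel(tx_pos, rx_pos, wavelength)
h_si = rician_si_channel(h_los, h_ref, kappa=10.0)
```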

B. Problem Formulation
Let y and y denote the signals received by the DL user ∈ D and by the FD BS ∈ B from UL user ∈ U after the analog combiner F , respectively, which can be written as Let , and denote the indices in the sets U , D and B without the elements , and , respectively. Let T U U and Q G V V G denote the transmit covariance matrices of UL user ∈ U and of FD BS ∈ B intended for its DL user ∈ D , respectively. Let (R ) R and (R ) R denote the (signal plus) interference plus noise covariance matrices received by the FD BS ∈ B from UL user ∈ U and by the DL user ∈ D , respectively.
The matrices R and R can be written as follows and R and R can be obtained as R = R −H Q H and R = R −F H T H F , respectively. 8 The WSR maximization problem for HYBF in a multi-cell mMIMO mmWave FD system with DL and UL multi-antenna users ∀ ∈ B, under the joint sum-power, unit-modulus and discrete phase-shifters constraints can be stated as The scalars and denote rate weights for UL user ∈ U and DL user ∈ D , respectively, and the scalars and denote the sum-power constraint for UL user ∈ U and FD BS ∈ B, respectively. The collections of digital beamformers in UL and DL are denoted as U and V, respectively, and the collections of analog beamformers and combiners are denoted as G and F , respectively.

III. MINORIZATION-MAXIMIZATION
Problem (9) is non-concave in the transmit covariance matrices T and Q due to the interference terms, and finding its global optimum is very challenging. To find a sub-optimal solution based on alternating optimization, we leverage the minorization-maximization (MM) method [30], which allows us to reformulate (9) with its minorizer using the difference-of-convex (DC) programming [30].
Let WR and WR denote the weighted rate (WR) of users ∈ U and ∈ D , respectively, and let WSR and WSR denote the WSR of users outside the cell in UL and DL, respectively. The dependence of the global WSR in (9) on the aforementioned terms can be highlighted as Note that in ( concave in Q and WSR , WSR ,WSR ,WSR are non-concave in Q . As a linear function is simultaneously convex and concave, DC programming introduces the first-order Taylor which allows us to write the following minorizers WSR , WSR , WSR and WSR with respect to T . Similarly, for the transmit covariance matrix Q , we have the gradients which allow us to write the minorizers WSR , WSR , WSR and WSR with respect to Q . The gradients (11) and (12) can be computed by applying the matrix differentiation properties and the result of Lemma 3 in [20], and they are reported in Table I. We remark that the tangent expressions constitute a touching lower bound for the original WSR cost function. Hence, the DC programming approach is also an MM approach, regardless of the restatement of the transmit covariance matrices T and Q as a function of the beamformers.
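The touching lower bound property can be checked numerically on a single interfered link. The sketch below is not the paper's full WSR: it assumes one victim link with received signal covariance S, noise covariance N and an interferer with transmit covariance Q seen through a channel H (all names are hypothetical). The victim's rate is convex in Q, so its first-order Taylor expansion around Q0 never exceeds the true rate and touches it at Q0.

```python
import numpy as np

def logdet(M):
    return np.linalg.slogdet(M)[1]

def rand_psd(n, rng):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return A @ A.conj().T

def victim_rate(Q, H, S, N):
    """Rate of a link whose interference-plus-noise covariance is N + H Q H^H."""
    R = N + H @ Q @ H.conj().T
    return logdet(R + S) - logdet(R)

def victim_rate_gradient(Q, H, S, N):
    """Gradient of victim_rate with respect to the interferer covariance Q."""
    R = N + H @ Q @ H.conj().T
    return H.conj().T @ (np.linalg.inv(R + S) - np.linalg.inv(R)) @ H

def minorizer(Q, Q0, H, S, N):
    """First-order Taylor expansion of victim_rate around Q0."""
    G0 = victim_rate_gradient(Q0, H, S, N)
    return victim_rate(Q0, H, S, N) + np.real(np.trace(G0 @ (Q - Q0)))

rng = np.random.default_rng(2)
n_t, n_r = 3, 4
H = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
S = rand_psd(n_r, rng)     # received signal covariance of the victim link
N = np.eye(n_r)            # noise covariance
Q0 = rand_psd(n_t, rng)    # linearization point
# Gap between the true rate and its linearization at random test points.
gaps = [victim_rate(Q, H, S, N) - minorizer(Q, Q0, H, S, N)
        for Q in (rand_psd(n_t, rng) for _ in range(200))]
```

All gaps are non-negative up to round-off, which is the property the MM iterations rely on for a monotonic increase of the cost function.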
Let and denote the Lagrange multipliers associated with the sum-power constraint for UL user ∈ U and FD BS ∈ B, respectively. Hereafter, for notational convenience, we define the following matrices Considering the minorized WSR constructed with the gradients (11)-(12), ignoring the constant terms and the unit-modulus and quantization constraints (9d), and augmenting the minorized WSR only with the sum-power constraints leads to the following Lagrangian We note that the constraints on the analog part, omitted in (14), will be incorporated later.

IV. CENTRALIZED HYBRID BEAMFORMING
This section presents a novel C-HYBF design based on alternating optimization to solve (14) to a local optimum. Hereafter, different sub-sections are dedicated for the optimization of different variables and at each step complete information of the other variables is summarized in the gradients, which are updated at each iteration.

A. Digital Beamforming
To optimize the digital beamformers U and V we take the derivatives of (14) with respect to their conjugates, which yield the following Karush-Kuhn-Tucker (KKT) conditions
Theorem 1. The WSR maximizing digital beamformers U and V can be computed as the generalized dominant eigenvector solution of the pair of the following matrices where the matrix D (D ) selects ( ) generalized dominant eigenvectors.
Proof. The proof is straightforward by extending the result proved in Theorem 2 [20] for a single-cell mmWave mMIMO FD case, by considering also the linearization terms with respect to the users outside the cell served by FD BS ∈ B.
After optimizing the digital beamformers U and V , we scale their columns to unit norm. This operation preserves the optimized beamforming directions and allows us to design the optimal power allocation scheme.
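The two steps above, selecting dominant generalized eigenvectors of a matrix pair and normalizing the columns, can be sketched as follows. The matrices `A` and `B` are random stand-ins for the pair in Theorem 1 (whose exact expressions are given in the paper's equations); the function names and the Cholesky-based reduction are choices made here for illustration.

```python
import numpy as np

def dominant_generalized_eigvecs(A, B, d):
    """Return the d dominant generalized eigenpairs of (A, B).

    A is Hermitian and B is Hermitian positive definite, so A v = lam B v
    is reduced to a standard Hermitian problem through B = L L^H.
    """
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    C = Linv @ A @ Linv.conj().T
    C = (C + C.conj().T) / 2          # remove round-off asymmetry
    lam, Y = np.linalg.eigh(C)        # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:d]   # keep the d largest
    V = Linv.conj().T @ Y[:, idx]     # map back to the original coordinates
    return lam[idx], V

def normalize_columns(V):
    """Scale every column to unit norm, keeping its direction."""
    return V / np.linalg.norm(V, axis=0, keepdims=True)

rng = np.random.default_rng(3)
n, d = 5, 2
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Y0 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = X @ X.conj().T                    # stand-in for the signal-related matrix
B = Y0 @ Y0.conj().T + np.eye(n)      # stand-in for the interference-related matrix
lam, V = dominant_generalized_eigvecs(A, B, d)
Vn = normalize_columns(V)
```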

B. Analog Beamforming
To design the analog beamformer G for FD BS ∈ B, we assume the remaining variables to be fixed. By considering only the dependence of the WSR on the unconstrained analog beamformer G , we have to solve the following optimization problem We take its derivative with respect to the conjugate of G which leads to the following KKT condition
Theorem 2. The vectorized unconstrained analog beamformer G which is common to all the DL users in set D , can be optimized as one generalized dominant eigenvector solution of the pair of the sum of following matrices
Proof. The proof follows directly from the proof of Theorem 3 [20] for a single-cell by considering also the linearization terms with respect to the users outside the cell.
The result stated in Theorem 2 provides the optimized vectorized unconstrained analog beamformer. Operation unvec(vec(G )) is required to reshape it into correct dimensions, and to meet the unit-modulus and quantization constraints, we preserve only the phase part with the operator ∠· and pass it through the quantizer such that G = Q (∠G ( , )) ∈ P , ∀ , .
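The reshaping, phase extraction and quantization steps can be sketched as below. The uniform phase grid with `2**bits` levels is an assumption about the set of quantized phases, and the function names and sizes are hypothetical.

```python
import numpy as np

def unvec(x, rows, cols):
    """Inverse of column-stacking vec(): reshape x into a rows x cols matrix."""
    return np.reshape(x, (rows, cols), order="F")

def quantize_phases(G, bits):
    """Keep only the phase of each entry and round it to a uniform grid."""
    levels = 2 ** bits
    step = 2 * np.pi / levels
    idx = np.round(np.angle(G) / step) % levels
    return np.exp(1j * step * idx)    # unit-modulus entries

rng = np.random.default_rng(4)
g = rng.standard_normal(12) + 1j * rng.standard_normal(12)  # vectorized beamformer
G = unvec(g, 4, 3)                    # 4 antennas, 3 RF chains (assumed sizes)
Gq = quantize_phases(G, bits=4)
```

Each entry of `Gq` has unit modulus and a phase within half a quantization step of the unconstrained phase.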

C. Analog Combining
Optimization of the analog combiner F is straightforward compared to the analog beamformer. Note that the analog combiners do not appear in the trace operators of (14) as they do not generate any interference towards other links. Therefore, to optimize F we can directly consider the original problem (9) which is purely concave with respect to F .
The objective of the analog combiner F is to combine the received covariance matrices at the antenna level such that the WSR is maximized. Let (R ) R denote the (signal plus) interference and noise covariance matrices received at the antennas of the FD BS ∈ B to be combined with F . Given R and R , the matrices R and R can be recovered as (9), with respect to the unconstrained analog combiner F , by using the properties of the logarithm function can be restated as Solving (20), which is purely concave, leads to the following optimal analog combiner where the matrix D selects dominant generalized eigenvectors equal to the number of receive RF chains at the FD BS ∈ B. To meet the constraints for (21), we normalize its amplitudes with the operator ∠· and pass it through the quantizer such that F = Q (∠F ( , )) ∈ P .

D. Optimal Power Allocation
Let P and P denote the stream power matrices for the UL user ∈ U and DL user ∈ D , respectively. Given the normalized digital beamformers with the unit-norm columns (16), the power allocation problems for P and P can be formally stated as and solving them leads to the following optimal power allocation scheme where (X) + = max{0, X}. Given the optimal stream powers, we can search for the Lagrange multipliers satisfying the total sum-power constraint. Let P and P denote the collection of powers in DL and UL, respectively, and let and denote the collection of multipliers for and , respectively. Given (23), consider the dependence of the Lagrangian only on the multipliers and powers as L ( , , P , P ), obtained by including the power matrices P and P in (14).
The multipliers in and should be such that the Lagrangian is finite and the values of multipliers are strictly positive, i.e., min , max P ,P L ( , , P , P ), The dual function is the pointwise supremum of a family of functions of , , it is convex [31] and the globally optimal values for and can be found by using any of the numerous convex-optimization techniques. In this work, the Bisection method is adopted. Let By using the closed form expressions derived above, the complete alternating optimization based C-HYBF procedure to solve (9) is formally stated in Algorithm 1.
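The bisection search for a multiplier can be sketched as follows. The per-stream closed form used here, p_i = max(0, w / (a_i + mu) - 1 / s_i), is an assumed water-filling-like stand-in for (23), where `a` plays the role of the interference-leakage terms and `s` of the per-stream gains; all names and values are hypothetical.

```python
import numpy as np

def stream_powers(mu, w, a, s):
    """Assumed per-stream powers for a given multiplier mu."""
    return np.maximum(0.0, w / (a + mu) - 1.0 / s)

def bisection_power_allocation(w, a, s, p_max, tol=1e-10, max_iter=200):
    """Find the smallest mu >= 0 whose powers satisfy the sum-power constraint."""
    if stream_powers(0.0, w, a, s).sum() <= p_max:
        return 0.0, stream_powers(0.0, w, a, s)   # constraint inactive
    lo, hi = 0.0, 1.0
    while stream_powers(hi, w, a, s).sum() > p_max:
        hi *= 2.0                                 # grow the upper bracket
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if stream_powers(mid, w, a, s).sum() > p_max:
            lo = mid                              # constraint violated: raise mu
        else:
            hi = mid                              # feasible: try a smaller mu
        if hi - lo < tol:
            break
    return hi, stream_powers(hi, w, a, s)

w = 1.0
a = np.array([0.1, 0.2, 0.5])
s = np.array([4.0, 2.0, 1.0])
p_max = 1.0
mu, p = bisection_power_allocation(w, a, s, p_max)
```

Since the total power decreases monotonically in the multiplier, the bracket always contains the value at which the constraint is met with equality.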

E. Convergence of C-HYBF
The convergence of Algorithm 1 can be proved by using the minorization theory [30], alternating or cyclic optimization [30], Lagrange dual function [31], saddle-point interpretation [31] and KKT conditions [31]. For the WSR cost function (9), we construct its minorizer, which is a touching lower bound for (9), hence we can write

Algorithm 1 Centralized Hybrid Beamforming
Given: The CSI and rate weights.
Compute G with (19), do unvec(G ) and get ∠G for: = 1 : Compute P with (23), do SVD, set P = D and finally set T = U P U if constraint for is violated

The minorized WSR, which is concave in T and Q , has the same gradient as the original WSR maximization problem (9), hence the KKT conditions are not affected. Reparameterizing T or Q in terms of G , V , ∀ ∈ D , or U , ∀ ∈ U , respectively, and augmenting the minorized WSR cost function with the Lagrange multipliers and power constraints leads to (14).
By further incorporating the power matrices, we get L ( , , P , P ). Every alternating update of L for the variables G , F , ∀ ∈ B, V , ∀ ∈ D , U , ∀ ∈ U , P , P , and , leads to a monotonic increase of the WSR, which assures convergence. For the KKT conditions, at the convergence point, the gradients of L for V , G , U or P , P correspond to the gradients of the Lagrangian of the original problem (9), and hence the sub-optimal solution for the minorized WSR matches the sub-optimal solution of the original problem. For the fixed analog and digital beamformers, L is concave in powers, hence we have strong duality for the saddle point, i.e., Let X * and * denote the optimal solution for matrix X or scalar at the convergence, respectively. As each iteration leads to a monotonic increase in the WSR and the powers are updated by satisfying the sum-power constraint, at the convergence point, the solution of the optimization satisfies the KKT conditions for the powers in P and P and the complementary slackness with the individual factors in the products being non-negative.

V. PARALLEL AND DISTRIBUTED HYBRID BEAMFORMING
To proceed, we assume the following:
1) the FD BSs cooperate by exchanging information about the digital beamformers, analog beamformers and analog combiners via a feedback link;
2) local CSI is accessible by the FD BSs;
3) each FD BS has multiple computational processors dedicated for UL and DL;
4) the computations take place at the FD BSs in each cell, in a synchronized manner.
Recall that the MM optimization technique allowed us to write the Lagrangian of the original WSR problem (9) as (14), from which it is evident that to update the beamformers for each user at each iteration, only its gradients are required. Therefore, they summarize complete information about all the remaining interfering links in the network. From a practical point of view, the gradients for each link take into account the interference generated towards all the other links, and hence limit greedy behaviour. However, (14) is coupled among different links as the covariance matrices of other users directly appear in the gradients, which vary at the update of each beamformer.
To decouple (14) into local per-link independent optimization sub-problems, we assume that each FD BS has some memory to save information. Hereafter, overline will emphasize that the variables are only local and saved in the memory. We introduce the following local variables For notational compactness, similar to (13), we also define the following variables now function only of the fixed local variables. By replacing the gradients with the fixed local variables, the Lagrangian (14) can be rewritten as In contrast to (14), (32) becomes fully decoupled as it is a function only of the local variables, which are fixed. However, note that in UL and DL, the optimization of the analog combiner F and analog beamformer G , ∀ ∈ B, is still coupled as they are common among the UL and DL users in the same cell, respectively. Moreover, the analog beamformer G also affects the total transmit power of each FD BS, posing a serious challenge for enabling per-link independent optimization in DL from (32). Handling of the coupling constraints and P&D optimization for HYBF from (32) is presented in the following.

A. Per-Link Independent Sub-Problems in UL
Each UL user has its own sum-power constraint but the analog combiner F , appearing in Z 1 , is common among all the UL users in the same cell. To decouple their optimization, we assume that FD BS ∈ B updates F only after updating all the digital beamformers U , ∀ ∈ U . Given this assumption and fixed local variables, the UL WSR maximization problem for each FD BS decomposes into three layers of sub-problems. At the bottom layer, FD BS ∈ B has to solve independent sub-problems to update U , whose optimization is fully decoupled and therefore can be done in parallel ∀ . At the middle layer, FD BS ∈ B has to independently update the stream power matrix P while searching the multiplier , ∀ . Finally, at the top layer, once the two-layer UL sub-problems are solved, only one update of the analog combiner is required. Fig. 3 highlights the idea of the proposed per-link decomposition for the UL WSR for FD BS ∈ B into three sub-layers, and the sub-problems at each layer must be solved from the bottom to the top.
Due to per-link independent decomposition, the Lagrangian for the UL user ∈ U with independent sum-power constraint and fixed local variables can be written as in which for the bottom layer the analog combiner F in Z 1 and the powers are fixed. To optimize U , a derivative of (33) can be taken, which leads to a similar KKT condition as (15a), with replaced with Z , ∀ . By following a similar proof for Theorem 2 [20], it can be easily shown that WSR maximizing U for (33) can be computed as Note that (34) can be computed in parallel by the multi-processor FD BS ∈ B, ∀ ∈ U .
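The parallel bottom-layer update can be sketched with a worker pool in which each task computes the beamformer of one UL user from its own fixed local matrices. Threads stand in here for the multiple processors of an FD BS, and the random matrix pairs are placeholders for the actual pairs in (34); the function and variable names are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def update_ul_beamformer(link):
    """Dominant generalized eigenvectors of (A, B) with unit-norm columns."""
    A, B, d = link
    lam, vecs = np.linalg.eig(np.linalg.solve(B, A))
    order = np.argsort(lam.real)[::-1][:d]   # d largest eigenvalues
    U = vecs[:, order]
    return U / np.linalg.norm(U, axis=0, keepdims=True)

def random_link(n, d, rng):
    """Placeholder local variables for one UL user (fixed during the update)."""
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return X @ X.conj().T, Y @ Y.conj().T + np.eye(n), d

rng = np.random.default_rng(5)
links = [random_link(4, 2, rng) for _ in range(6)]   # six UL users in one cell

# The sub-problems share no variables, so they can be solved concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(update_ul_beamformer, links))

sequential = [update_ul_beamformer(link) for link in links]
```

The parallel and sequential results coincide because every task depends only on its own local variables, which is what the per-link decomposition provides.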
At the middle layer, the power optimization remains decoupled as each UL user has its own sum-power constraint and the local variables are fixed. To find the optimal P in parallel ∀ , we first consider the normalization of the columns of (34) to unit-norm and power allocation problem can be formally stated similar to (22a), but now dependent on the variables Z instead of , ∀ . Solving it independently yields the following parallel power allocation scheme which can be computed while searching for the multiplier associated with its independent sum-power constraint in parallel ∀ . If P becomes non-diagonal, its diagonal structure can be reestablished as P = D P , where D P is a diagonal matrix obtained from SVD of the non-diagonal P . Multiplier , ∀ , should be such that (33) while independently allocating the powers at the middle layer ∀ . The dual function is convex [31] and can be solved with the Bisection method, as for the C-HYBF scheme.
At the top layer, one update of F is required ∀ ∈ B. Note that simultaneous variation in parallel of the beamformers U and powers P , ∀ , at the bottom and middle layer varies the received covariance matrices, and consequently the information to be updated in the local variables R and R at the antenna level, which F should combine (similar to (21)). As each FD BS ∈ B has complete information about the optimized variables at the middle and bottom layers, it can use it to update R and R , ∀ ∈ U . As the WSR is fully concave with respect to the analog combiner F , without the constraints and by using the properties of the logarithm function, its optimization problem can be stated as where the local variables have been recently updated by using the information from the middle and bottom layers. Problem (38) is fully concave and solving it leads to the following optimal unconstrained analog combiner To meet the unit-modulus and quantization constraints, we normalize its amplitudes with ∠· to unit-norm and quantize it as F = Q (∠F ) ∈ P .

B. Per-Link Independent Sub-Problems in DL
Decomposition of the DL WSR is more challenging due to the coupled sum-power constraint among the users in the set D , ∀ . Moreover, G is also common between the DL users in the same cell and thus affects the total transmit power. To introduce per-link independent decomposition in DL, we assume that each FD BS ∈ B first updates the digital beamformers for the DL users, while keeping the Lagrange multiplier and the analog beamformer G fixed. Furthermore, the powers are included afterwards, while searching the common multiplier . Given this assumption, the DL WSR problem, for each FD BS ∈ B, decomposes into three layers of sub-problems. At the bottom layer, each FD BS has to update the DL beamformers V and normalize their columns to unit-norm, in parallel ∀ . At the middle layer, one update of the analog beamformer G is required. Finally, at the top layer, we have to search for the Lagrange multiplier satisfying the coupled sum-power constraint and update the power matrices P for the DL users, in parallel ∀ . Fig. 4 shows the decomposition of the DL WSR into three layers of sub-problems, which must be solved from the bottom to the top.
For FD BS ∈ B, the Lagrangian for the DL WSR can be written as In (40), for the bottom layer, as , G , Z are fixed, optimization of the digital beamformers remains decoupled. To optimize V a derivative can be taken, which will lead to a similar KKT condition as (15b), but as a function of Z instead of . By following a similar proof for Theorem 2 [20], it can be easily shown that the WSR maximizing V can be computed as We consider the normalization of the columns of V to unit-norm in parallel ∀ , such that optimal power allocation can be included at the top layer. Once the parallel update of the digital beamformers V , ∀ , has been made, at the middle layer, FD BS ∈ B has to optimize the analog beamformer G . By considering the unconstrained analog beamformer, each FD BS ∈ B has to independently solve the following optimization problem Note that each FD BS has complete information about the digital beamformers optimized at the bottom layer, which must be first used to update R −1 and L appearing in Z 1 and Z 2 in (42), respectively, ∀ . To optimize G a derivative of (42) can be taken, which will lead to a similar KKT condition as (18), as a function of Z instead of , ∀ . By following a similar proof for Theorem 3 [20], it can be easily shown that G can be optimized as The analog beamformer G optimized according to (43) is unconstrained and vectorized. Therefore, we do unvec(vec(G )) to shape it into correct dimensions, normalize the amplitude with ∠· and quantize it such that G = Q(∠G ) ∈ P . For the top layer, the optimal stream power allocation can be included while searching the multiplier to satisfy the sum-power constraint . Assuming the multiplier to be fixed, which is captured in Z 2 , the power optimization problem ∀ ∈ D can be stated as In (44), the update of power matrix P , ∀ remains independent and the multiplier must be updated based on the sum of the transmit covariance matrices G V P V G , once all the power matrices P are updated in parallel.
Solving (44) in parallel ∀ leads to the following optimal power allocation scheme As a final step, the multiplier can be searched with the Bisection method, similar to (36), and while doing so, the optimal power allocation for the DL user in the set D can be computed

C. On the Convergence of P&D-HYBF
The convergence proof for P&D-HYBF follows similarly from the proof stated for the C-HYBF scheme. Compared to the C-HYBF scheme, the local variables hold a different type of information for each communication link, which dictates the gradients. Computing the beamformers as the dominant generalized eigenvectors, given the information shared by the neighbouring BSs, increases the WSR at each iteration for each link. However, the increase is different compared to C-HYBF as the local variables' information differs, so P&D-HYBF converges to a different local optimum.

Algorithm 2 Parallel and Distributed Hybrid Beamforming
Given: The rate weights, CSI and multiple processors in UL and DL ∀ .

Repeat until convergence
∀ ∈ B, share G , F and U , V , ∀ , ∀ with the neighbouring FD BSs.
In parallel ∀ (∀ ∈ U , ∀ ∈ D .) Update L , L from the memory and update L and L based on the feedback.
Update R −1 and R from the memory.

Solve in parallel ∀ ∈ B
Parallel DL for FD BS Set: Compute Compute F with (39) and get ∠F .

D. Computational Complexity Analysis
In this section, we present the per-iteration computational complexity for the C-HYBF and P&D-HYBF schemes. For such purpose, equal number of users in DL and UL in each cell, i.e., = and = , ∀ ∈ B, is assumed. Moreover, the number of antennas in each cell for the FD BSs, UL and DL users is also assumed to be the same.
Let us consider only the computational complexity of C-HYBF denoted with C , by ignoring its huge communication overhead to transfer complete CSI every CCT. One iteration consists of updating and digital beamformers for the DL and UL users, respectively, and analog beamformers and analog combiners by the central node. For P&D-HYBF, different computational processors have different computational burden and therefore we compare its worst-case complexity. Namely, let and denote the number of processors dedicated for DL and UL by each FD BS, respectively. We consider the following two cases: 1) = , = , and 2) < , < , ∀ . The first case considers fully parallel implementation with the number of processors equal to the number of users. For such a case, the worst-case complexity in UL and DL is given for the processor which makes one update of the digital beamformer in UL and the analog combiner, and one update of the digital beamformer in DL and the analog beamformer, respectively, in each cell. The second case considers that the number of processors dedicated is less than the number of users, so each processor may have to update and digital beamformers in DL and UL, before updating the analog part. Here, the worst-case complexity in UL and DL is given for the processors which update digital beamformers and the analog combiner, and digital beamformers and the analog beamformer, respectively, in each cell. Considering the aforementioned details, the complexity analysis for the proposed schemes is provided in Table II, which clearly shows that P&D-HYBF requires significantly less computational power compared to C-HYBF, and therefore very low-cost processors can be deployed at each FD BS.

VI. SIMULATION RESULTS
This section presents simulation results to evaluate the performance of the proposed C-HYBF and P&D-HYBF schemes. For comparison, we consider the following benchmark schemes:
• A centralized Fully Digital FD scheme with the LDR noise.
• A centralized Fully Digital HD scheme with LDR noise, serving the UL and DL users by separating the resources in time. It is neither affected by the SI nor by the CI.
To compare the performance with a fully digital HD system, we define the additional gain in terms of percentage for an FD system over an HD system as Gain = (WSR_FD − WSR_HD) / WSR_HD × 100 [%], where WSR_FD and WSR_HD are the network WSR for the FD and HD system, respectively. RF chains achieve ∼ 74, 55 and ∼ 71, 54% additional gain and with 16 RF chains they achieve ∼ 67%, 48% and ∼ 64%, 47% additional gain with 10, 4 bits phase-resolution, respectively. We can also see that when the LDR noise variance increases, the achievable WSR for both the FD and HD systems decreases considerably. Fig. 8 shows the average WSR as a function of the LDR noise with only 12 or 10 RF chains and with 10 or 4 bits phase-resolution. For LDR noise ≤ 80 dB, C-HYBF and P&D-HYBF with 12 RF chains achieve ∼ 60, 43% and ∼ 57, 43% additional gain and with 10 RF chains they achieve additional gain of ∼ 58, 38% and ∼ 53, 37% with 10, 4 bit phase-resolution, respectively. Figs. 7-8 exhibit the detrimental effect of the LDR noise on the maximum achievable WSR with HYBF for both schemes for mmWave FD and motivate deploying RF circuitry that generates small LDR noise. It is to be noted that P&D-HYBF achieves similar performance as the C-HYBF scheme at any LDR noise level. Fig. 9 shows the average WSR as a function of the SNR with 32 and 16 RF chains and with 10 or 4 bit phase-resolution affected with LDR noise = −80 dB, in comparison with the benchmark schemes. We can see that a fully digital FD system achieves ∼ 94% and ∼ 82% additional gain at low and high SNR, respectively. With 32 RF chains and 10 bit phase-resolution, the C-HYBF scheme achieves ∼ 79% gain at all the SNR levels and the P&D-HYBF achieves ∼ 77% and ∼ 68% gain at low and high SNR, respectively.
As the phase-resolution decreases to 4 bits, we can see that the loss in WSR compared to the 10-bit phase-resolution case is much more evident at high SNR. Still, with 16 RF chains and 10 or 4 bit phase-resolution, both schemes significantly outperform the fully digital HD scheme for any SNR level.
Fig. 10: Average WSR as a function of the SNR with LDR noise = −80 dB.
Fig. 10 shows the average WSR as a function of the SNR with the same LDR noise level as in Fig. 9, i.e., −80 dB, but with 10 or 12 RF chains and 10 or 4 bit phase-resolution. The achieved average WSR behaves similarly to the case of a high number of RF chains, and it is visible that the proposed schemes significantly outperform the fully digital HD system even with a very low number of RF chains and low phase-resolution. Moreover, P&D-HYBF achieves similar performance to the C-HYBF scheme regardless of the phase-resolution and number of RF chains.
Fig. 11 shows the achieved average WSR as a function of the SNR with LDR noise = −40 dB, which reflects highly non-ideal RF circuitry. We can see that when the LDR noise dominates, reducing the thermal noise variance has a negligible effect on the effective signal-to-LDR-plus-thermal-noise ratio (SLNR). Therefore, the dominant LDR noise variance acts as a ceiling on the effective SLNR, which limits the achievable WSR, and the WSR tends to saturate at SNR = 10 dB. We can also see that with a large LDR noise level, C-HYBF and P&D-HYBF still perform similarly with the same phase-resolution and RF chains. At high SNR, both schemes achieve higher WSR with 16 RF chains and 10 bit phase-resolution than with 32 RF chains and 4 bit phase-resolution.
Fig. 12 shows the average WSR as a function of the SNR with only 10 or 12 RF chains and with 10 or 4 bit phase-resolution, clearly showing that the proposed schemes can still significantly outperform the fully digital HD system. Fig. 12 also shows that both schemes with 10 RF chains and 10 bit phase-resolution are more robust to the LDR noise than with 12 RF chains and 4 bit phase-resolution. This motivates deploying high-resolution phase shifters at the analog stage rather than more RF chains, which not only cost more but also yield no higher gain than high-resolution phase shifters.
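As a reading aid for the figures discussed above, the following minimal Python sketch computes the additional gain of an FD system over an HD system and illustrates why a dominant LDR noise caps the effective SLNR. The function names and the scalar model are assumptions made for illustration only: the signal power is normalized to one, the thermal noise power is taken as the inverse of the SNR, and the LDR noise power is taken as a fixed fraction of the signal power. This toy model ignores array gains, SI and CI, so it is not meant to reproduce the curves or the saturation point of Fig. 11.

```python
import math


def additional_gain_percent(wsr_fd: float, wsr_hd: float) -> float:
    """Additional gain (in percent) of an FD system over an HD system."""
    if wsr_hd <= 0:
        raise ValueError("wsr_hd must be positive")
    return (wsr_fd - wsr_hd) / wsr_hd * 100.0


def effective_slnr_db(snr_db: float, ldr_db: float) -> float:
    """Toy effective signal-to-LDR-plus-thermal-noise ratio in dB.

    Assumed scalar model (not taken from the article):
      signal power  = 1
      thermal noise = 10^(-snr_db / 10)
      LDR noise     = 10^(ldr_db / 10), a fixed fraction of the signal power
    """
    thermal = 10.0 ** (-snr_db / 10.0)
    ldr = 10.0 ** (ldr_db / 10.0)
    return -10.0 * math.log10(thermal + ldr)


if __name__ == "__main__":
    # Hypothetical WSR values, chosen only to exercise the formula.
    print(additional_gain_percent(15.0, 10.0))
    # With LDR noise = -40 dB, raising the SNR stops helping once the
    # thermal noise falls below the LDR noise floor.
    for snr_db in (0, 20, 40, 60, 80):
        print(snr_db, round(effective_slnr_db(snr_db, -40.0), 2))
```

In this model the effective SLNR can never exceed `-ldr_db` (40 dB for LDR noise = −40 dB), however small the thermal noise becomes, which is the ceiling effect described for Fig. 11.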
From the results presented above, we can conclude that both proposed HYBF schemes achieve significant additional gain and outperform the fully digital HD system with only a few RF chains and low phase-resolution. Furthermore, both designs achieve similar performance, but P&D-HYBF is much more attractive as it eliminates the problem of transferring full CSI to the central node every CCT, thus reducing the communication overhead significantly.
The per-link independent decomposition enables each FD BS to solve its local sub-problems independently on multiple computational processors. The design is also highly scalable, as its complexity increases only linearly as a function of the number of users and FD BSs. On the other hand, besides the massive communication overhead, the computational complexity of C-HYBF grows quadratically, and it requires massive computational power per iteration to update all the variables jointly based on alternating optimization. P&D-HYBF imposes a minimal computational burden on each processor due to the per-link independent decomposition and, as the computations are made in parallel, it requires considerably less execution time. By investigating the execution time, we learned that P&D-HYBF requires ∼ 1/21 and ∼ 1/2.3 less execution time in UL and DL, respectively, compared to the C-HYBF scheme. Such time is expected to increase only linearly with the network size and density for P&D-HYBF, whereas for C-HYBF it will scale quadratically. Finally, as P&D-HYBF converges in a few iterations, a very small amount of information exchange is required among the FD BSs.
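The scaling argument above can be made concrete with a back-of-the-envelope extrapolation. The sketch below assumes that execution time follows a pure power law in the network size (exponent 1 for P&D-HYBF, exponent 2 for C-HYBF) and ignores constants, lower-order terms and communication delays. The helper `scaled_time` and the numbers passed to it are hypothetical and are not measurements from the simulations.

```python
def scaled_time(base_time: float, base_size: int, new_size: int, exponent: int) -> float:
    """Extrapolate execution time assuming time is proportional to size**exponent.

    Hypothetical helper: exponent=1 mimics the linear scaling stated for
    P&D-HYBF, exponent=2 the quadratic scaling stated for C-HYBF.
    """
    if base_size <= 0 or new_size <= 0:
        raise ValueError("network sizes must be positive")
    return base_time * (new_size / base_size) ** exponent


if __name__ == "__main__":
    # Normalized base time of 1.0 for a network of 2 cells (illustrative only).
    for cells in (2, 4, 8, 16):
        linear = scaled_time(1.0, 2, cells, exponent=1)
        quadratic = scaled_time(1.0, 2, cells, exponent=2)
        print(cells, linear, quadratic)
```

Under this assumption, doubling the network size doubles the extrapolated time for the linear case and quadruples it for the quadratic case, so the gap between the two schemes widens as the network grows.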

VII. CONCLUSION
This article presented two HYBF schemes for WSR maximization in multi-cell mmWave mMIMO FD systems. Firstly, a C-HYBF scheme based on alternating optimization is presented.
However, C-HYBF requires massive communication overhead to exchange information between the FD network and the central node. Moreover, very high computational power is required to optimize numerous variables jointly. To overcome these drawbacks, a very low-complexity P&D-HYBF design is proposed. It enables each FD BS to solve its local per-link independent sub-problems simultaneously on different computational processors and drastically reduces the communication overhead. Its complexity scales only linearly as a function of the network size, making it highly scalable and enabling the deployment of low-cost computational processors.
Simulation results show that the proposed HYBF designs achieve similar average WSR and significantly outperform the centralized fully digital HD systems with only a few RF chains.