Full-Duplex Cell-Free mMIMO Systems: Analysis and Decentralized Optimization

Cell-free (CF) massive multiple-input-multiple-output (mMIMO) deployments are usually investigated with half-duplex nodes and high-capacity fronthaul links. To leverage the possible gains in throughput and energy efficiency (EE) of full-duplex (FD) communications, we consider a FD CF mMIMO system with practical limited-capacity fronthaul links. We derive closed-form spectral efficiency (SE) lower bounds for this system with maximum-ratio combining/maximum-ratio transmission processing and optimal uniform quantization. We then optimize the weighted sum EE (WSEE) via downlink and uplink power control by using a two-layered approach: the first layer formulates the optimization as a generalized convex program, while the second layer solves the optimization in a decentralized manner using the alternating direction method of multipliers. We analytically show that the proposed two-layered formulation yields a Karush-Kuhn-Tucker point of the original WSEE optimization. We numerically show the influence of the weights on the individual EEs of the users, which demonstrates the utility of the WSEE metric for incorporating heterogeneous EE requirements of users. We also show that low fronthaul capacity reduces the number of users each AP can support, and the cell-free system consequently becomes user-centric.


I. INTRODUCTION
Massive multiple-input-multiple-output (mMIMO) wireless systems employ a large number of antennas at the base stations (BSs), and achieve higher spectral efficiency (SE) and energy efficiency (EE) with relatively simple signal processing [1]- [3]. Two distinct mMIMO variants are being investigated in the literature: i) co-located, wherein all antennas are located at one place [1]; and ii) distributed, wherein antennas are spread over a large area [2, and the references therein] [3]. While co-located mMIMO systems have a low fronthaul requirement, distributed mMIMO systems, at the cost of higher fronthaul infrastructure, have greater spatial diversity to exploit and consequently have greater immunity to shadow fading [2], [3]. Cell-free (CF) mMIMO is one of the most promising distributed mMIMO variants in the current literature [2], [3]. CF mMIMO envisions a communication region with no cell boundaries, and promises substantial gains in SE and fairness over small-cell deployments [2], [3].
Full-duplex (FD) wireless systems have now been practically realized with advanced self-interference (SI) cancellation mechanisms [4]. FD CF mMIMO is a relatively recent area of interest [5]- [7], where access points (APs) simultaneously serve downlink and uplink user equipments (UEs) on the same spectral resource. Vu et al. in [5] considered a FD CF mMIMO system with maximum-ratio combining and showed that, if the SI at the APs is suppressed below a certain limit, it achieves higher throughput than its half-duplex (HD) counterpart and than FD co-located systems. Wang et al. in [6] evaluated the SE of a network-assisted FD CF mMIMO system using zero-forcing and regularized zero-forcing beamforming. Reference [7] proposed a heap-based algorithm for pilot assignment to overcome pilot contamination in FD CF mMIMO systems.
In CF mMIMO, APs are connected to a central processing unit (CPU) using fronthaul links.
The existing FD CF mMIMO literature assumes high-capacity fronthaul links [5]- [7]. In practice, however, these links have limited capacity, so the information must be quantized before it is sent over them. Limited-capacity fronthaul has so far been considered only for HD CF mMIMO systems [8]- [10]. Femenias et al. in [9] studied a max-min uplink/downlink power allocation problem for HD CF mMIMO with limited-capacity fronthaul, while Masoumi et al. in [10] optimized the SE of a HD CF mMIMO uplink with limited-capacity fronthaul and hardware impairments. Bashar et al. in [8] derived the SE of the HD CF mMIMO uplink with limited-capacity fronthaul. We consider quantized fronthaul for a FD CF mMIMO system and derive achievable SE expressions. To the best of our knowledge, the current work is the first to do so.
With the tremendous increase in network traffic, EE has become an important metric for designing modern wireless systems. Global energy efficiency (GEE), defined as the ratio of the network SE to its total energy consumption, is being used to design CF mMIMO communication systems [11]- [14]. Ngo et al. in [11] optimized the GEE for the downlink of a HD CF mMIMO system.
Bashar et al. in [12] optimized the uplink GEE of a HD CF mMIMO system with optimal uniform fronthaul quantization. Alonzo et al. in [13] optimized the GEE of CF and UE-centric HD mMIMO deployments in the mmWave regime. Nguyen et al. in [14] maximized a novel SE-GEE metric for the FD CF mMIMO system using a Dinkelbach-like algorithm.
A UE with limited energy availability will accord a much higher importance to its EE than another UE with a sufficient energy supply. GEE is a network-centric metric and cannot accommodate such heterogeneous EE requirements [15]. The weighted sum energy efficiency (WSEE) metric, defined as the weighted sum of individual EEs [15], can prioritize the EEs of individual UEs by allocating them higher weights [16], [17]. The WSEE has been investigated in [16] for a general wireless network, and in [17] for a two-way FD relay. It is yet to be investigated for HD and FD CF mMIMO systems.
Decentralized designs, which accomplish a complex task by coordination and cooperation of a set of computing units, are being used to design mMIMO systems [18], [19]. This interest is driven by high computational complexity and high interconnection data rate requirements between radio frequency chains and baseband units in centralized mMIMO system designs [18].
Jeon et al. in [18] constructed decentralized equalizers by partitioning the BS antenna array.
Reference [19] proposed a coordinate-descent-based decentralized algorithm for mMIMO uplink detection and downlink precoding. Reference [20] employed the alternating direction method of multipliers (ADMM) to allocate edge-computing resources for vehicular networks in a decentralized manner.
Such decentralized approaches have not yet been employed to optimize FD CF mMIMO systems.
We next list our main contributions in this context: 1) We consider FD CF mMIMO communications with maximal-ratio combining/maximal-ratio transmission (MRC/MRT) processing and limited fronthaul with optimal uniform quantization. This is unlike the existing works on FD CF mMIMO [5]- [7], [14], which assume perfect high-capacity fronthaul links. We derive achievable SE expressions for both uplink and downlink UEs, which are valid for an arbitrary number of antennas at each AP. We use the derived SE expressions to maximize the non-convex WSEE metric. While the energy-efficient design of CF mMIMO systems has been studied in the literature [11]- [14], most of these works focus on the GEE metric, the exception being reference [14]. The GEE, being a single ratio, can be expressed as a pseudoconcave (PC) function and can thus be maximized using Dinkelbach's algorithm [15]. Reference [14] is the only work so far that has optimized the EE of FD CF mMIMO. It considered a novel SE-GEE objective, which also reduces to a PC function and is maximized using a Dinkelbach-like algorithm. The WSEE, in contrast, is a sum of PC functions and is not guaranteed to be a PC function [15]. This makes the WSEE a highly non-trivial objective to maximize [15]. Further, the algorithm in [14] requires knowledge of the instantaneous small-scale fading coefficients of the channel. The WSEE optimization considered here, in contrast, requires only large-scale channel coefficients, which remain constant for multiple coherence intervals [21].
2) We maximize the WSEE decentrally using a two-layered iterative approach that combines successive convex approximation (SCA) and ADMM. The first layer simplifies the non-convex WSEE maximization problem by using an epigraph transformation, slack variables and series approximations. It then locally approximates the problem as a generalized convex program (GCP), which is solved iteratively using the SCA approach. The second layer optimizes the GCP decentrally by using the consensus ADMM approach, which decomposes the centralized problem into multiple sub-problems, each of which is solved independently. The local solutions are combined to obtain the global solution. We note that the GCP is not in the standard form required for applying ADMM, as it involves constraints that couple the power control coefficients of different UEs. We therefore create global and local versions of the power control coefficients, which decouple the constraints, and update them iteratively until the algorithm converges.
3) We show that there is a fundamental limit to the number of UEs a FD AP can serve with limited fronthaul capacity, by imposing separate constraints on the numbers of uplink and downlink UEs. We propose a proportionally fair rule that caps the maximum number of uplink and downlink UEs served by each AP. We use this rule to propose a fair AP selection algorithm, which efficiently chooses the best subset of APs to serve each uplink and downlink UE.

4) We analytically prove and numerically verify the convergence of the proposed decentralized approach. We numerically demonstrate the tightness of the obtained achievable SE expressions and investigate their variation with various system model parameters. We numerically show that the proposed decentralized optimization i) achieves the same WSEE as the centralized approach; and ii) is responsive to changing weights, which can be set to prioritize the UEs' EE requirements.
II. SYSTEM MODEL
We consider, as shown in Fig. 1, a FD CF mMIMO system where $M$ FD APs serve $K = K_u + K_d$ single-antenna HD UEs on the same spectral resource, with $K_u$ and $K_d$ being the numbers of uplink and downlink UEs, respectively. Each AP has $N_t$ transmit and $N_r$ receive antennas, and is connected to the CPU via a limited-capacity fronthaul link that carries quantized uplink/downlink information to/from the CPU. We see from Fig. 1 that, due to the FD model:
• the uplink receive signal of each AP is interfered by its own downlink transmit signal and by those of the other APs. These intra- and inter-AP interferences are shown using purple and brown dashed lines, respectively.
• downlink UEs receive the transmit signals of uplink UEs, causing uplink downlink interference (UDI) (shown as black dotted lines between uplink and downlink UEs).
Additionally, the UEs experience multi-UE interference (MUI), as the APs serve them on the same spectral resource.
We next explain the various channels, their estimation and data transmission. We assume a coherence interval of duration $T_c$ (in seconds) with $\tau_c$ samples, which is divided into: a) a channel estimation phase of $\tau_t$ samples; and b) downlink and uplink data transmission of $(\tau_c - \tau_t)$ samples. Channel description: The channel from the $k$th downlink UE to the transmit antennas of the $m$th AP is $\mathbf{g}^d_{mk} \in \mathbb{C}^{N_t \times 1}$, while the channel from the $l$th uplink UE to the receive antennas of the $m$th AP is $\mathbf{g}^u_{ml} \in \mathbb{C}^{N_r \times 1}$. We model these channels as $\mathbf{g}^d_{mk} = (\beta^d_{mk})^{1/2}\tilde{\mathbf{g}}^d_{mk}$ and $\mathbf{g}^u_{ml} = (\beta^u_{ml})^{1/2}\tilde{\mathbf{g}}^u_{ml}$. Here $\beta^d_{mk}, \beta^u_{ml} \in \mathbb{R}$ are the corresponding large-scale fading coefficients, which are the same for all antennas at the $m$th AP [5], [21]. The vectors $\tilde{\mathbf{g}}^d_{mk}$ and $\tilde{\mathbf{g}}^u_{ml}$ denote small-scale fading with independent and identically distributed (i.i.d.) $\mathcal{CN}(0,1)$ entries. The UDI channel between the $k$th downlink UE and the $l$th uplink UE is modeled as $h_{kl} = (\tilde{\beta}_{kl})^{1/2}\tilde{h}_{kl}$ [5], [6], where $\tilde{\beta}_{kl}$ is the large-scale fading coefficient and $\tilde{h}_{kl} \sim \mathcal{CN}(0,1)$ is the small-scale fading. The inter- and intra-AP channels from the transmit antennas of the $i$th AP to the receive antennas of the $m$th AP are denoted as $\mathbf{H}_{mi} \in \mathbb{C}^{N_r \times N_t}$ for $i = 1, \dots, M$.
Uplink channel estimation: Recall that the channel estimation phase consists of τ t samples.
We divide them as $\tau_t = \tau^d_t + \tau^u_t$, where $\tau^d_t$ and $\tau^u_t$ are the samples used as pilots for the downlink and uplink UEs, respectively. All the downlink (resp. uplink) UEs simultaneously transmit $\tau^d_t$ … mode. The $k$th downlink UE (resp. $l$th uplink UE) transmits pilot signals …. We assume, similar to [5], [11], that the pilots i) have unit norm, i.e., $\|\boldsymbol{\phi}^u_l\| = \|\boldsymbol{\phi}^d_k\| = 1$; and ii) are intra-set orthonormal, i.e., $(\boldsymbol{\phi}^u_{l'})^H\boldsymbol{\phi}^u_l = 0\ \forall l' \neq l$ and $(\boldsymbol{\phi}^d_{k'})^H\boldsymbol{\phi}^d_k = 0\ \forall k' \neq k$. Therefore, we need $\tau^d_t \geq K_d$ and $\tau^u_t \geq K_u$ [5], [11]. The pilots received by the transmit and receive antennas of the $m$th AP are given, respectively, as follows. Here $\rho_t$ is the normalized pilot transmit signal-to-noise ratio (SNR). The matrices $\mathbf{W}^{tx}_m \in \mathbb{C}^{N_t \times \tau^d_t}$ and $\mathbf{W}^{rx}_m \in \mathbb{C}^{N_r \times \tau^u_t}$ denote additive noise with $\mathcal{CN}(0,1)$ entries. Each AP independently estimates its channels to the uplink and downlink UEs to avoid channel state information (CSI) exchange overhead [5], [14]. To estimate the channels $\mathbf{g}^d_{mk}$ and $\mathbf{g}^u_{ml}$, the $m$th AP projects the received signals onto the pilot signals $\boldsymbol{\phi}^d_k$ and $\boldsymbol{\phi}^u_l$, respectively. These projections are used to compute the corresponding linear minimum-mean-squared-error (MMSE) channel estimates [5], [11]. After channel estimation, data transmission starts simultaneously on the downlink and uplink.
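As a minimal numerical illustration of the MMSE step, the sketch below assumes (as is common in CF mMIMO analyses, but not stated explicitly above) that the pilot projection yields, per antenna, $y = \sqrt{\tau\rho_t}\,g + w$ with $g \sim \mathcal{CN}(0,\beta)$ and $w \sim \mathcal{CN}(0,1)$. Under this assumed model, the MMSE estimate is $\hat{g} = c\,y$ with $c = \sqrt{\tau\rho_t}\,\beta/(\tau\rho_t\beta + 1)$, and its variance is $\tau\rho_t\beta^2/(\tau\rho_t\beta + 1)$. A quantity of this form typically enters MRC/MRT SE bounds; whether it coincides exactly with the $\gamma^d_{mk}$ and $\gamma^u_{ml}$ used later is an assumption. The helpers `mmse_gamma`, `cn` and `simulate` are hypothetical names.

```python
import math
import random


def mmse_gamma(tau, rho, beta):
    """Closed-form variance of the MMSE estimate of g ~ CN(0, beta) from
    y = sqrt(tau * rho) * g + w, w ~ CN(0, 1) (assumed pilot model)."""
    return tau * rho * beta ** 2 / (tau * rho * beta + 1.0)


def cn(var, rng):
    """Draw one circularly-symmetric complex Gaussian sample CN(0, var)."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def simulate(tau, rho, beta, trials=100_000, seed=1):
    """Monte Carlo estimates of E{|g_hat|^2} and of the MSE E{|g - g_hat|^2}."""
    rng = random.Random(seed)
    a = math.sqrt(tau * rho)
    c = a * beta / (tau * rho * beta + 1.0)  # E{g y*} / E{|y|^2}
    est_pow = err_pow = 0.0
    for _ in range(trials):
        g = cn(beta, rng)
        y = a * g + cn(1.0, rng)   # projected pilot observation
        g_hat = c * y              # linear MMSE equals MMSE for jointly Gaussian g, y
        est_pow += abs(g_hat) ** 2
        err_pow += abs(g - g_hat) ** 2
    return est_pow / trials, err_pow / trials


gamma = mmse_gamma(tau=10, rho=1.0, beta=0.5)
est, err = simulate(tau=10, rho=1.0, beta=0.5)
print(f"closed form {gamma:.4f}, simulated {est:.4f}, MSE {err:.4f}")
```

The simulated estimation-error power approaches $\beta - \gamma$, the usual MMSE orthogonality result.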
Transmission model: An objective of this work is to derive a SE lower bound for FD CF mMIMO systems, where the $M$ APs serve $K_u$ uplink UEs and $K_d$ downlink UEs simultaneously on the same spectral resource. We note that in FD CF mMIMO systems, unlike HD CF mMIMO systems [8], [9], [21], the uplink and downlink transmissions interfere, causing UDI and inter-/intra-AP interference. Further, unlike the existing FD CF mMIMO literature [5], [6], [14], we consider a limited-capacity fronthaul. It is therefore critical to model and analyze the UDI, the inter-/intra-AP interference and the limited-capacity fronthaul impairments while deriving the lower bound.

1) Downlink data transmission:
The CPU chooses a message symbol $s^d_k \sim \mathcal{CN}(0,1)$ for the $k$th downlink UE and intends to send it to the $m$th AP via the limited-capacity fronthaul link. Before doing so, it multiplies $s^d_k$ by $\sqrt{\eta_{mk}}$, where $\eta_{mk}$ is a power control coefficient, and then quantizes the resulting signal. Due to its limited fronthaul capacity, the $m$th AP is allowed to serve only a subset $\kappa_{dm} \subset \{1, \dots, K_d\}$ of the downlink UEs, an aspect discussed later in Section II-2. The CPU consequently sends the downlink symbols of the UEs in $\kappa_{dm}$ to the $m$th AP, which uses the MMSE channel estimates to perform MRT precoding. The transmit signal of the $m$th AP is therefore given as follows. Here $\rho_d$ is the normalized maximum transmit SNR at each AP. The function $Q(\cdot)$ denotes the quantization operation, which is modeled as a multiplicative attenuation $\tilde{a}$ and an additive distortion $\varsigma^d_{mk}$ for the $k$th downlink UE in the fronthaul link between the CPU and the $m$th AP [8], [12]. The distortion power $\mathbb{E}\{|\varsigma^d_{mk}|^2\}$ is given in Appendix A, and the average transmit power constraint of the $m$th AP can be simplified as follows. The $k$th downlink UE receives its desired message signal from a subset of all APs, denoted as $\mathcal{M}^d_k$. The $m$th AP serves the $k$th downlink UE iff $k \in \kappa_{dm} \Leftrightarrow m \in \mathcal{M}^d_k$. In the received signal of the $k$th downlink UE, $x^u_l$ is the transmit signal of the $l$th uplink UE, which is modeled next.
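The attenuation-plus-distortion model of $Q(\cdot)$ can be made concrete for one real dimension (the real or the imaginary part). The sketch below assumes a zero-mean, unit-variance Gaussian input and a $2^\nu$-level uniform midrise quantizer whose step minimizes the mean-squared error. The step is found by golden-section search, which assumes that the MSE is unimodal in the step. The gain is then set to $\mathbb{E}\{xQ(x)\}$, which makes the distortion $Q(x) - \tilde{a}x$ uncorrelated with $x$. Whether this matches the exact definitions of $\tilde{a}$ and the distortion power in Appendix A is an assumption, and all function names are hypothetical.

```python
import math


def pdf(x):
    """Standard normal density; returns 0 at +/- infinity."""
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def moments(nu, step):
    """For x ~ N(0, 1) and a 2**nu-level uniform midrise quantizer Q with the given
    step, return (E{Q^2}, E{x Q}, E{(x - Q)^2}) in closed form."""
    levels = 2 ** nu
    eq2 = exq = mse = 0.0
    for i in range(levels):
        lo = -math.inf if i == 0 else (i - levels / 2) * step
        hi = math.inf if i == levels - 1 else (i - levels / 2 + 1) * step
        q = (i - levels / 2 + 0.5) * step          # reconstruction level of cell i
        p = cdf(hi) - cdf(lo)                      # P(lo < x <= hi)
        m1 = pdf(lo) - pdf(hi)                     # integral of x * pdf over the cell
        m2 = (p + (0.0 if math.isinf(lo) else lo * pdf(lo))
              - (0.0 if math.isinf(hi) else hi * pdf(hi)))  # integral of x^2 * pdf
        eq2 += q * q * p
        exq += q * m1
        mse += m2 - 2.0 * q * m1 + q * q * p
    return eq2, exq, mse


def optimal_step(nu, lo=1e-3, hi=4.0, iters=100):
    """Golden-section search for the MSE-minimizing step (assumes unimodal MSE)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = moments(nu, c)[2], moments(nu, d)[2]
    for _ in range(iters):
        if fc < fd:                    # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = moments(nu, c)[2]
        else:                          # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = moments(nu, d)[2]
    return 0.5 * (lo + hi)


def gain_and_distortion(nu):
    """Optimal step, gain a = E{x Q(x)} and distortion power E{(Q(x) - a x)^2}."""
    step = optimal_step(nu)
    eq2, exq, _ = moments(nu, step)
    return step, exq, eq2 - exq ** 2


for nu in (1, 2, 3):
    print(nu, gain_and_distortion(nu))
```

For $\nu = 1$ the sketch recovers the known closed-form values: step $2\sqrt{2/\pi}$, gain $2/\pi$ and distortion power $2/\pi - 4/\pi^2$. As $\nu$ grows, the gain tends to 1 and the distortion power tends to 0.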
2) Uplink data transmission: The $K_u$ uplink UEs simultaneously transmit to all $M$ APs on the same spectral resource as the $K_d$ downlink UEs. The $l$th uplink UE transmits the signal $x^u_l = \sqrt{\rho_u \theta_l}\, s^u_l$, where $s^u_l \sim \mathcal{CN}(0,1)$ is its message symbol, $\rho_u$ is the maximum uplink transmit SNR and $\theta_l$ is its power control coefficient. To satisfy the average SNR constraint $\mathbb{E}\{|x^u_l|^2\} \leq \rho_u$, the $l$th uplink UE satisfies the following constraint. The FD APs receive not only the uplink UE signals but also their own downlink transmit signals and those of the other APs, referred to as intra-AP and inter-AP interference, respectively. Using (1), the received uplink signal at the $m$th AP is expressed as follows. Here $\mathbf{w}^u_m \in \mathbb{C}^{N_r \times 1}$ is the additive receiver noise at the $m$th AP with i.i.d. $\mathcal{CN}(0,1)$ entries. The intra- and inter-AP interference channels vary extremely slowly and can thus be estimated with very low pilot overhead [6]. With the estimated channels, the receive antenna array of each AP can only partially mitigate the intra- and inter-AP interference [5], [6]. The residual intra-/inter- [17]. Here $\gamma_{RI,mi} \triangleq \beta_{RI,mi}\gamma_{RI}$, with $\beta_{RI,mi}$ being the large-scale fading coefficient from the $i$th AP to the $m$th AP, and $\gamma_{RI}$ being the residual interference (RI) power after suppression.
The $m$th AP receives the signals from all the uplink UEs and performs MRC for the $l$th uplink UE with $(\hat{\mathbf{g}}^u_{ml})^H$. Due to its limited fronthaul: i) the AP quantizes the combined signal before sending it to the CPU; and ii) as discussed in detail later in Section II-2, the CPU receives contributions for the $l$th uplink UE only from the subset of APs serving it, denoted as $\mathcal{M}^u_l \subset \{1, \dots, M\}$. Using (5), the signal received by the CPU for the $l$th uplink UE is expressed as follows. We denote the subset of uplink UEs served by the $m$th AP as $\kappa_{um} \subset \{1, \dots, K_u\}$. The $m$th AP serves the $l$th uplink UE iff $l \in \kappa_{um} \Leftrightarrow m \in \mathcal{M}^u_l$. The quantization operation $Q(\cdot)$ is mathematically modeled using the constant attenuation $\tilde{a}$ and the additive distortion $\varsigma^u_{ml}$, whose power $\mathbb{E}\{|\varsigma^u_{ml}|^2\}$ is derived in Appendix A.

Quantization, limited fronthaul and AP selection: The fronthaul between the $m$th AP and the CPU uses $\nu_m$ bits to quantize the real and imaginary parts of the transmit signal of the $k$th downlink UE and of the uplink receive signal after MRC, i.e., $\sqrt{\eta_{mk}}\, s^d_k$ and $(\hat{\mathbf{g}}^u_{ml})^H \mathbf{y}^u_m$, respectively. Due to the limited-capacity fronthaul, the $m$th AP serves only $K_{um} \triangleq |\kappa_{um}|$ and $K_{dm} \triangleq |\kappa_{dm}|$ UEs on the uplink and downlink, respectively [8], [12]. For each UE, we recall that there are $(\tau_c - \tau_t)$ data samples in each coherence interval of duration $T_c$. The fronthaul data rate $R_{fh,m}$ between the $m$th AP and the CPU, in bits per second (bps), is given by (7). The fronthaul link between the $m$th AP and the CPU has capacity $C_{fh,m}$, which imposes the constraint $R_{fh,m} \leq C_{fh,m}$ in (8). We propose the following lemma, in which we adopt a proportionally fair approach to calculate $K_{dm}$ and $K_{um}$: we set them in proportion to the total numbers of downlink and uplink UEs, respectively.
Lemma 1. The maximum numbers of uplink and downlink UEs served by the $m$th AP, when it is connected to the CPU via a limited optical fronthaul with capacity $C_{fh,m}$, are given by (9). Proof: Let $\bar{K}_{um}$ and $\bar{K}_{dm}$ denote the maximum numbers of uplink and downlink UEs served by the $m$th AP. We consider $\bar{K}_{um} \propto K_u$ and $\bar{K}_{dm} \propto K_d$ for proportional fairness on the uplink and downlink. Using (8), we obtain a bound on these numbers, and the lemma follows directly from the definition of the floor function $\lfloor \cdot \rfloor$.
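The proportionally fair caps can be computed as in the sketch below. Because (7) and (9) are not reproduced above, the fronthaul rate is assumed to take the form $R_{fh,m} = 2\nu_m (K_{um} + K_{dm})(\tau_c - \tau_t)/T_c$. This form charges $2\nu_m$ bits (real and imaginary parts) per data sample and per served UE, which is consistent with the bit counts described in the text. The final `min` applies the assignment $K_{um} = \min\{K_u, \bar{K}_{um}\}$ and $K_{dm} = \min\{K_d, \bar{K}_{dm}\}$ stated after the lemma. Function names and numbers are hypothetical.

```python
import math


def fronthaul_rate(nu, k_ul, k_dl, tau_c, tau_t, t_c):
    """Assumed form of R_fh,m in bps: 2*nu bits (real + imaginary) per data
    sample, (tau_c - tau_t) data samples per UE per interval of t_c seconds."""
    return 2 * nu * (k_ul + k_dl) * (tau_c - tau_t) / t_c


def max_ues(c_fh, nu, k_u, k_d, tau_c, tau_t, t_c):
    """Proportionally fair caps (K_um, K_dm) of one AP, capped at (K_u, K_d)."""
    # Largest c such that fronthaul_rate(nu, c*k_u, c*k_d, ...) <= c_fh.
    c = c_fh * t_c / (2 * nu * (k_u + k_d) * (tau_c - tau_t))
    k_um_bar = math.floor(c * k_u)
    k_dm_bar = math.floor(c * k_d)
    return min(k_u, k_um_bar), min(k_d, k_dm_bar)


# 5 Mbps fronthaul, nu = 2 bits, tau_c = 200, tau_t = 20, T_c = 1 ms
print(max_ues(5e6, 2, 4, 12, 200, 20, 1e-3))
print(fronthaul_rate(2, 1, 5, 200, 20, 1e-3))  # rate of the resulting allocation
```

Flooring each cap separately keeps $R_{fh,m} \leq C_{fh,m}$, since $\lfloor cK_u \rfloor + \lfloor cK_d \rfloor \leq c(K_u + K_d)$.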
Using the maximum limits obtained in (9), we assign $K_{um} = \min\{K_u, \bar{K}_{um}\}$ and $K_{dm} = \min\{K_d, \bar{K}_{dm}\}$. We see that the constraint imposed in (8) makes the system similar to a UE-centric (UC) CF mMIMO system, wherein each UE is served by a subset of the APs [2]. We now define the AP selection procedure, which obtains the best subset of APs to serve each uplink and downlink UE while satisfying (8). For this, we extend the procedure in [8] to a FD system as follows:
• The $m$th AP sorts the uplink and downlink UEs connected to it in descending order of their channel gains ($\beta^u_{ml}$ and $\beta^d_{mk}$, respectively), and chooses the $K_{um}$ uplink UEs and the $K_{dm}$ downlink UEs with the largest channel gains to populate the sets $\kappa_{um}$ and $\kappa_{dm}$, respectively.
• For the $l$th uplink UE and the $k$th downlink UE, we populate the sets $\mathcal{M}^u_l$ and $\mathcal{M}^d_k$, respectively, using the equivalences $l \in \kappa_{um} \Leftrightarrow m \in \mathcal{M}^u_l$ and $k \in \kappa_{dm} \Leftrightarrow m \in \mathcal{M}^d_k$.
• If an uplink or downlink UE is found with no serving AP, we use the procedure in Algorithm 1 to assign it the AP with the best channel conditions, while satisfying (8).
Algorithm 1: Fair AP selection for disconnected uplink and downlink UEs
For each downlink UE $k$ with no serving AP, i.e., $\mathcal{M}^d_k = \emptyset$: sort the APs in descending order of the channel gains $\beta^d_{mk}$ and find the AP $n$ with the largest channel gain. For this $n$th AP, sort the downlink UEs in $\kappa_{dn}$ in descending order of channel gains and find the $q$th downlink UE, i.e., the UE with the minimum channel gain that has at least one more connected AP.
Remove the $q$th downlink UE from the set $\kappa_{dn}$ and add the $k$th downlink UE to it.
Repeat the same procedure for all the uplink UEs $l = 1, \dots, K_u$.
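The selection rule and Algorithm 1 can be sketched as follows for one link direction. The same function would be called separately for the uplink and the downlink, with caps $K_{um}$ or $K_{dm}$. Two details are assumptions of this sketch. First, if the strongest AP has no UE that can be released (i.e., no UE with at least one other serving AP), the next-strongest AP is tried. Second, a UE may remain unserved if no AP can release one. In the code, `beta[m][k]` holds the large-scale gain between AP `m` and UE `k`, and `cap[m]` holds the cap of AP `m`; all names are hypothetical.

```python
def select_aps(beta, cap):
    """beta[m][k]: large-scale gain between AP m and UE k; cap[m]: max UEs of AP m.
    Returns (kappa, serving): kappa[m] = UEs served by AP m, serving[k] = APs of UE k."""
    n_ap, n_ue = len(beta), len(beta[0])
    # Each AP keeps its cap[m] strongest UEs.
    kappa = [set(sorted(range(n_ue), key=lambda k: beta[m][k], reverse=True)[:cap[m]])
             for m in range(n_ap)]
    # k in kappa[m]  <=>  m in serving[k]
    serving = [{m for m in range(n_ap) if k in kappa[m]} for k in range(n_ue)]
    # Algorithm 1: reconnect UEs that ended up with no serving AP.
    for k in range(n_ue):
        if serving[k]:
            continue
        for n in sorted(range(n_ap), key=lambda m: beta[m][k], reverse=True):
            # UEs of AP n that would still have another serving AP if released
            candidates = [q for q in kappa[n] if len(serving[q]) > 1]
            if not candidates:
                continue  # assumption: fall back to the next-strongest AP
            q = min(candidates, key=lambda j: beta[n][j])  # weakest releasable UE
            kappa[n].remove(q)
            serving[q].discard(n)
            kappa[n].add(k)
            serving[k].add(n)
            break
    return kappa, serving


beta = [[0.9, 0.8, 0.1], [0.7, 0.2, 0.3]]  # 2 APs, 3 UEs (illustrative gains)
print(select_aps(beta, [1, 1]))
```

In this toy example both APs initially pick UE 0. UE 1 then takes UE 0's slot at AP 0, since UE 0 remains served by AP 1. UE 2 stays unserved because neither AP can release a UE without disconnecting it. The swap keeps $|\kappa_{dn}|$ unchanged, so (8) remains satisfied.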

III. ACHIEVABLE SPECTRAL EFFICIENCY
We now derive the ergodic SEs of the $k$th downlink UE and the $l$th uplink UE, denoted respectively as $\bar{S}^d_k$ and $\bar{S}^u_l$. The APs employ MRC/MRT in the uplink/downlink and optimal uniform fronthaul quantization. The ergodic SE expressions are calculated using (3) and (6), as given in (10), whose constituent terms are the signal, noise and interference powers, respectively, of the $l$th uplink and $k$th downlink UEs.
We use $\varepsilon \in \{d, u\}$ to denote downlink and uplink, respectively; $\phi \in \{k, l\}$ to denote the $k$th downlink UE and the $l$th uplink UE, respectively; and $\upsilon^\varepsilon_{m\phi} \triangleq \eta_{mk}$ for $\phi = k$ and $\theta_l$ for $\phi = l$. The expectation outside the logarithm in the SE expressions in (10) is mathematically intractable, and the expressions are difficult to simplify further [5], [8], [21]. Similar to [21], we employ the use-and-then-forget (UatF) technique to derive SE lower bounds. To use UatF, we rewrite the received signal at the CPU for the $l$th uplink UE in (6), and at the $k$th downlink UE in (3), as in (11), where the effective additive noise terms $n^\varepsilon_\phi$ are given in (12)-(13). The term $\mathrm{DS}^\varepsilon_\phi$ in (11) denotes the desired signal received over the channel mean, and the term $\mathrm{BU}^\varepsilon_\phi$ in (12)-(13) denotes the beamforming uncertainty, i.e., the signal received over the deviation of the channel from its mean. It is easy to see that each $n^\varepsilon_\phi$ is uncorrelated with the respective $\mathrm{DS}^\varepsilon_\phi$ term. Similar to [5], we treat it as worst-case additive Gaussian noise, an approximation that is tight for mMIMO systems [5]. Using (11)-(13), we next derive an achievable SE lower bound.
Theorem 1. An achievable lower bound to the SE of the $k$th downlink UE with MRT and of the $l$th uplink UE with MRC can be expressed, respectively, as (14) and (15), whose arguments are the variables on which the SE depends. We recall from Section II that $\tilde{a}$ and $\tilde{b}$ in (14)-(15) depend on the number of quantization bits $\nu$.
Proof: Refer to Appendix B. The SE expressions are functions only of large-scale channel quantities, such as $\gamma^d_{mk}$ and $\gamma^u_{ml}$, which we use to optimize the WSEE. This is unlike [14], which requires instantaneous CSI to optimize the SE-GEE metric.
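The UatF idea can be illustrated on a far simpler setting than Theorem 1. The toy case is a single AP with $N$ receive antennas, perfect CSI and MRC, with no quantization and no interference (all assumptions of this toy). With $\mathbf{g} \sim \mathcal{CN}(\mathbf{0}, \beta\mathbf{I}_N)$, the MRC output is $\|\mathbf{g}\|^2\sqrt{\rho}\,s + \mathbf{g}^H\mathbf{w}$. UatF keeps only the mean gain $\mathbb{E}\{\|\mathbf{g}\|^2\} = N\beta$ as the desired signal. It treats the rest as uncorrelated noise of power $\rho N\beta^2 + N\beta$, giving the SINR $\rho N\beta/(\rho\beta + 1)$. The sketch compares this bound with a Monte Carlo estimate of the ergodic SE $\mathbb{E}\{\log_2(1 + \rho\|\mathbf{g}\|^2)\}$. Function names are hypothetical.

```python
import math
import random


def uatf_se(n_ant, beta, rho):
    """UatF lower bound for MRC with perfect CSI and g ~ CN(0, beta I_N):
    SINR = rho * N * beta / (rho * beta + 1)."""
    return math.log2(1.0 + rho * n_ant * beta / (rho * beta + 1.0))


def ergodic_se(n_ant, beta, rho, trials=50_000, seed=7):
    """Monte Carlo estimate of E{log2(1 + rho * ||g||^2)} (receiver knows g)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # |g_n|^2 is exponential with mean beta, so ||g||^2 is a sum of N of them.
        norm2 = sum(rng.expovariate(1.0 / beta) for _ in range(n_ant))
        total += math.log2(1.0 + rho * norm2)
    return total / trials


bound = uatf_se(8, 1.0, 1.0)      # equals log2(5)
exact = ergodic_se(8, 1.0, 1.0)
print(f"UatF bound {bound:.3f} <= ergodic SE {exact:.3f}")
```

The bound is achievable without instantaneous CSI at the receiver and depends only on $\beta$. It is also below the ergodic SE, which in turn lies below $\log_2(1 + \rho N\beta)$ by Jensen's inequality.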
Remark 1. MRC/MRT yields tractable SE expressions that depend solely on large-scale channel statistics, which remain constant over hundreds of coherence intervals [22]. This is in contrast to zero-forcing designs, which yield a higher SE but not tractable SE expressions [2]. Further, MRC/MRT can be implemented in a distributed fashion with low complexity.

IV. TWO-LAYER DECENTRALIZED WSEE OPTIMIZATION FOR FD CF MMIMO
We now devise a decentralized algorithm that maximizes the WSEE by calculating the optimal downlink and uplink power control coefficients $\boldsymbol{\eta}^*$ and $\boldsymbol{\Theta}^*$, respectively. We use a two-layered approach, which decomposes WSEE maximization into a sequential process with two distinct steps, each of which is called a layer. The first layer casts the non-convex WSEE maximization into a successive convex approximation (SCA) framework. Its output is a generalized convex program (GCP), which needs to be solved iteratively to obtain the optimal solution. The second layer optimally solves this GCP, either centrally through standard interior-point methods or decentrally using ADMM. The proposed procedure is outlined in Algorithm 2.
Algorithm 2: Two-layer decentralized WSEE maximization
1 AP selection: Select the APs that serve each UE while satisfying the limited fronthaul constraints.
2 SCA framework (first layer): Apply a series of transformations and approximations to recast the non-convex WSEE maximization in the successive convex approximation (SCA) framework. The output of the first layer is a GCP.
3 Decentralized ADMM approach (second layer): Introduce global and local variables to decouple the problem into multiple sub-problems. Each sub-problem is solved at a distributed (or "D") server, and the solutions are coordinated to obtain the global solution at the central (or "C") server. This procedure is implemented using ADMM.
We use $\varepsilon \in \{d, u\}$ for the downlink and uplink, respectively, and $\phi \in \{k, l\}$ for the $k$th downlink UE and the $l$th uplink UE, respectively. We first define the individual EE of each UE as in [16], where $B$ is the system bandwidth and $p^\varepsilon_\phi$ denotes the power consumed for that UE. The fronthaul links consume power for both downlink and uplink transmission. The APs consume power while transmitting data to the downlink UEs, and the uplink UEs consume power while transmitting their data. The power consumed by the system to transmit data to the $k$th downlink UE and the power consumed by the $l$th uplink UE are given, respectively, following [12], [14]. Here $\alpha_m$ and $\alpha_l$ are the power amplifier efficiencies at the $m$th AP and the $l$th uplink UE, respectively [5], $N_0$ is the noise power, and $P^d_{tc,k}$ and $P^u_{tc,l}$ are the powers required to run the transceiver chains at each antenna of the $k$th downlink UE and the $l$th uplink UE, respectively.
The power consumed by the AP transceiver chains and by the fronthaul between the APs and the CPU is given as follows. Here $P_{tc,m}$ is the power required to run the transceiver chains at each antenna of the $m$th AP.
The fronthaul power consumption of the $m$th AP has a fixed component, $P_{0,m}$, and a traffic-dependent component, which attains a maximum value of $P_{ft}$ at the full capacity $C_{fh,m}$. The term $R_{fh,m}$, given in (7), is the fronthaul data rate of the $m$th AP.
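For concreteness, the sketch below evaluates individual EEs and their weighted sum (the WSEE [15]) from given per-UE SEs and consumed powers. It assumes that $\mathrm{EE}^\varepsilon_\phi = B\,S^\varepsilon_\phi / p^\varepsilon_\phi$. It also assumes a linear fronthaul power model $P_{0,m} + P_{ft}R_{fh,m}/C_{fh,m}$, which matches the description of a fixed component plus a traffic-dependent component that reaches $P_{ft}$ at full capacity. How the AP and fronthaul power are apportioned among UEs (the expressions for $p^d_k$ and $p^u_l$) is not reproduced here, so the powers are plain inputs. Names and values are hypothetical.

```python
def fronthaul_power(p0, p_ft, rate, capacity):
    """Assumed linear model: fixed part p0 plus a traffic-dependent part that
    reaches p_ft at full load (rate == capacity)."""
    if rate > capacity:
        raise ValueError("fronthaul rate exceeds its capacity")
    return p0 + p_ft * rate / capacity


def individual_ee(bandwidth, se, power):
    """EE in bit/Joule: bandwidth [Hz] * SE [bit/s/Hz] / consumed power [W]."""
    return bandwidth * se / power


def wsee(bandwidth, ses, powers, weights):
    """Weighted sum of the individual EEs of all (uplink and downlink) UEs."""
    if not (len(ses) == len(powers) == len(weights)):
        raise ValueError("ses, powers and weights must have equal length")
    return sum(w * individual_ee(bandwidth, s, p)
               for s, p, w in zip(ses, powers, weights))


B = 20e6                                      # 20 MHz (illustrative)
print(fronthaul_power(0.8, 0.25, 5e6, 1e7))   # half-loaded fronthaul
print(wsee(B, [2.0, 1.0], [0.5, 0.5], [0.5, 0.5]))
print(wsee(B, [2.0, 1.0], [0.5, 0.5], [0.9, 0.1]))
```

Shifting weight towards one UE makes the WSEE track that UE's EE more closely. This is the mechanism by which the metric can encode heterogeneous EE requirements.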
The WSEE is now defined as the weighted sum of the individual EEs of the different users [15], where $w^\varepsilon_\phi$ are the weights assigned to the UEs to account for their heterogeneous EE requirements. The WSEE metric can prioritize the EE requirements of individual UEs by assigning them different weights [16], [17]; for example, it can assign a higher weight to a UE that is more energy-scarce. The WSEE maximization problem P1 can now be formulated as in (19), subject to the QoS constraints (19a) and to the constraints (19b), namely $R_{fh,m} \leq C_{fh,m}\ \forall m$, (2) and (4).
The quality-of-service (QoS) constraints in (19a) guarantee a minimum SE, denoted by the constants $S^d_{ok}$ and $S^u_{ol}$, for each downlink and uplink UE, respectively. The first constraint in (19b) ensures that the fronthaul transmission rate of every AP is within its capacity limit. We observe that including the number of quantization bits $\nu$ as a variable in problem P1 would make it a difficult-to-solve integer optimization problem [8], [12], [23]. We therefore optimize only the power control coefficients $\{\boldsymbol{\eta}, \boldsymbol{\Theta}\}$, fixing $\nu$ such that it satisfies the first constraint in (19b) [8], [12], and numerically investigate the effect of $\nu$ in Section V. Omitting the constant $B$, we reformulate P1 as problem P2, which retains the constraints of P1, including (2) and (4).
The objective in P2 is a sum of ratios, each of which is a PC function (concave-over-linear) of power control coefficients {η, Θ}. It is, therefore, not guaranteed to be a PC function and Dinkelbach's algorithm cannot be applied to maximize it [15]. This makes it a much harder objective to optimize as opposed to the more commonly studied GEE metric, which is a PC function [15] and has been investigated for CF mMIMO systems [11]- [14].
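To make the contrast with the GEE concrete, the sketch below applies Dinkelbach's algorithm to a single concave-over-affine ratio. The toy problem has one variable: $f(p)/g(p) = \log_2(1 + sp)/(p_0 + p)$ on $[0, p_{\max}]$, with hypothetical parameters $s$, $p_0$ and $p_{\max}$. Each iteration maximizes the concave function $f(p) - \lambda g(p)$ in closed form and then updates $\lambda$ to the current ratio. The parametric equivalence behind this iteration ($\max f/g = \lambda^*$ iff $\max (f - \lambda^* g) = 0$) holds for a single ratio only. It does not carry over to a sum of such ratios, as in P2.

```python
import math


def dinkelbach(s, p0, p_max, tol=1e-12, max_iter=100):
    """Maximize f(p)/g(p) = log2(1 + s p) / (p0 + p) over 0 <= p <= p_max."""
    lam, p = 0.0, p_max
    for _ in range(max_iter):
        # Inner problem: maximize log2(1 + s p) - lam * (p0 + p), concave in p.
        if lam <= 0.0:
            p = p_max
        else:
            p = min(max(1.0 / (lam * math.log(2.0)) - 1.0 / s, 0.0), p_max)
        f, g = math.log2(1.0 + s * p), p0 + p
        gap = f - lam * g        # F(lam) >= 0; zero exactly at the optimal ratio
        lam = f / g              # Dinkelbach update
        if gap < tol:
            break
    return p, lam


p_opt, ratio = dinkelbach(s=10.0, p0=0.1, p_max=1.0)
print(p_opt, ratio)
```

For these parameters the optimum is interior, at $p = (e - 1)/10$, where the derivative of the ratio vanishes.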
We now maximize the WSEE both centrally and decentrally using the two-layered approach. The first layer comprises an SCA framework, which formulates a GCP by convexly approximating the non-convex objective and constraints in P2. In the second layer, the approximate GCP formed in the $n$th SCA iteration is solved either centrally or decentrally using ADMM. Since the approximate GCP obtained in the first layer is not in the standard ADMM form, owing to its coupled optimization variables, we introduce local and global versions of these variables. The sub-problems that update the local variables are solved independently, and the local variables are coordinated to calculate the global solution [20], [24]. The variable updates and this coordination continue until ADMM converges.
The obtained solution is then used to formulate GCP for the (n + 1)th SCA iteration.
We next provide, in Algorithm 3, a centralized SCA procedure to solve P6 in the second layer.
Solve P6 for the $n$th SCA iteration to obtain the optimal variables $\{\mathbf{f}^d, \mathbf{f}^u, \boldsymbol{\Psi}^d, \boldsymbol{\Psi}^u, \boldsymbol{\zeta}^d, \boldsymbol{\zeta}^u, \boldsymbol{\lambda}^d, \boldsymbol{\lambda}^u, \mathbf{C}, \boldsymbol{\Theta}\}^{*,(n)}$.
3 Assign the SCA iterates for the $(n+1)$th iteration.
The SCA procedure converges when the residual $r^{(n)}$ has a magnitude below a predetermined tolerance. These approximations are of the form $\Lambda(x)$. It is easy to show that P6 is the inner-approximation problem of P5, in which we replace each of the constraints (22a) and (23a)-(23b), denoted here as $g_i(x) \leq 0$, $i = 1, 2, 3$, with a convex approximation of the form $\bar{g}_i(x, x^{(n)}) \leq 0$, $i = 1, 2, 3$. For each of these approximations, the following properties hold [25]: i) $g_i(x) \leq \bar{g}_i(x, x^{(n)})$ for all feasible $x$; ii) $g_i(x^{(n)}) = \bar{g}_i(x^{(n)}, x^{(n)})$; and iii) $\partial g_i(x^{(n)})/\partial x_j = \partial \bar{g}_i(x^{(n)}, x^{(n)})/\partial x_j$, $j = 1, 2$. The constraints in P6 also satisfy Slater's condition [23]. Hence, by [25], Algorithm 3, which solves the inner-approximation problem, always converges to a KKT point of P2.
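The inner-approximation argument can be seen on a hypothetical one-variable toy problem (not P5 or P6): minimize $x^2$ subject to $g(x) = 2 - x^2 \leq 0$. Because $g$ is concave, its first-order expansion at $x^{(n)}$, $\bar{g}(x, x^{(n)}) = 2 + (x^{(n)})^2 - 2x^{(n)}x$, is an affine upper bound. This bound satisfies the three properties listed above. Each convex subproblem therefore stays feasible, and its solution $x^{(n+1)} = (2 + (x^{(n)})^2)/(2x^{(n)})$ gives a non-increasing objective. The iterates converge to the KKT point $x = \sqrt{2}$, whose multiplier is 1.

```python
import math


def sca_toy(x0, tol=1e-12, max_iter=100):
    """SCA for: minimize x^2 s.t. g(x) = 2 - x^2 <= 0, from a feasible x0 > 0.
    Each step replaces g by its first-order expansion at the current iterate."""
    if x0 <= 0 or 2 - x0 ** 2 > 0:
        raise ValueError("x0 must be positive and feasible")
    x, history = x0, [x0 ** 2]
    for _ in range(max_iter):
        # Convex subproblem: min x^2 s.t. 2 + x_n^2 - 2 x_n x <= 0,
        # i.e. x >= (2 + x_n^2) / (2 x_n) > 0, so the minimizer is that bound.
        x_new = (2.0 + x * x) / (2.0 * x)
        history.append(x_new ** 2)
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x, history


x_star, objective = sca_toy(3.0)
print(x_star, objective[:4])
```

In this toy case the SCA update coincides with Newton's iteration for $\sqrt{2}$, which is why it converges so quickly.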

B. Decentralized ADMM approach
We now use ADMM to solve P6 decentrally in the second layer, an approach well-suited for CPUs with multiple distributed D-servers, connected via a central C-server [18], [19]. ADMM decomposes a central problem into multiple sub-problems, each of which is solved by a D-server locally and independently. The C-server combines the local solutions to obtain a global solution.
We observe that the constraints in (24a)-(24b) couple the power control coefficients of different uplink and downlink UEs. We therefore introduce global variables for the power control coefficients at the C-server, with local copies at the D-servers, to decouple P6 into sub-problems for each UE. The constraints in P6 for the downlink and uplink UEs can be divided between the downlink and uplink D-servers, respectively, and the D-servers solve the sub-problems defined for each downlink and uplink UE. We first define the local feasible sets of these sub-problems at the $n$th SCA iteration, denoted as $\mathcal{S}^{d,(n)}_k$ and $\mathcal{S}^{u,(n)}_l$, respectively, in (25). Here $\tilde{\mathbf{C}}^d_k, \tilde{\mathbf{C}}^u_l \in \mathbb{C}^{M \times K_d}$ and $\tilde{\boldsymbol{\Theta}}^d_k, \tilde{\boldsymbol{\Theta}}^u_l \in \mathbb{C}^{K_u \times 1}$ are the local copies, at the D-servers, of the corresponding global variables at the C-server, denoted as $\bar{\mathbf{C}} \in \mathbb{C}^{M \times K_d}$ and $\bar{\boldsymbol{\Theta}} \in \mathbb{C}^{K_u \times 1}$, respectively, which represent the downlink and uplink power control coefficients $\mathbf{C}$ and $\boldsymbol{\Theta}$ in P6.
We note that each D-server has its own local power control variables; hence the constraints in (25), which are all convex, are independent across D-servers, and the sets $\mathcal{S}^{d,(n)}_k$ and $\mathcal{S}^{u,(n)}_l$ are convex. We define the sets of local variables of the D-servers corresponding to the downlink and uplink UEs as $\Omega^d_k \triangleq [\tilde{\mathbf{C}}^d_k, \tilde{\boldsymbol{\Theta}}^d_k, \mathbf{f}^d_k, \boldsymbol{\Psi}^d_k, \boldsymbol{\lambda}^d_k, \boldsymbol{\zeta}^d_k]$ and $\Omega^u_l \triangleq [\tilde{\mathbf{C}}^u_l, \tilde{\boldsymbol{\Theta}}^u_l, \mathbf{f}^u_l, \boldsymbol{\Psi}^u_l, \boldsymbol{\lambda}^u_l, \boldsymbol{\zeta}^u_l]$, respectively. We now reformulate P6 as the consensus problem P7 in (26). To ensure that the global variables at the C-server have identical local copies at the D-servers, we introduce the consensus constraints (26b)-(26c). The ADMM algorithm can now be readily applied to P7, as it is in the global consensus form [24]. With $\varepsilon \in \{d, u\}$ and $\phi \in \{k, l\}$ as before, the sub-problems of the individual D-servers can now be written as P8. We next define auxiliary functions for the objective in P7b in (28). Using (28), we write the augmented Lagrangian function of P7 as (29), where $\rho_C, \rho_\theta > 0$ are the penalty parameters corresponding to the global variables $\bar{\mathbf{C}}$ and $\bar{\boldsymbol{\Theta}}$, respectively, and $\boldsymbol{\chi}^\varepsilon_\phi \in \mathbb{C}^{M \times K_d}$ and $\boldsymbol{\xi}^\varepsilon_\phi \in \mathbb{C}^{K_u \times 1}$ are the Lagrange multipliers associated with the equality constraints (26b) and (26c), respectively. The quadratic penalty terms are added to the objective to penalize violations of the equality constraints; they also allow ADMM to converge without requiring strict convexity or finiteness of the objective [24].
We note that the augmented Lagrangian in (29) is, in general, not decomposable for the problem formulation in P7b [23]. The auxiliary functions defined in (28) enable us to decompose it and formulate sub-problems for the D-servers. In ADMM, the D-servers independently solve the sub-problems and update the local variables, which are collected by the C-server to update the global variables [24]. In the $(p+1)$th iteration, the following steps are executed in succession.

1) Local computation: The D-server of each UE solves P8 to update its local variables as in (30).
2) Lagrange multipliers update: The D-servers then update the Lagrange multipliers as in (31).
3) Global variables update: Using (29) and maximizing it w.r.t. each global variable, we obtain the closed-form solutions in (32)-(33). The C-server broadcasts the updated global variables in (32)-(33) to all the D-servers.
Initialization for ADMM: At the (n + 1)th SCA iteration, we initialize the global variables at the C-server and their local copies at the D-servers with the SCA iteration variables.
ADMM convergence criterion: ADMM is said to have converged at iteration P if the primal residue is within a predetermined tolerance $\epsilon_{\mathrm{ADMM}}$, i.e., $\|\mathbf{r}^{(P)}\|_2 \le \epsilon_{\mathrm{ADMM}}$. Steps (30), (31), (32)-(33) and (36) are iterated until convergence, after which we obtain the locally optimal power control coefficients $\{\mathbf{C}^*, \boldsymbol{\Theta}^*\}$. We assign them to the iterates of the (n + 1)th SCA iteration, i.e., $\mathbf{C}^{(n+1)} = \mathbf{C}^*$ and $\boldsymbol{\Theta}^{(n+1)} = \boldsymbol{\Theta}^*$. This concludes the nth SCA iteration.
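The local, global and multiplier steps above can be illustrated on a toy global consensus problem. The sketch below is not Algorithm 4: it assumes hypothetical scalar objectives $f_i(x) = \tfrac{1}{2}a_i(x - b_i)^2$, one per agent standing in for a D-server, and it minimizes rather than maximizes (maximizing a concave objective is equivalent to minimizing its negative), so every step has a closed form. It uses the textbook ordering of Boyd et al. (local, global, then multiplier update), whereas the steps above list the multiplier update before the global update, and it stops on both the primal residue, as in the criterion above, and the dual residue.

```python
import numpy as np


def consensus_admm(a, b, rho=1.0, eps=1e-6, max_iter=1000):
    """Toy global consensus ADMM for min_x sum_i 0.5 * a_i * (x - b_i)**2."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.size
    x = np.zeros(n)  # local copies held by the hypothetical agents ("D-servers")
    y = np.zeros(n)  # Lagrange multipliers of the consensus constraints x_i = z
    z = 0.0          # global variable held by the hypothetical coordinator ("C-server")
    for it in range(1, max_iter + 1):
        # Local step: each agent minimizes f_i(x_i) + y_i (x_i - z) + rho/2 (x_i - z)^2
        x = (a * b - y + rho * z) / (a + rho)
        # Global step: closed-form minimizer of the augmented Lagrangian over z
        z_old = z
        z = float(np.mean(x + y / rho))
        # Multiplier step: dual ascent on the consensus residual x_i - z
        y = y + rho * (x - z)
        primal = np.linalg.norm(x - z)            # ||r||_2, the primal residue
        dual = rho * np.sqrt(n) * abs(z - z_old)  # ||s||_2, the dual residue
        if primal <= eps and dual <= eps:
            return z, it
    return z, max_iter


z, iters = consensus_admm([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(z, iters)  # z approaches sum(a * b) / sum(a), the centralized optimum
```

The consensus value converges to the same minimizer a centralized solver would return, which mirrors the observation that the decentralized and centralized approaches reach the same solution.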
Remark 3. Convergence of the proposed decentralized algorithm: Algorithm 4 uses the iterative SCA technique, with each SCA iteration involving ADMM. The algorithm is thus guaranteed to converge if both SCA and ADMM converge. As discussed in Remark 2, the SCA procedure is guaranteed to converge to a KKT point of P2. For a given SCA iteration, the convergence of ADMM is guaranteed, as investigated in detail in [24].
Remark 4. Implementability: The maximal-ratio combiner/beamformer considered herein is the simplest receiver/transmitter for a distributed cell-free mMIMO system [2]. Further, the power optimization algorithms require only the long-term fading coefficients, which remain constant over hundreds of coherence intervals [22]. This is in contrast to the existing work on SE-GEE maximization of FD cell-free massive MIMO systems in [14], which requires instantaneous channel state information. The proposed optimization, whose complexity is analyzed in Section IV-C, therefore needs to be solved only on this long time scale, which makes it easily implementable.

C. Computational complexity of centralized and decentralized algorithms
Before this analysis, we note that both the centralized Algorithm 3 and the decentralized Algorithm 4 comprise multiple steps that only evaluate simple closed-form expressions.
These steps take much less time than the steps that solve a GCP, typically using interior-point methods [23]. We therefore compare the per-iteration complexity of the centralized and decentralized algorithms by calculating the complexity of solving their respective GCPs.
• Algorithm 3 solves P6 in step 1 of each SCA iteration. P6 has $4(K_u+K_d)+K_u+MK_d$ real variables and $6(K_u+K_d)+M+MK_d$ linear constraints, and hence a worst-case computational complexity of $\mathcal{O}\big((10(K_u+K_d)+K_u+M+2MK_d)^{3/2}\,(4(K_u+K_d)+K_u+MK_d)^2\big)$ [27].
• Algorithm 4, in step 2 of each ADMM iteration, solves P8 at the D-servers in parallel to update the local variables. We therefore need to analyze the computational complexity at only one of the D-servers. Since the downlink has an additional constraint (the second one in (25d)), we consider a downlink D-server for the worst-case complexity analysis. In P8, it has $MK_d+K_u+4$ real variables and $MK_d+M+K_u+6$ linear constraints, and hence a worst-case computational complexity of $\mathcal{O}\big((2MK_d+M+2K_u+10)^{3/2}\,(MK_d+K_u+4)^2\big)$ [27]. We consider $K_d = K_u = K/2$ downlink and uplink UEs for this analysis. We observe that, for a large K, Algorithm 4 has a much lower computational complexity than Algorithm 3.
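The two worst-case bounds can be compared numerically by plugging the variable and constraint counts stated for P6 and P8 into the same interior-point bound $\mathcal{O}\big((m+n)^{3/2} n^2\big)$ of [27], with n variables and m constraints. This is a sketch: constant factors of the big-O bound are dropped, so only the printed ratios are meaningful, and the function names are illustrative.

```python
def ipm_worst_case(n_vars: int, n_cons: int) -> float:
    """Worst-case interior-point bound (m + n)^{3/2} * n^2, constants dropped."""
    return (n_vars + n_cons) ** 1.5 * n_vars ** 2


def centralized_p6(M: int, Kd: int, Ku: int) -> float:
    """P6, solved once per SCA iteration by the centralized algorithm."""
    n_vars = 4 * (Ku + Kd) + Ku + M * Kd
    n_cons = 6 * (Ku + Kd) + M + M * Kd
    return ipm_worst_case(n_vars, n_cons)


def decentralized_p8(M: int, Kd: int, Ku: int) -> float:
    """P8 at one downlink D-server (the worst case), solved per ADMM iteration."""
    n_vars = M * Kd + Ku + 4
    n_cons = M * Kd + M + Ku + 6
    return ipm_worst_case(n_vars, n_cons)


M = 32
for K in (8, 20, 40, 80):
    Kd = Ku = K // 2  # K_d = K_u = K/2, as in the analysis above
    c, d = centralized_p6(M, Kd, Ku), decentralized_p8(M, Kd, Ku)
    print(f"K={K:3d}  P6: {c:.2e}  P8: {d:.2e}  ratio: {c / d:.2f}")
```

Both counts contain the $MK_d$ downlink power control coefficients, so the gap comes from the remaining per-UE variables and constraints that only the centralized problem carries.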

V. SIMULATION RESULTS
We now numerically investigate the SE and WSEE of an FD CF mMIMO system with limited-capacity fronthaul links. We assume a realistic system model wherein the M APs, $K_d$ downlink UEs and $K_u$ uplink UEs are all scattered randomly in a square of size D km × D km. To avoid boundary effects [21], we wrap the APs and UEs around the edges [5]. We use $\varepsilon \in \{d, u\}$ to denote downlink and uplink, respectively, and $\varphi \in \{k, l\}$ to denote the kth downlink UE and the lth uplink UE, respectively. The large-scale fading coefficients $\beta^{\varepsilon}_{m\varphi}$ are modeled as [11]
$$\beta^{\varepsilon}_{m\varphi} = 10^{\frac{\mathrm{PL}^{\varepsilon}_{m\varphi}}{10}}\, 10^{\frac{\sigma_{sd} z^{\varepsilon}_{m\varphi}}{10}}. \tag{38}$$
Here $10^{\frac{\sigma_{sd} z^{\varepsilon}_{m\varphi}}{10}}$ is the log-normal shadowing factor with standard deviation $\sigma_{sd}$ (in dB), and $z^{\varepsilon}_{m\varphi}$ follows a two-component correlated model [21]. The path loss $\mathrm{PL}^{\varepsilon}_{m\varphi}$ (in dB) follows a three-slope model [5], [21]. Similar to [5], we model the large-scale fading coefficients of the inter-AP RI channels, i.e., $\beta_{\mathrm{RI},mi}, \forall i \neq m$, as in (38), and assume that the large-scale fading coefficients of the intra-AP RI channels, which do not experience shadowing, are given by $\beta_{\mathrm{RI},mm} = 10^{\frac{\mathrm{PL}_{\mathrm{RI}}(\mathrm{dB})}{10}}$.
The inter-UE large-scale fading coefficients $\beta_{kl}$ are also modeled similar to (38). For brevity, we consider the same number of quantization bits, ν, and the same capacity, $C_{fh}$, on all fronthaul links.
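A minimal sketch of how coefficients of the form (38) can be generated for a wrapped-around square follows. Several pieces are assumptions rather than the values of Table I: the three-slope path loss uses the form common in the cell-free literature with a hypothetical constant `L_db = 140.7` dB and breakpoints `d0_km = 0.01`, `d1_km = 0.05`; the two-component shadowing uses $z_{mk} = \sqrt{\delta}\,a_m + \sqrt{1-\delta}\,b_k$ with i.i.d. $\mathcal{N}(0,1)$ components and omits any additional spatial correlation of $a_m$ and $b_k$; and $\sigma_{sd} = 8$ dB, $\delta = 0.5$ and D = 1 km are placeholders.

```python
import numpy as np


def wrapped_distance_km(p, q, side_km):
    """Wrap-around (torus) distance between points in a side_km x side_km square."""
    delta = np.abs(p - q)
    delta = np.minimum(delta, side_km - delta)  # shortest way across the wrapped edges
    return np.hypot(delta[..., 0], delta[..., 1])


def three_slope_pl_db(d_km, L_db=140.7, d0_km=0.01, d1_km=0.05):
    """Three-slope path loss in dB; L_db, d0_km and d1_km are assumed values."""
    d = np.maximum(d_km, 1e-6)  # guard against log10(0)
    far = -L_db - 35 * np.log10(d)
    mid = -L_db - 15 * np.log10(d1_km) - 20 * np.log10(d)
    near = -L_db - 15 * np.log10(d1_km) - 20 * np.log10(d0_km)
    return np.where(d > d1_km, far, np.where(d > d0_km, mid, near))


def two_component_shadowing(M, K, delta, rng):
    """z_mk = sqrt(delta) a_m + sqrt(1 - delta) b_k with i.i.d. N(0, 1) components."""
    a = rng.standard_normal((M, 1))  # AP-side component, shared by all UEs of AP m
    b = rng.standard_normal((1, K))  # UE-side component, shared by all APs seen by UE k
    return np.sqrt(delta) * a + np.sqrt(1.0 - delta) * b


def large_scale_fading(pl_db, sigma_sd_db, z):
    """Eq. (38): beta = 10^(PL/10) * 10^(sigma_sd * z / 10)."""
    return 10.0 ** (pl_db / 10.0) * 10.0 ** (sigma_sd_db * z / 10.0)


rng = np.random.default_rng(0)
M, K, D = 32, 20, 1.0  # APs, UEs and square side in km (placeholders)
aps = rng.uniform(0.0, D, size=(M, 2))
ues = rng.uniform(0.0, D, size=(K, 2))
d = wrapped_distance_km(aps[:, None, :], ues[None, :, :], D)  # (M, K) distances
z = two_component_shadowing(M, K, 0.5, rng)
beta = large_scale_fading(three_slope_pl_db(d), 8.0, z)
print(beta.shape, beta.min(), beta.max())
```

The wrap-around distance caps every AP-UE distance at $D/\sqrt{2}$, so UEs near an edge see the same AP density as UEs in the center.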
Henceforth, we denote the downlink and uplink transmit powers as $p_d\,(=\rho_d N_0)$ and $p_u\,(=\rho_u N_0)$, respectively, and the pilot transmit power as $p_t\,(=\rho_t N_0)$. Similar to [5], [8], [11], [12], [21], and unless mentioned otherwise, we fix the system model and power consumption model parameters to the values given in Table I. Similar to [11], [21], we allocate equal power to all downlink UEs and full power to all uplink UEs, i.e., $\eta_{mk} = \big(\tilde{b} N_t \sum_{k' \in \kappa_{dm}} \gamma^d_{mk'}\big)^{-1}, \forall k \in \kappa_{dm}$, and $\theta_l = 1$. We see that the derived lower bound is tight for both values of M. and downlink transmission; and iii) multiply the sum SE by a factor of 1/2. We see that the FD system has a significantly higher sum SE than an equivalent HD system, provided the RI suppression is good ($\gamma_{\mathrm{RI}} \le -10$ dB). The sum SE does not double, however, even with strong RI suppression ($\gamma_{\mathrm{RI}} \le -40$ dB). This is due to the UDI experienced by the downlink UEs in an FD CF mMIMO system, as shown in Fig. 1, which cannot be mitigated by RI suppression at the APs.
Sum SE variation with quantization bits: We plot in Fig. 2c the sum SE versus the number of fronthaul quantization bits ν. We consider M = 32, $K_d = 12$, $K_u = 8$, $p_d = 2p_u = 30$ dBm, $N_t = N_r = \{8, 16\}$ transmit and receive antennas at each AP, and fronthaul capacities $C_{fh} = \{10, 100\}$ Mbps. We observe that, in both cases, the sum SE initially increases with ν and then saturates. Increasing ν reduces the quantization distortion and attenuation, which improves the sum SE. This effect, however, saturates because, beyond a certain ν, most of the information is already retrieved. We also observe that reducing the fronthaul capacity from $C_{fh} = 100$ Mbps to $C_{fh} = 10$ Mbps reduces the sum SE only slightly, as the procedure outlined in Section II-2 retains the AP-UE links with the highest channel gains and thereby helps maintain the sum SE.
We plot in Fig. 3a and Fig. 3b the individual EEs of the UL and DL UEs with: i) equal weights ($w_1 = w_2 = w_3 = w_4 = 0.25$), and ii) $w_1 = 0.08$, $w_2 = 0.02$, $w_3 = 0.5$, $w_4 = 0.4$, respectively.
In Fig. 3a, with equal weights, the UEs attain EEs that depend on their relative channel conditions, which indicates that, in terms of channel conditions, DL UE 2 > DL UE 1 > UL UE 2 > UL UE 1. In Fig. 3b, the weights are chosen in the order opposite to the channel conditions. The EEs of the UL UEs now dominate the EE of DL UE 1, and their relative order is reversed. DL UE 2, with an excellent channel, still attains a high EE, although lower than in Fig. 3a.
Convergence of the decentralized ADMM algorithm: We plot in Fig. 3c the WSEE obtained using the decentralized Algorithm 4 versus the SCA iteration index. We consider M = 10 APs, $K_u = K_d = K/2 = 2$ uplink and downlink UEs, and $N_t = N_r = \{1, 2\}$ transmit and receive antennas at each AP, with transmit power $p_d = 2p_u = p = 30$ dBm. We assume: i) penalty parameters $\rho_C = \rho_\theta = 0.1$; ii) penalty parameter update threshold factor µ = 10; iii) ADMM convergence threshold $\epsilon_{\mathrm{ADMM}} = 0.01$; and iv) SCA convergence threshold $\epsilon_{\mathrm{SCA}} = 0.001$. We consider two values of the penalty update parameter: ϑ = {1.2, 1.8}. We note that the algorithm converges in both cases, marginally faster with ϑ = 1.2. A smaller penalty update parameter is therefore beneficial, because the changes in the penalty parameters are then less abrupt and, consequently, the algorithm does not overreact to a bad ADMM iteration that causes the primal and dual residues to diverge [26]. We therefore fix ϑ = 1.2 for the rest of the simulations.
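The role of µ and ϑ can be made concrete with a residual-balancing penalty update. This sketch assumes that the update in Algorithm 4 follows the standard residual-balancing rule of Boyd et al., with µ as the threshold factor and ϑ as the multiplicative step; the function `update_penalty` is hypothetical and does not reproduce the paper's exact update.

```python
def update_penalty(rho, primal_res, dual_res, mu=10.0, vartheta=1.2):
    """Residual balancing: scale rho by vartheta when one residue exceeds mu times the other."""
    if primal_res > mu * dual_res:
        return rho * vartheta  # consensus is violated: penalize disagreement harder
    if dual_res > mu * primal_res:
        return rho / vartheta  # global variables still moving a lot: relax the penalty
    return rho                 # residues are balanced: keep rho unchanged


# Five consecutive primal-dominated ADMM iterations, starting from rho = 0.1
for vartheta in (1.2, 1.8):
    rho = 0.1
    for _ in range(5):
        rho = update_penalty(rho, primal_res=1.0, dual_res=0.01, vartheta=vartheta)
    print(vartheta, round(rho, 4))
```

After five such iterations, ρ grows by a factor of $1.2^5 \approx 2.5$ with ϑ = 1.2 but by $1.8^5 \approx 18.9$ with ϑ = 1.8, which illustrates why a smaller ϑ reacts less abruptly to a single bad iteration. With unscaled multipliers such as $\boldsymbol{\chi}^\varepsilon_\varphi$ and $\boldsymbol{\xi}^\varepsilon_\varphi$, no rescaling is needed when ρ changes; in the scaled form of ADMM, the scaled dual variables must be rescaled as well.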
WSEE variation with parameters: We now study how the WSEE varies with key system parameters, which gives insights into the design of energy-efficient FD CF mMIMO systems. Unless mentioned otherwise, we consider M = 32 APs, $N_t = N_r = N = 8$ AP transmit and receive antennas, $K_d = 12$ downlink UEs, $K_u = 8$ uplink UEs, and QoS constraints $S_{ok} = S_{ol} = 0.1$ bits/s/Hz.
We plot in Fig. 4a the WSEE by simultaneously varying the downlink and uplink transmit powers as $p_d = 2p_u = p$. We consider the centralized and decentralized optimal power allocation (OPA) approaches from Algorithm 3 and Algorithm 4, respectively, and compare them with three suboptimal power allocation schemes: i) equal power allocation of type 1, labeled "EPA 1", where $\eta_{mk} = \big(\tilde{b} N_t \sum_{k' \in \kappa_{dm}} \gamma^d_{mk'}\big)^{-1}, \forall k \in \kappa_{dm}$, and $\theta_l = 1$ [11], [12]; ii) equal power allocation of type 2, labeled "EPA 2", where $\eta_{mk} = \big(\tilde{b} N_t K_{dm} \gamma^d_{mk}\big)^{-1}, \forall k \in \kappa_{dm}$, and $\theta_l = 1$ [11]; and iii) random power allocation, labeled "RPA", where the power control coefficients are drawn from a uniform distribution between 0 and the "EPA 1" value. We note that: i) the existing literature has not yet optimized the WSEE metric for CF mMIMO systems, and hence we can only compare with the above suboptimal schemes; ii) the decentralized ADMM approach, with its lower computational complexity, achieves the same WSEE as the centralized one; and iii) both the decentralized and centralized approaches far outperform the baseline schemes. We next characterize in Fig. 4b the joint variation of the WSEE and the sum SE with the number of quantization bits ν in the fronthaul links. The WSEE is obtained using the decentralized Algorithm 4.
We consider transmit power $p_d = 2p_u = p = 30$ dBm and two cases: i) high fronthaul capacity, $C_{fh} = 100$ Mbps, which is sufficiently high to support all the UEs; and ii) limited fronthaul capacity, $C_{fh} = 10$ Mbps, which limits the number of UEs a single AP can serve. We observe that for $C_{fh} = 100$ Mbps, the WSEE decreases as ν increases, even though the corresponding sum SE increases. For $C_{fh} = 10$ Mbps, both the sum SE and the WSEE increase with ν. To explain this behavior, we note from Fig. 2c that increasing ν improves the sum SE for both $C_{fh} = 100$ Mbps and $C_{fh} = 10$ Mbps. For $C_{fh} = 100$ Mbps, the APs serve all the UEs, i.e., $K_{dm} = K_d$ and $K_{um} = K_u$, so increasing ν linearly increases the fronthaul data rate $R_{fh}$ (see (7)). This, as seen from (17), increases the traffic-dependent fronthaul power consumption. Using a lower number (1-2) of quantization bits is therefore more energy-efficient, as it provides a sufficiently good SE with low energy consumption. For $C_{fh} = 10$ Mbps, however, $K_{um}$ and $K_{dm}$ have an upper limit, given by (9), which is inversely related to ν. The product $\nu(K_{um} + K_{dm})$ therefore remains nearly constant for all values of ν. Thus, $R_{fh}$ (see (7)) does not increase with ν and remains close to the capacity $C_{fh}$. The traffic-dependent fronthaul power consumption, given in (17), hence remains close to $P_{ft}$. A higher number (3-4) of quantization bits therefore provides a higher sum SE and hence also maximizes the WSEE.
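The baseline schemes compared in Fig. 4a reduce to simple array arithmetic. The sketch below assumes that the constant "b" in the EPA formulas is the Bussgang quantizer constant $\tilde{b}$ of Appendix A, and it uses a placeholder value `b_tilde = 0.88`, random `gamma_d` values and a random serving mask `served` (the sets $\kappa_{dm}$); none of these are the paper's simulation values, and the function names are hypothetical. The uplink coefficients are simply $\theta_l = 1$ in all three schemes and are omitted.

```python
import numpy as np


def epa1(gamma_d, served, b_tilde, Nt):
    """EPA 1: eta_mk = (b_tilde * Nt * sum_{k' in kappa_dm} gamma_mk')^{-1} for served k."""
    per_ap = b_tilde * Nt * np.sum(gamma_d * served, axis=1, keepdims=True)  # (M, 1)
    with np.errstate(divide="ignore"):
        return np.where(served, 1.0 / per_ap, 0.0)


def epa2(gamma_d, served, b_tilde, Nt):
    """EPA 2: eta_mk = (b_tilde * Nt * K_dm * gamma_mk)^{-1} for served k."""
    K_dm = np.sum(served, axis=1, keepdims=True)  # number of DL UEs served by AP m
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(served, 1.0 / (b_tilde * Nt * K_dm * gamma_d), 0.0)


def rpa(gamma_d, served, b_tilde, Nt, rng):
    """RPA: coefficients drawn uniformly between 0 and the EPA 1 value."""
    return rng.uniform(0.0, 1.0, size=gamma_d.shape) * epa1(gamma_d, served, b_tilde, Nt)


rng = np.random.default_rng(0)
M, Kd, Nt, b_tilde = 4, 6, 8, 0.88  # b_tilde is a placeholder value
gamma_d = rng.uniform(1e-3, 1.0, size=(M, Kd))
served = rng.random((M, Kd)) < 0.6
served[:, 0] = True  # in this toy, every AP serves at least one DL UE
for name, eta in (("EPA 1", epa1(gamma_d, served, b_tilde, Nt)),
                  ("EPA 2", epa2(gamma_d, served, b_tilde, Nt))):
    budget = b_tilde * Nt * np.sum(eta * gamma_d, axis=1)  # per-AP weighted sum
    print(name, np.round(budget, 6))
```

Both EPA schemes make $\tilde{b} N_t \sum_{k} \eta_{mk}\gamma^d_{mk}$ equal to one at every AP; they differ only in how that budget is split. EPA 1 gives every served UE the same $\eta_{mk}$, whereas EPA 2 equalizes the products $\eta_{mk}\gamma^d_{mk}$.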
Latency: The per-iteration complexity of the decentralized Algorithm 4, as observed in Section IV-C, is lower than that of the centralized Algorithm 3. We now demonstrate this by comparing their per-iteration runtimes. For this simulation, as shown in Fig. 4c, we consider an FD CF mMIMO system with M = 32 APs, each having $N_t = N_r = 8$ transmit and receive antennas, and plot the average runtime of each iteration versus the total number of UEs, K, with $K_d = K_u = K/2$. We note that the decentralized algorithm has a significantly lower per-iteration runtime, particularly for large K. Both algorithms require only the large-scale channel coefficients and hence need to be executed only once every hundreds of coherence intervals.

VI. CONCLUSION
We derived SE lower bounds for an FD CF mMIMO wireless system with optimal uniform fronthaul quantization. Using a two-layered approach, we optimized the WSEE with the SCA framework, which in each iteration solves a GCP either centrally or decentrally using ADMM. We showed how the WSEE incorporates the EE requirements of different UEs. We analytically and numerically demonstrated the convergence of the decentralized algorithm. We showed that it achieves the same WSEE as the centralized algorithm.
APPENDIX A
We use the optimal uniform quantization model from [8], [12]. Using the Bussgang decomposition [28], the quantization function can be written as $Q(x) = \tilde{a}x + \sqrt{p_x}\,\varsigma_d$, where $p_x = \mathbb{E}\{|x|^2\}$ is the power of the unquantized signal x, $\tilde{a} = \frac{1}{p_x}\int_{\mathcal{X}} x\,h(x)\,f_X(x)\,dx$, $\tilde{b} = \frac{1}{p_x}\int_{\mathcal{X}} h^2(x)\,f_X(x)\,dx$, and $\varsigma_d$ is the normalized distortion, whose power is $\mathbb{E}\{\varsigma_d^2\} = \tilde{b} - \tilde{a}^2$. Here h(x) is the mid-rise uniform quantizer with $L = 2^\nu$ quantization levels rising in steps of size Δ, with ν being the number of quantization bits. The signal-to-distortion ratio is $\mathrm{SDR} = \frac{\mathbb{E}\{(\tilde{a}x)^2\}}{p_x \mathbb{E}\{\varsigma_d^2\}} = \frac{\tilde{a}^2}{\tilde{b} - \tilde{a}^2}$. The optimal step size $\Delta_{\mathrm{opt}}$ maximizes the SDR for a given ν. The optimal $\tilde{a}$ and $\tilde{b}$ values are calculated using $\Delta_{\mathrm{opt}}$ for each value of ν, and are given in Table II [8].
$\ldots\,\mathbb{E}\{|h_{kl}|^2\}\theta_l = \rho_u \sum_{l=1}^{K_u} \beta_{kl}\theta_l$.
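The Bussgang coefficients and the SDR-maximizing step size of Appendix A can be computed numerically. This sketch assumes a real, zero-mean, unit-power Gaussian input ($p_x = 1$), a mid-rise quantizer whose outermost cells extend to infinity, and a simple grid search over Δ; the resulting numbers are not claimed to reproduce Table II of [8], and the function names are illustrative.

```python
import math

import numpy as np


def _pdf(t):
    """Standard normal pdf, equal to 0 at +inf."""
    return 0.0 if math.isinf(t) else math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def _cdf(t):
    """Standard normal cdf (math.erf(inf) == 1.0)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def bussgang_coeffs(nu, step):
    """(a_tilde, b_tilde) of a mid-rise uniform quantizer with 2**nu levels, x ~ N(0, 1)."""
    a_t, b_t = 0.0, 0.0
    half = 2 ** (nu - 1)  # output levels on each side of zero
    for i in range(1, half + 1):
        lo = (i - 1) * step
        hi = i * step if i < half else math.inf  # outermost cell extends to infinity
        level = (i - 0.5) * step                 # mid-rise reconstruction level
        a_t += 2.0 * level * (_pdf(lo) - _pdf(hi))        # E[x h(x)], both signs
        b_t += 2.0 * level ** 2 * (_cdf(hi) - _cdf(lo))  # E[h(x)^2], both signs
    return a_t, b_t  # p_x = 1, so no normalization is needed


def sdr(a_t, b_t):
    """SDR = a_tilde^2 / (b_tilde - a_tilde^2)."""
    return a_t ** 2 / (b_t - a_t ** 2)


def optimal_quantizer(nu, steps=np.linspace(0.01, 4.0, 4000)):
    """Grid search for the SDR-maximizing step size Delta_opt."""
    best = max(steps, key=lambda s: sdr(*bussgang_coeffs(nu, float(s))))
    a_t, b_t = bussgang_coeffs(nu, float(best))
    return float(best), a_t, b_t, sdr(a_t, b_t)


for nu in range(1, 5):
    step, a_t, b_t, s = optimal_quantizer(nu)
    print(f"nu={nu}  step={step:.3f}  a={a_t:.4f}  b={b_t:.4f}  SDR={10 * math.log10(s):.2f} dB")
```

For ν = 1 the only threshold sits at zero, so the SDR equals $(2/\pi)/(1 - 2/\pi)$ for every Δ; the step size only matters for ν ≥ 2. The SDR grows with ν, consistent with the reduced quantization distortion discussed for Fig. 2c.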
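We express the total quantization distortion (TQD) for the kth downlink UE. The result in (14) then follows from the expression for the achievable SE lower bound.
We now derive the achievable SE expression for the lth uplink UE in (15). We know from Section II that $\mathbf{g}^u_{ml} = \hat{\mathbf{g}}^u_{ml} + \mathbf{e}^u_{ml}$, where $\hat{\mathbf{g}}^u_{ml}$ and $\mathbf{e}^u_{ml}$ are independent and $\mathbb{E}\{\|\hat{\mathbf{g}}^u_{ml}\|^2\} = N_r\gamma^u_{ml}$. The power of the desired signal for the lth uplink UE is
$$\mathbb{E}\{|\mathrm{DS}^u_l|^2\} = \mathbb{E}\Big\{\Big|\tilde{a}\sum_{m\in\mathcal{M}^u_l}\sqrt{\rho_u}\,\mathbb{E}\big\{\sqrt{\theta_l}\,(\hat{\mathbf{g}}^u_{ml})^H(\hat{\mathbf{g}}^u_{ml} + \mathbf{e}^u_{ml})\big\}\,s^u_l\Big|^2\Big\} = \tilde{a}^2 N_r^2 \rho_u \theta_l\Big(\sum_{m\in\mathcal{M}^u_l}\gamma^u_{ml}\Big)^2.$$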
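The moment identities used in the uplink beamforming-uncertainty and MUI derivations, $\mathbb{E}\{\|\hat{\mathbf{g}}^u_{ml}\|^4\} = N_r(N_r+1)(\gamma^u_{ml})^2$ and $\mathbb{E}\{|(\hat{\mathbf{g}}^u_{ml})^H\mathbf{g}^u_{mq}|^2\} = N_r\gamma^u_{ml}\beta^u_{mq}$ in (47), can be sanity-checked by Monte Carlo simulation. This sketch assumes $\hat{\mathbf{g}}^u_{ml} \sim \mathcal{CN}(\mathbf{0}, \gamma\mathbf{I}_{N_r})$ and an independent $\mathbf{g}^u_{mq} \sim \mathcal{CN}(\mathbf{0}, \beta_q\mathbf{I}_{N_r})$; the values of $N_r$, γ and $\beta_q$ are arbitrary placeholders.

```python
import numpy as np


def crandn(rng, shape, var):
    """Samples of CN(0, var): real and imaginary parts each have variance var / 2."""
    return np.sqrt(var / 2) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))


rng = np.random.default_rng(1)
Nr, n, gamma, beta_q = 4, 200_000, 0.5, 2.0  # placeholder parameters
g_hat = crandn(rng, (n, Nr), gamma)  # channel estimate with E{g g^H} = gamma * I
g_q = crandn(rng, (n, Nr), beta_q)   # independent channel of another UE q

fourth = np.mean(np.sum(np.abs(g_hat) ** 2, axis=1) ** 2)      # E{||g_hat||^4}
cross = np.mean(np.abs(np.sum(np.conj(g_hat) * g_q, axis=1)) ** 2)  # E{|g_hat^H g_q|^2}

print(fourth, Nr * (Nr + 1) * gamma ** 2)
print(cross, Nr * gamma * beta_q)
```

The empirical means should agree with the closed forms to within Monte Carlo error, which is well below one percent for this sample size.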
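We next express the beamforming uncertainty for the lth uplink UE. Equality (a) holds because: i) $\mathbf{e}^u_{ml}$ and $\hat{\mathbf{g}}^u_{ml}$ are zero-mean and uncorrelated; and ii) $\mathbb{E}\{\|\hat{\mathbf{g}}^u_{ml}\|^2\} = N_r\gamma^u_{ml}$. Equality (b) holds because $\mathbb{E}\{\|\hat{\mathbf{g}}^u_{ml}\|^4\} = N_r(N_r+1)(\gamma^u_{ml})^2$ [5] and $\mathbb{E}\{\|\mathbf{e}^u_{ml}\|^2\} = N_r(\beta^u_{ml} - \gamma^u_{ml})$. We then simplify the MUI for the lth uplink UE, where equality (a) is obtained by using the facts that: i) $\hat{\mathbf{g}}^u_{ml}$ and $\mathbf{g}^u_{mq}$ are mutually independent; and ii)
$$\mathbb{E}\{|(\hat{\mathbf{g}}^u_{ml})^H\mathbf{g}^u_{mq}|^2\} = \mathbb{E}\big\{(\mathbf{g}^u_{mq})^H\,\mathbb{E}\{\hat{\mathbf{g}}^u_{ml}(\hat{\mathbf{g}}^u_{ml})^H\}\,\mathbf{g}^u_{mq}\big\} = \gamma^u_{ml}\,\mathbb{E}\{(\mathbf{g}^u_{mq})^H\mathbf{g}^u_{mq}\} = \gamma^u_{ml}\,\mathbb{E}\{\|\mathbf{g}^u_{mq}\|^2\} = N_r\gamma^u_{ml}\beta^u_{mq}. \tag{47}$$
We next obtain the noise power for the lth uplink UE. The undistorted MR-combined uplink signal at the mth AP is expressed as intra-/inter-AP residual interference, RI u l + (ĝ u ml ) H w u m additive noise at APs, N u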