Resource Allocation for Energy Efficient User Association in User-Centric Ultra-Dense Networks Integrating NOMA and Beamforming

A coupling of wireless access via non-orthogonal multiple access (NOMA) and wireless backhaul via beamforming is a promising way for downlink user-centric ultra-dense networks (UDNs) to improve system performance. However, the ultra-dense deployment of radio access points in the macrocell and the user-centric view of network design in UDNs raise important concerns about resource allocation and user association, most notably the energy efficiency (EE) balance. To overcome this challenge, we develop a framework to investigate the resource allocation problem for energy efficient user association in such a scenario. The joint optimization framework, which aims at system EE maximization, is formulated as a large-scale non-convex mixed-integer nonlinear programming problem that is NP-hard and cannot be solved directly with low complexity. Instead, taking advantage of sum-of-ratios decoupling and successive convex approximation methods, we transform the original problem into a series of convex optimization subproblems. We then solve each subproblem through Lagrangian dual decomposition, and design a distributed iterative algorithm that jointly optimizes power allocation, sub-channel assignment, and user association. Simulation results demonstrate the effectiveness and practicality of our proposed framework, which converges rapidly and improves the system-wide EE.


Introduction
During the past few years, the rapid proliferation of wireless smart devices and the growth of emerging applications, e.g., eXtended reality (XR), super Hi-vision (8K) videos, ultra-immersive games, etc., have propelled unprecedented growth in mobile data traffic. It is predicted that the total global data traffic will reach 136 EB per month by 2024, a 1000-fold increase from the existing Long Term Evolution (LTE) system to the fifth generation (5G) mobile system [1]. Such a thousandfold traffic growth necessitates the configuration of ultra-dense networks (UDNs) as a new evolution paradigm to meet the network capacity and spectral efficiency (SE) enhancement requirements for 5G and beyond [2,3]. Instead of relying on a tower-mounted macro base station (MBS) with high transmit power in the macrocell sending signals to a large number of user equipments (UEs), e.g., 0.2 UEs/m$^2$, UDNs deploy tens or hundreds more low-powered radio access points (APs) with smaller coverage areas to coherently provide wireless access service for those users. As such, the ultra-dense deployment of APs has the potential to bring multiple benefits, e.g., enlarged cell coverage, improved spatial reuse of wireless resources, enhanced performance gains, etc. [4,5].
* Principal corresponding author. ** Corresponding author. Email addresses: zhanglong@hebeu.edu.cn (Long Zhang), guobinzh@163.com (Guobin Zhang), aozhy1119@126.com (Xiaofang Zhao), lyl0112@hotmail.com (Yali Li), hct12138@hotmail.com (Chuntian Huang), ecsun@bjut.edu.cn (Enchang Sun), huangwei2@epri.sgcc.com.cn (Wei Huang)
In spite of these advantages, such an increasing density of APs with dense cell coverage, e.g., $10^3$ APs/km$^2$ or more, results in a complex distribution of APs in UDNs and possibly overlapping coverage for users. Therefore, simply using the traditional cell-centric architecture poses extra challenges for network planning and design in UDNs, e.g., complicated resource management, severe inter-cell interference, large signalling overhead, etc. More seriously, irregular cell coverage may leave some users in overlapped areas with severe interference, and other users at cell edges or in areas without coverage, which seriously degrades the quality-of-service (QoS) performance of users. As such, it is imperative to transform the network architecture from cell-centric to user-centric by adopting the idea of "network serving user" in a cell-free fashion [6]. In a user-centric UDN, each user is simultaneously served by its selected subset of APs, i.e., an AP group (APG), in which the density of APs is comparable to or even higher than that of users. Through the deconstruction of the cellular structure, user-centric UDNs not only eliminate cell boundaries and entirely suppress inter-cell interference, but also achieve dynamic configuration of APGs and flexible resource allocation in a user-centric manner.
While user-centric UDNs with ultra-densely deployed APs overlaid on a traditional MBS in the macrocell enable a multi-Gigabit-per-second user experience and SE gains in the wireless access downlink, limited wireless resources bring about serious competition among APs for the massive access opportunities of users [7]. This drives the research community to design more resource-efficient wireless network paradigms that cope with the scarcity of wireless resources. Recently, non-orthogonal multiple access (NOMA) has been recognized as one of the enabling air-interface techniques for 5G and beyond due to its support for overloaded transmission with limited resources and its higher SE [8]. The key idea of NOMA is to allow multiple signals to be multiplexed and transmitted simultaneously on the same frequency/time resource block (RB) by differentiating the signals through distinct power levels or user-specific codes, i.e., power-domain or code-domain multiplexing. For power-domain NOMA, successive interference cancellation (SIC) is exploited at the receiver side to decode its own received signal and reduce the undesired interference effectively. In this regard, NOMA is well suited to the wireless access downlink scenario in user-centric UDNs, where massive connectivity and heavy data traffic for users are required over limited wireless resources. From a user-centric point of view, multiple APs, e.g., an APG, can cooperatively and concurrently serve every user on the same sub-channel in the access downlink via NOMA. By doing so, a significant SE enhancement can be attained in comparison with conventional orthogonal multiple access (OMA) schemes.
On the other hand, the tens or hundreds more distributed APs in user-centric UDNs impose additional constraints on the design of backhaul connections. Whereas a traditional macrocell has a dedicated, high-capacity wired backhaul, e.g., optical fiber and digital subscriber line connections, it is impractical and uneconomical to connect every AP to the core networks via fiber backhauling [9]. This is due to the dramatic increase in deployment cost and possible geographical limitations for placement, e.g., hard-to-reach locations of APs in urban areas. An alternative is to utilize wireless backhauling, which allows low-cost plug-and-play APs to employ over-the-air links to the MBS for backhauling. To reduce the computational complexity of wireless backhaul design and to improve the system efficiency of wireless access, a clustering scheme is needed to partition all the APs into disjoint clusters based on a feasible policy, e.g., channel condition and spatial location. Given this context, it is critical to manage the interference in wireless backhaul connections, especially the inter-cluster interference in downlink transmission [10,11]. Recently, multiple-antenna techniques have been regarded as a promising solution to achieve both higher SE and powerful interference mitigation via transmit beamforming [12,13], especially for multi-user downlink multiple-input multiple-output (MIMO) scenarios [14,15,16].
Thus, a natural idea is to link beamforming and wireless backhaul together to manage the interference intelligently. In wireless backhaul, with multiple antennas at the MBS, downlink beamforming can be used to simultaneously transmit the weighted signals to APs in different clusters by concentrating the signal power to an intended AP while reducing the interference generated to other APs.
Under such circumstances, the integration of wireless access via NOMA and wireless backhaul via beamforming into user-centric UDNs is not only an extension and branch of traditional UDNs, but also a practical approach that promises significant performance gains in terms of coverage, rate, delay, capacity, SE, and energy efficiency (EE). Despite these potential advantages, such an integration also imposes additional challenges and reveals some serious concerns, particularly with the ultra-dense and random deployment of APs and the user-centric view of network optimization design. Firstly, relying on the sub-channels, every user is capable of being jointly associated with multiple APs for wireless access, and every AP has to be wirelessly connected to the MBS for backhauling. Hence, the increased complexity incurred by ultra-densely deployed nodes makes user association along with AP-MBS association a challenging problem. Secondly, because limited available resources are shared by a large number of users and APs, flexible and efficient resource allocation schemes are essential to alleviate competition, control interference, and optimize system performance. Thirdly, a large-scale deployment of APs inevitably triggers an enormous growth of energy consumption, which contributes to global warming and raises operational costs for network operators. As such, it is of paramount importance to take the EE into account in the design objective for user-centric UDNs from the green communication perspective. Furthermore, user association also has a significant influence on the overall system-level energy consumption [17,18]. For instance, some APs are highly overloaded due to excessive associations with users, while other lightly loaded APs still consume a similar amount of energy, which degrades long-term EE performance. It is for this reason that energy efficient user association is a key issue in the field of EE in UDNs.
To address the above problems, two key network bottlenecks must be overcome, namely resource allocation for large-scale node deployments over the shared radio resources, and energy efficient user association for load balancing among the APs and the MBS. These bottlenecks and challenges motivate the need for a better understanding of the interplay between resource allocation and energy efficient user association, which typically requires a trade-off between them.
Motivated by the above observations, the exploration of resource allocation for energy efficient user association has become highly valuable. Our objective in this paper is to achieve resource allocation for energy efficient user association, and thereby identify such an interplay, under the scenario of user-centric UDNs integrating wireless access and wireless backhaul. To the best of our knowledge, the problem of resource allocation for energy efficient user association through the efficient integration of user-centric UDNs with NOMA and beamforming has not yet been thoroughly studied in the literature. To bridge this research gap, we investigate a resource allocation problem for energy efficient user association in downlink user-centric UDNs integrating wireless access via NOMA and wireless backhaul via beamforming, aiming to maximize the system EE under the constraints of achievable rate for wireless access/backhaul connection, transmit power limit of the MBS and every AP, and user association relations. The main contributions of this paper can be summarized as follows:
• We develop a novel resource allocation optimization framework to achieve energy efficient user association in downlink transmission of user-centric UDNs by jointly taking into account wireless access and wireless backhaul. This is a new approach to the user-centric view of network optimization design in UDNs, which captures the EE balance through a flexible paradigm of tightly integrating access downlink via NOMA and backhaul downlink via beamforming from a global standpoint. To the best of our knowledge, our framework is the first in the literature to identify a close coupling of NOMA based wireless access and beamforming based wireless backhaul in downlink user-centric UDNs.
• We formulate the resource allocation problem for energy efficient user association under such an integration of user-centric UDNs with NOMA and beamforming as a large-scale non-convex mixed-integer nonlinear programming problem, which is NP-hard and cannot be solved in reasonable time as the numbers of users and APs grow. The objective of joint resource allocation and user association is to maximize the system EE of downlink transmission subject to the constraints of achievable data rate for wireless access and backhaul connection, maximum transmit power for the MBS and each AP, and user association relations. The framework is shown to jointly optimize the transmit power allocated to users and APs, the sub-channel assignment for access and backhaul downlink, and the association relations for both user-AP and AP-MBS simultaneously.
• To tackle this problem with reduced computational complexity, we first conduct a series of reformulations based on the time-sharing relaxation strategy to relax the binary variables for user association. Then the sum-of-ratios decoupling method is used to transform the fractional structure of the relaxed objective function into an equivalent parametric subtractive function. We accordingly employ iterative successive convex approximation to convert the original highly non-convex problem into a series of convex subproblems via the exponential-logarithmic approximation, and apply the Lagrangian dual decomposition approach to solve these optimization subproblems. To ensure rapid convergence of the optimal power update, an effective, fully distributed algorithm with polynomial complexity is developed to coordinate the execution of sub-channel assignment and power allocation.
• Through extensive simulations, we demonstrate the proposed algorithm is indeed an efficient and practical solution for joint resource allocation and user association in user-centric UDNs integrating NOMA and beamforming, and we obtain insights into how the various system parameters influence the convergence speed of optimal power update and system-wide EE. With regard to the same system parameters and requirements of data rate and power consumption for each user, each AP, and the MBS, we also show that the overall EE performance from a system point of view is always superior with the proposed framework when compared with the baseline schemes.
The rest of this paper is organized as follows. We first introduce the related work in Section 2. Section 3 describes the system model, followed by a construction of the optimization problem. In Section 4, we present the problem reformulation through the relaxation of binary variables, the sum-of-ratios decoupling, and the successive convex approximation technique. Section 5 provides the Lagrangian dual decomposition method to solve the convex subproblem and proposes a decentralized iterative algorithm to derive the feasible solutions. In Section 6, we present the simulation results to evaluate the proposed optimization framework. Finally, we conclude our paper in Section 7.
Notation: Throughout this paper, we use $a$, $\mathbf{a}$, $\mathbf{A}$, and $\mathcal{A}$ to denote a scalar variable, a vector, a matrix, and a set, respectively. The distribution of a circularly symmetric complex-valued Gaussian random variable $x$ with mean $\mu$ and variance $\sigma^2$ is represented by $x \sim \mathcal{CN}(\mu, \sigma^2)$, where $\sim$ stands for "distributed as". The identity matrix, sometimes ambiguously called a unit matrix, is denoted as $\mathbf{I}$, and an $(n \times n)$-dimensional identity matrix is denoted by $\mathbf{I}_n$. The superscript $[\cdot]^T$ refers to the transpose of a matrix or a vector. In addition, we denote the statistical expectation of a random variable by $\mathbb{E}\{\cdot\}$. The symbol $\mathbb{C}$ indicates the complex number field. An $n$-dimensional complex vector belongs to $\mathbb{C}^{n \times 1}$, whereas $\mathbb{C}^{n \times m}$ corresponds to the generalization to an $(n \times m)$-dimensional complex matrix.

Related Work
Currently, many potential issues in the realization of user-centric UDNs have been identified and discussed separately [2,3,4,6]. Among them, resource allocation is a critical issue that has gained widespread popularity. In [19] Nguyen et al. [29] to maximize the cost efficiency in content delivery. However, the above related works in [28,29] applied the downlink beamforming only to the scenario of access links and did not consider the wireless backhaul design.
To sum up, as shown in Table 1, although extensive work has been carried out on the resource allocation problem in user-centric UDNs, NOMA-aided UDNs, and beamforming-aided UDNs, the efficient integration of user-centric UDNs with NOMA and beamforming techniques has not been fully explored. This research gap motivates us to pursue a solution for the problem of joint resource allocation and user association optimization to maximize the system-wide EE of downlink transmission integrating both access downlink via NOMA and backhaul downlink via beamforming.

System Model and Problem Formulation
In this section, we first introduce the network model of a typical user-centric UDN. Under this system configuration, we provide the transmission model from the downlink perspective, i.e., access downlink via NOMA and backhaul downlink via beamforming, and further describe the power consumption model for downlink transmission. Then, the system EE maximization problem for downlink transmission will be formulated.

Network Model
Consider a user-centric UDN as shown in Fig. 1, where an MBS with a large-scale antenna array is located in the center with a large number of APs, denoted by a set $\mathcal{M} = \{1, 2, \cdots, M\}$, densely deployed within the macrocell coverage of that MBS. In particular, the macrocell is connected to the core networks through optical fiber backhaul, and the MBS is responsible for wireless backhaul connections for all the APs. The coverage radius of the macrocell is specified by $r$. There also exist $N$ users randomly distributed in the overlapping macrocell coverage area, denoted by a set $\mathcal{N} = \{1, 2, \cdots, N\}$, sharing the same spectrum resource with the MBS and the APs. Note that each AP is equipped with one or more receive antenna(s) for backhaul connections, and is also configured with multiple transmit antennas to serve more users simultaneously in a user-centric fashion. We assume that the locations of the APs are modeled by an independent homogeneous Poisson point process (PPP) $\Phi_{\rho_1}$ with density $\rho_1 = \frac{M}{\pi r^2}$ that is comparable to or even larger than the user density $\rho_2 = \frac{N}{\pi r^2}$. For simplicity, we utilize a quasi-static deployment scenario for users, such that the location of each user remains unchanged within the considered time duration$^1$. In this paper, we focus on joint resource allocation and user association in downlink transmission of such a user-centric scenario by integrating wireless access and wireless backhaul. Specifically, the wireless downlink consists of two parts: (i) access downlink from an AP to a user in the corresponding cluster, and (ii) backhaul downlink from the MBS to an AP in the macrocell$^2$.
$^1$ We would like to mention that our proposed optimization framework for joint resource allocation and user association is conducted within the considered time duration, which can be interpreted as a specific time slot or a period of time. However, the results of this framework are easily extendable to the general case of multiple time slots.
For the coordination between the MBS and the AP, we adopt a dynamic time division duplex (TDD) mode [7], in which both the MBS and the AP can independently transmit in wireless backhaul and wireless access, respectively.
The total available bandwidth $W$ is equally divided into $K$ orthogonal sub-channels, represented by a set $\mathcal{K} = \{1, 2, \cdots, K\}$, so each sub-channel has an equally-sized bandwidth of $W/K$. Due to the dense deployment scenario, we consider the universal frequency reuse policy so that the sub-channels are available to all the users for wireless access and all the APs for backhaul connections, respectively. To avoid the interference between access downlink and backhaul downlink, the sub-channel set $\mathcal{K}$ is separated into two subsets, i.e., $\mathcal{A} = \{1, 2, \cdots, \delta\}$ for access downlink and $\mathcal{B} = \{\delta + 1, \delta + 2, \cdots, K\}$ for backhaul downlink. In other words, the first $\delta$ sub-channels in $\mathcal{K}$ are used for wireless access, and the other $K - \delta$ sub-channels in $\mathcal{K}$ are selected for wireless backhaul.
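The partition of the sub-channel set described above can be illustrated with a minimal sketch. The function name `split_subchannels` and the numerical values (20 MHz, 10 sub-channels, 6 of them for access) are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def split_subchannels(total_bandwidth_hz, num_subchannels, delta):
    """Split K equal-width sub-channels into an access set and a backhaul set.

    Sub-channels are indexed 1..K as in the text; the first `delta` go to
    access downlink and the remaining K - delta go to backhaul downlink.
    """
    if not 0 < delta < num_subchannels:
        raise ValueError("delta must satisfy 0 < delta < K")
    width_hz = total_bandwidth_hz / num_subchannels            # W / K
    access = list(range(1, delta + 1))                          # {1, ..., delta}
    backhaul = list(range(delta + 1, num_subchannels + 1))      # {delta+1, ..., K}
    return width_hz, access, backhaul


# Illustrative values only (assumptions, not from the paper).
width_hz, access, backhaul = split_subchannels(20e6, 10, 6)
```

Because the two index sets are disjoint by construction, access and backhaul transmissions never share a sub-channel, which is the property the text relies on to avoid access-backhaul interference.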
Let us assume that perfect knowledge of the channel state information (CSI) for every sub-channel is available at both the MBS and every AP. In accordance with the perfect CSI of every sub-channel, the APs allocate a subset of $\mathcal{A}$ to the users, and the MBS assigns a subset of $\mathcal{B}$ to the APs. To strike a balance between efficient user-centric wireless access and computational complexity, the ultra-densely distributed APs are initially separated into $F$ disjoint clusters based on their spatial directions$^3$, denoted by a set $\mathcal{F} = \{1, 2, \cdots, F\}$, as illustrated in Fig. 1. We suppose that an AP can only provide wireless access service for one or more user(s) over a subset of $\mathcal{A}$ within the same cluster, to avoid extra inter-cluster interference. More precisely, in every cluster $f$, user $n$ can be simultaneously associated with at most $M_f$ APs on one or more sub-channel(s) within the considered time duration, for $0 \le M_f \le M$, $f \in \mathcal{F}$, and $n \in \mathcal{N}$. As such, $M_f$ APs in cluster $f$ constitute a generalized APG, denoted by a set $\mathcal{G}_f$, to serve user $n$ by concurrently transmitting independent signals in a user-centric way$^4$, for $\mathcal{G}_f \subset \mathcal{M}$. We wish to remark that the APs in generalized APG $\mathcal{G}_f$ also belong to cluster $f$.
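To make the deployment and clustering assumptions concrete, the following sketch drops a fixed number of APs uniformly in a disc (the distribution of a homogeneous PPP conditioned on the number of points) and groups them into equal angular sectors around the MBS. The equal-sector rule is only one possible reading of clustering "based on spatial directions"; the function names and parameter values are assumptions for illustration.

```python
import math
import random


def deploy_aps(num_aps, radius, rng):
    """Drop num_aps points uniformly at random in a disc centred on the MBS."""
    points = []
    for _ in range(num_aps):
        rho = radius * math.sqrt(rng.random())    # sqrt yields uniform area density
        theta = 2.0 * math.pi * rng.random()
        points.append((rho * math.cos(theta), rho * math.sin(theta)))
    return points


def cluster_by_direction(points, num_clusters):
    """Assign each AP index to one of num_clusters equal angular sectors."""
    sector = 2.0 * math.pi / num_clusters
    clusters = {f: [] for f in range(num_clusters)}
    for m, (x, y) in enumerate(points):
        angle = math.atan2(y, x) % (2.0 * math.pi)            # map to [0, 2*pi)
        f = min(int(angle // sector), num_clusters - 1)       # guard rounding at 2*pi
        clusters[f].append(m)
    return clusters


# Illustrative values only: 200 APs, 500 m macrocell radius, 4 clusters.
rng = random.Random(0)
aps = deploy_aps(200, 500.0, rng)
clusters = cluster_by_direction(aps, 4)
ap_density = len(aps) / (math.pi * 500.0 ** 2)    # M / (pi * r^2), in APs per m^2
```

Each AP lands in exactly one sector, so the resulting clusters are disjoint, matching the requirement that an AP serves users only within its own cluster.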

Access Downlink via NOMA
In the access downlink, a user in each cluster can be simultaneously served by multiple APs in a user-centric fashion through an assigned sub-channel from $\mathcal{A}$. Motivated by that, we assume that the considered system adopts power-domain NOMA for access downlink transmission, which allows multiple signals from the APs in a cluster to be multiplexed on the same sub-channel at the same time. According to the NOMA principle, one user can receive from the APs in the same cluster via multiple sub-channels, and one sub-channel can be assigned to multiple users.
For convenience, let us define a binary variable $a_{f,m,n,k}$ as follows to indicate the association relationship between user $n$ on sub-channel $k$ and AP $m$ in cluster $f$, for $f \in \mathcal{F}$, $m \in \mathcal{M}$, $n \in \mathcal{N}$, and $k \in \mathcal{A}$:
$$a_{f,m,n,k} = \begin{cases} 1, & \text{if user } n \text{ is associated with AP } m \text{ in cluster } f \text{ on sub-channel } k, \\ 0, & \text{otherwise}. \end{cases}$$
Let $P^{\mathrm{AP}}_{f,m,n,k}$ denote the allocated transmit power of AP $m$ in cluster $f$ to user $n$ on sub-channel $k$. We further assume that all the sub-channels for access downlink follow quasi-static block fading, where the channel gains remain constant over the considered time duration, but may vary independently between different time durations. As such, we denote the downlink channel coefficient from AP $m$ in cluster $f$ to user $n$ on sub-channel $k$ as $h_{f,m,n,k} = g_{f,m,n,k} d_{f,m,n}^{-\vartheta_1}$, where $g_{f,m,n,k}$ is the flat Rayleigh fading channel gain, $d_{f,m,n}$ is the distance between AP $m$ in cluster $f$ and user $n$, and $\vartheta_1$ is the path loss exponent. Let $N_{f,k}$ be the number of users using sub-channel $k$ in cluster $f$, and $s_{f,m,n,k}$ be the transmitted symbol of AP $m$ in cluster $f$ to user $n$ on sub-channel $k$. Thus, the received signal at user $n$ on sub-channel $k$ is the superposition of the signals transmitted by the APs on that sub-channel plus the noise $z_{n,k}$, where $z_{n,k} \sim \mathcal{CN}(0, \sigma^2_{n,k})$ is the additive white Gaussian noise (AWGN) at user $n$ on sub-channel $k$ with zero mean and variance $\sigma^2_{n,k}$. After receiving the superposed signals from $M_{f,k}$ APs on sub-channel $k$ in generalized APG $\mathcal{G}_f$, user $n$ employs the SIC technique to decode its desired messages$^5$. Let $H_{f,m,n,k} = |h_{f,m,n,k}|^2 / \sigma^2_{n,k}$ represent the channel to noise ratio (CNR) of sub-channel $k$ from AP $m$ in cluster $f$ to user $n$. Without loss of generality, we assume that the CNRs of the received signals at user $n$ on sub-channel $k$ served by $M_{f,k}$ APs in generalized APG $\mathcal{G}_f$ are sorted in ascending order, i.e.:
$$H_{f,1,n,k} \le H_{f,2,n,k} \le \cdots \le H_{f,M_{f,k},n,k}.$$
$^3$ [...] referred to as the result of the task of classifying all the APs into a specific disjoint part according to their spatial location relations.
$^4$ From a user perspective, an APG is a subset of APs in an AP cluster, and each AP in this subset is associated with that user in a user-centric fashion.
$^5$ It should be pointed out that the group of $M_{f,k}$ APs on sub-channel $k$ can be regarded as a subset of generalized APG $\mathcal{G}_f$ over all the sub-channels.
Note that the received signals with lower CNRs from the APs in a generalized APG are allocated higher powers and can be recovered by treating the received signals with lower powers as interference in the SIC decoding [30,31]. To be precise, for the received signal from AP $m$, user $n$ on sub-channel $k$ first decodes the message from AP $j$ in generalized APG $\mathcal{G}_f$, for $j < m$, and then removes this message from its received signals, in the order of $j = 1, 2, \cdots, m - 1$. Through the sequential decoding, the signals from AP $j$ are treated as interference, for $j > m$. As a result, the received signal-to-interference-plus-noise ratio (SINR) at user $n$ on sub-channel $k$ served by AP $m$ in generalized APG $\mathcal{G}_f$ after performing the SIC is given by:
$$\mathrm{SINR}_{f,m,n,k} = \frac{H_{f,m,n,k} P^{\mathrm{AP}}_{f,m,n,k}}{1 + \sum_{j=m+1}^{M_{f,k}} H_{f,j,n,k} P^{\mathrm{AP}}_{f,j,n,k}},$$
where $\sum_{j=m+1}^{M_{f,k}} H_{f,j,n,k} P^{\mathrm{AP}}_{f,j,n,k}$ is the interference that user $n$ on sub-channel $k$ receives from other APs in generalized APG $\mathcal{G}_f$. Correspondingly, the achievable rate (in bit/s) of user $n$ on sub-channel $k$ served by AP $m$ in generalized APG $\mathcal{G}_f$ can be written as:
$$\frac{W}{K} \log_2 \left( 1 + \mathrm{SINR}_{f,m,n,k} \right).$$
Recall that one or more user(s) in $\mathcal{N}$ over a subset of $\mathcal{A}$ can access multiple APs in every cluster in a user-centric way. Let $N_f$ denote the number of users that are associated with the APs in cluster $f$, for $0 \le N_f \le N$. Therefore, the achievable sum rate of the system for access downlink via NOMA, denoted by $R^{\mathrm{AD}}_{\mathrm{Sum}}$, is obtained by summing the above rate over all clusters, associated AP-user pairs, and access sub-channels.
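The SIC decoding order described above can be sketched numerically for a single user on a single sub-channel. The sketch assumes the SINR takes the standard CNR-normalized form (desired term divided by one plus the not-yet-cancelled terms) and that the rate follows the Shannon formula; the function names and the sample CNR and power values are hypothetical.

```python
import math


def sic_sinr(cnr, power):
    """Per-AP SINR at one user on one sub-channel after SIC.

    cnr[i] and power[i] belong to the i-th AP, with APs already sorted in
    ascending CNR order. The signal of AP i is decoded after those of
    APs 0..i-1 have been removed, so only APs i+1.. remain as interference.
    """
    sinr = []
    for i in range(len(cnr)):
        interference = sum(cnr[j] * power[j] for j in range(i + 1, len(cnr)))
        sinr.append(cnr[i] * power[i] / (1.0 + interference))
    return sinr


def user_rate(cnr, power, bandwidth_hz):
    """Sum of Shannon rates (bit/s) over the APs serving this user."""
    return sum(bandwidth_hz * math.log2(1.0 + g) for g in sic_sinr(cnr, power))


# Illustrative values only: three APs, lower CNR gets higher power.
cnr = [1.0, 2.0, 4.0]
power = [0.5, 0.25, 0.25]
sinr = sic_sinr(cnr, power)
rate = user_rate(cnr, power, 1.0)
```

A useful sanity check under these assumptions is that the per-AP rates telescope: their sum equals the rate of a single link whose SNR is the total received CNR-weighted power, here $\log_2(1 + 0.5 + 0.5 + 1) = \log_2 3$ per unit bandwidth.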

Backhaul Downlink via Beamforming
In the backhaul downlink, the MBS concurrently transmits independent signals to the APs in different clusters over the shared sub-channels. By exploiting multiple antennas at both the MBS and the APs, downlink beamforming is considered in wireless backhaul not only to increase the SE, but also to combat the inter-cluster and intra-cluster interference.
Let $Q$ be the number of transmit antennas for beamforming in the antenna array of the MBS. The downlink channel between the MBS and the $\phi_{f,k}$ APs on sub-channel $k$ in cluster $f$ is described by a fading channel coefficient vector that is assumed to be complex Gaussian distributed with zero mean and identity covariance matrix, i.e., $\mathbf{h}_{f,m,k} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_Q)$. Such a channel coefficient is time invariant over the considered time duration, but may still vary between different time durations. Moreover, we suppose that the channel coefficient vector is available at the MBS with the aid of CSI feedback information [12].
In order to represent the association relationship between the MBS and AP $m$ on sub-channel $k$ in cluster $f$, for $f \in \mathcal{F}$, $m \in \mathcal{M}$, and $k \in \mathcal{B}$, a binary variable $b_{f,m,k}$ is also introduced, which is defined by:
$$b_{f,m,k} = \begin{cases} 1, & \text{if AP } m \text{ in cluster } f \text{ is associated with the MBS on sub-channel } k, \\ 0, & \text{otherwise}. \end{cases}$$
Let us utilize $\mathbf{s}_k = [s_{1,k}, s_{2,k}, \cdots, s_{F,k}]^T \in \mathbb{C}^{F \times 1}$ to represent the transmitted symbol vector of the MBS on sub-channel $k$ for $F$ clusters. Assume that $P^{\mathrm{MBS}}_{f,m,k}$ is the allocated transmit power of the MBS to AP $m$ on sub-channel $k$ in cluster $f$. Thereby, the transmitted symbol $s_{f,k}$ for the $\phi_{f,k}$ APs on sub-channel $k$ in cluster $f$ is a power-weighted superposition of the per-AP symbols, where $s_{f,m,k}$ is the normalized transmitted symbol of the MBS to AP $m$ on sub-channel $k$ in cluster $f$, i.e., $\mathbb{E}\{|s_{f,m,k}|^2\} = 1$. To carry out the downlink beamforming, let $\mathbf{w}_{f,m,k}$ be the beamforming vector for AP $m$ on sub-channel $k$ in cluster $f$. Accordingly, the MBS's beamforming matrix on sub-channel $k$ for $F$ clusters is given by $\mathbf{W}_k = [\mathbf{w}_{1,k}, \mathbf{w}_{2,k}, \cdots, \mathbf{w}_{F,k}]$, where $\mathbf{w}_{f,k}$ is the beamforming vector for the $\phi_{f,k}$ APs on sub-channel $k$ in cluster $f$. Note that conventional beamforming approaches can be used, since the downlink channel coefficient vectors are known at the MBS as mentioned earlier. However, we do not discuss the optimization of the beamforming vectors, as it is beyond the scope of the paper.
By combining the transmitted symbol vector and the MBS's beamforming matrix on sub-channel $k$ for $F$ clusters, we can obtain the transmitted signals on sub-channel $k$, i.e., $\mathbf{X}_k = \mathbf{W}_k \mathbf{s}_k$. To simplify the analysis, we consider that the number of transmit antennas used for beamforming at the MBS is equal to the number of APs on sub-channel $k$ in cluster $f$. As a result, the received signal at AP $m$ on sub-channel $k$ in cluster $f$ is modeled as the transmitted signal $\mathbf{X}_k$ observed through the channel $\mathbf{h}_{f,m,k}$ plus the noise $z_{m,k}$, where $z_{m,k} \sim \mathcal{CN}(0, \sigma^2_{m,k})$ is the AWGN at AP $m$ on sub-channel $k$ with zero mean and variance $\sigma^2_{m,k}$. Thus, the SINR at AP $m$ on sub-channel $k$ in cluster $f$ for backhaul downlink via beamforming is the ratio of the desired signal power to the sum of the intra-cluster interference, the inter-cluster interference, and the noise power, where the inter-cluster term involves $P^{\mathrm{MBS}}_{\ell,k}$, the total transmit power of the MBS to the APs on sub-channel $k$ in cluster $\ell$, for $\ell \in \mathcal{F} \setminus \{f\}$. In other words, the received signal at AP $m$ on sub-channel $k$ in cluster $f$ is corrupted by intra-cluster interference, inter-cluster interference, and AWGN. For analytical simplicity, we employ zero-forcing beamforming to eliminate the inter-cluster interference [32]. As such, the achievable rate (in bit/s) of AP $m$ on sub-channel $k$ in cluster $f$ is determined by the resulting SINR over the sub-channel bandwidth $W/K$. In consequence, the achievable sum rate of the system for backhaul downlink via beamforming, denoted by $R^{\mathrm{BD}}_{\mathrm{Sum}}$, is the sum of these rates over all clusters, associated APs, and backhaul sub-channels.

Power Consumption Model
Power consumption during downlink transmission with the combination of wireless access via NOMA and wireless backhaul via beamforming is considered in this subsection. The total system power consumption is divided into the power consumed in access downlink and the power consumed in backhaul downlink.
For the access downlink, the power consumption consists of the power consumed at the users in receiving mode and at the APs in transmission mode. To be precise, the power consumption $P^{\mathrm{Con}}_{f,n}$ for user $n$ in cluster $f$ is determined by $P^{\mathrm{R}}_{f,n}$, $P^{\mathrm{D}}_{f,n}$, and $\psi_A$, where $P^{\mathrm{R}}_{f,n}$ is the constant circuit power consumption for received signal processing, $P^{\mathrm{D}}_{f,n}$ is the dynamic circuit power consumption for signal decoding, and $\psi_A$ is correlated with the number of APs in every APG on each sub-channel.
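Additionally, the power consumption for AP $m$ in cluster $f$ sending a signal to user $n$ on sub-channel $k$ is determined by the transmitter circuit power consumption $P^{\mathrm{C}}_m$ and the transmit power $P^{\mathrm{AP}}_{f,m,n,k}$, i.e., $P^{\mathrm{Con}}_m = P^{\mathrm{C}}_m + P^{\mathrm{AP}}_{f,m,n,k}$. Thus, the sum power consumption in access downlink can be expressed as:
$$\sum_{f \in \mathcal{F}} \sum_{m \in \mathcal{M}} \sum_{n \in \mathcal{N}} \sum_{k \in \mathcal{A}} a_{f,m,n,k} \Big( \underbrace{P^{\mathrm{Con}}_{f,n}}_{\text{Receiving mode for users}} + \underbrace{P^{\mathrm{C}}_m + P^{\mathrm{AP}}_{f,m,n,k}}_{\text{Transmission mode for APs}} \Big).$$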
For the backhaul downlink, the power consumption consists of the power consumed at the APs in receiving mode and at the MBS in transmission mode. Similarly, the power consumption $P^{\mathrm{Con}}_{f,m}$ for AP $m$ in cluster $f$ is determined by $P^{\mathrm{R}}_{f,m}$, $P^{\mathrm{D}}_{f,m}$, and $\psi_B$, where $P^{\mathrm{R}}_{f,m}$ is the constant circuit power consumption for received signal processing, $P^{\mathrm{D}}_{f,m}$ is the dynamic circuit power consumption for signal decoding, and $\psi_B$ is also correlated with the number of APs in every cluster on each sub-channel. In addition, the power consumption of the MBS for downlink beamforming mainly depends on the transmit power $P^{\mathrm{MBS}}_{f,m,k}$ of the MBS to AP $m$ on sub-channel $k$ in cluster $f$. Accordingly, the sum power consumption in backhaul downlink is the sum of the receiving-mode consumption of the APs and the transmission-mode consumption of the MBS over all associated APs and backhaul sub-channels. Based on the sum power consumption in both access downlink and backhaul downlink, the total power consumption for downlink transmission, $P_{\mathrm{Tot}}$, is the sum of the access downlink term and the backhaul downlink term.

Problem Formulation
In this paper, we investigate the resource allocation problem for energy efficient user association in downlink transmission of the system with an emphasis on the EE metric. The system-wide EE metric of interest is generally described in terms of bit-per-Joule capacity, which indicates how efficiently one Joule of consumed energy is utilized for data transmission of the system. Considering the wireless access via NOMA and the wireless backhaul via beamforming, the actual total achievable rate (in bit/s) of the system for downlink transmission is in general determined jointly by the access sum rate $R^{\mathrm{AD}}_{\mathrm{Sum}}$ and the backhaul sum rate $R^{\mathrm{BD}}_{\mathrm{Sum}}$. From the perspective of wireless backhaul connections, the MBS in the macrocell must provide enough data rate for the APs to guarantee that all the users can obtain wireless access from these APs in a user-centric way. To reach this goal, the achievable sum rate of the system for backhaul downlink should not be less than that for access downlink, i.e., $R^{\mathrm{BD}}_{\mathrm{Sum}} \ge R^{\mathrm{AD}}_{\mathrm{Sum}}$. Thus, the actual total achievable rate (in bit/s) for downlink transmission, henceforth referred to as the sum data rate on the wireless access downlink of the system for all the users, can be expressed by $R_{\mathrm{Tot}} = R^{\mathrm{AD}}_{\mathrm{Sum}}$. Therefore, the system EE of downlink transmission, denoted by $\xi_{EE}$ (in bit/Joule), can be formally defined as the ratio of the total achievable rate $R_{\mathrm{Tot}}$ (in bit/s) to the total power consumption $P_{\mathrm{Tot}}$ (in Watt), which is calculated as follows:
$$\xi_{EE} = \frac{R_{\mathrm{Tot}}}{P_{\mathrm{Tot}}}.$$
Under the above setup, our objective is to maximize the system EE of downlink transmission while guaranteeing the data rate and power consumption requirements for the users, the APs, and the MBS, by the joint optimization of resource allocation and user association. Let $R^{\min}_n$ denote the minimum data rate for user $n$. We further employ $P^{\max}$ and $P^{\max}_m$ to stand for the maximum transmit power of the MBS and the maximum transmit power of AP $m$, respectively.
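As a small numerical illustration of the EE definition, the sketch below evaluates the ratio of the total rate to the total power once the backhaul sum rate is at least the access sum rate, so that the total rate equals the access sum rate. The function name `system_ee` and the sample figures are hypothetical and are not results from the paper.

```python
def system_ee(access_sum_rate_bps, backhaul_sum_rate_bps, total_power_w):
    """System EE in bit/Joule: access sum rate divided by total power.

    Assumes the backhaul requirement (backhaul rate >= access rate) must
    hold, so the total rate equals the access sum rate.
    """
    if total_power_w <= 0.0:
        raise ValueError("total power must be positive")
    if backhaul_sum_rate_bps < access_sum_rate_bps:
        raise ValueError("backhaul sum rate must not be less than access sum rate")
    return access_sum_rate_bps / total_power_w


# Illustrative values only: 120 Mbit/s access, 150 Mbit/s backhaul, 40 W.
ee = system_ee(1.2e8, 1.5e8, 40.0)
```

With these assumed figures the EE is 3 Mbit/Joule; since bit/s divided by Watt gives bit/Joule, the units are consistent with the definition in the text.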
Then the optimization problem can be mathematically formulated as: With the constraint in (18b), the achievable rate of every user for wireless access via NOMA must satisfy its minimum data rate constraint. Constraint (18c) ensures that the achievable rate from the MBS to every AP for backhaul connection via beamforming has to be greater than wireless access rate from that AP to the users. Constraint (18d) is imposed to guarantee the MBS's maximum transmit power limit, and constraint (18e) indicates that the transmit power of every AP is restricted by its maximum power level. Finally, constraints (18f) and (18g) hold due to the definition of binary variable a f,m,n,k in access downlink (k ∈ A) and binary variable b f,m,k in backhaul downlink (k ∈ B), respectively.
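As a minimal sketch of the EE metric defined above, the following Python snippet computes ξ EE as the ratio of the access sum rate to the total power and checks the backhaul condition R BD Sum ≥ R AD Sum. The function name and the numerical values are illustrative assumptions and are not taken from the paper.

```python
def system_energy_efficiency(rate_access_bps, rate_backhaul_bps, power_total_w):
    """Return the system EE in bit/Joule, or None if the backhaul is the bottleneck.

    Hypothetical helper: the total rate equals the access sum rate only when the
    backhaul sum rate is at least as large (R_BD_Sum >= R_AD_Sum).
    """
    if power_total_w <= 0:
        raise ValueError("total power consumption must be positive")
    if rate_backhaul_bps < rate_access_bps:
        return None  # backhaul cannot carry the access traffic
    return rate_access_bps / power_total_w


# Illustrative numbers only: 2 Gbit/s access, 2.5 Gbit/s backhaul, 400 W total.
ee = system_energy_efficiency(2.0e9, 2.5e9, 400.0)
print(ee)
```

Returning None for an infeasible backhaul is a design choice of this sketch; in the formulated problem the same condition is enforced as a constraint rather than checked afterwards.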

Problem Analysis and Reformulation
In this section, we consider the solution to the optimization problem P1 to find an optimal resource allocation and user association scheme. Clearly, the problem is a non-convex mixed-integer nonlinear programming problem due to the existence of the interference terms in the objective function in P1, the nonlinear rate constraints in (18b) and (18c), and the binary-constrained variables in (18f) and (18g).
This kind of problem is NP-hard and computationally intractable. In particular, for the UDN scenario with large numbers of densely distributed users and APs, it is extremely difficult to solve the problem directly with low complexity.
To efficiently solve the problem, we need to transform it into a more tractable convex optimization problem. Having this in mind, we first relax the binary variables into continuous real variables to redesign some constraints for problem reformulation. Then, we leverage the sum-of-ratios decoupling strategy to achieve the transformation of fractional structure of the relaxed objective function into an equivalent parametric subtractive one. Lastly, we use the exponential-logarithmic transformation policy to construct a series of convex optimization subproblems, and further apply the method of iterative successive convex approximation (SCA) to obtain the feasible solutions by iteratively tightening the lower bounds of the achievable sum rate functions.

Relaxation of Binary Variable
With the binary variables a f,m,n,k and b f,m,k relaxed into continuous real variables in [0, 1], the expressions in (6) and (12) for access downlink and backhaul downlink can be rewritten accordingly, and the total power consumption in (15) for downlink transmission can be derived in the same way. With this relaxation, P1 can be reformulated as problem P2. We remark that the optimal solution of the reformulated problem P2 can be viewed as an upper bound on the solution to P1, since relaxing the binary variables and constraints enlarges the feasible set.

Equivalent Reformulation via Sum-of-Ratios Decoupling
Although P1 has been transformed into a new problem, P2 is still not convex. It remains challenging to derive an optimal solution for two reasons: (i) the interference terms and the fractional structure of the objective function in P2, and (ii) the nonlinear and non-convex constraints in (22b) and (22c). Thus, we need to further convert this problem into an equivalent but more tractable one. Let us first recheck the structure of the objective function in P2, which is rewritten in (23). From (23), we can observe that the objective function has the structure of a nonlinear sum of fractional functions. Maximizing a sum of fractional functions subject to non-convex constraints is a sum-of-ratios fractional programming problem, which is difficult to solve by conventional optimization methods [34]. To address this problem, we adopt the sum-of-ratios algorithm, which decouples the numerators and denominators of the fractional objective function. More particularly, according to [34], the fractional-form objective function in P2 is reformulated into an equivalent parametric subtractive structure. Thereby, the optimization objective in P2 becomes the maximization, over {a f,m,n,k , b f,m,k , P AP f,m,n,k , P MBS f,m,k }, of the subtractive objective in (24), where µ is an auxiliary parameter.

So far, we have broken down the fractional structure of the objective function via the sum-of-ratios decoupling. Unfortunately, the objective function in (24) is still non-concave due to the interference terms in the highly non-concave sum rate function R AD Sum . To obtain a tractable structure, we exploit the logarithmic structure to rewrite R AD Sum as a difference of convex functions, as given in (25). Through this logarithmic operation, R AD Sum in the objective function in (24) is formulated as a sum of differences of convex functions. As a result, P2 can be further expressed as problem P3 in (26), subject to constraints (22b), (22c), (22d), (22e), (22f), and (22g).
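To illustrate the idea of a parametric subtractive form, the sketch below applies the classical single-ratio (Dinkelbach-type) iteration to a toy one-link EE problem. This is a deliberate simplification and not the sum-of-ratios algorithm of [34]: the single link, the rate model log2(1 + g p), the circuit power, and all names are assumptions made only for illustration. For a fixed parameter µ, the subproblem max R(p) − µ P(p) has a closed-form solution, and µ is then updated to the achieved ratio.

```python
import math


def dinkelbach_single_link(gain, p_circuit, p_max, tol=1e-9, max_iter=100):
    """Maximize log2(1 + gain*p) / (p_circuit + p) over 0 <= p <= p_max.

    Toy single-ratio example of the parametric subtractive form R - mu * P.
    """
    mu = 0.0
    p = p_max
    for _ in range(max_iter):
        if mu > 0.0:
            # Stationary point of log2(1 + gain*p) - mu*(p_circuit + p), then clipped.
            p = min(max(1.0 / (mu * math.log(2.0)) - 1.0 / gain, 0.0), p_max)
        else:
            p = p_max  # with mu = 0 the subproblem simply maximizes the rate
        rate = math.log2(1.0 + gain * p)
        power = p_circuit + p
        if abs(rate - mu * power) < tol:
            break  # R - mu * P is (numerically) zero: mu is the optimal ratio
        mu = rate / power
    return p, mu


p_opt, ee_opt = dinkelbach_single_link(gain=10.0, p_circuit=1.0, p_max=5.0)
print(p_opt, ee_opt)
```

The stopping rule uses the fact that the optimal parameter is the one for which the maximum of the subtractive objective equals zero.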

Successive Convex Approximation
Apparently, the problem P3 is not convex because the constraints in (22b) and (22c) are highly non-concave. To tackle this issue, we resort to the SCA method for solving the non-convex optimization problem, where, in each iteration, the original highly non-convex problem is approximately transformed into a convex problem [35]. According to [36,37], R AD Sum admits the logarithmic lower bound in (27), where α f,m,n,k and β f,m,n,k are the auxiliary approximation variables. When these variables take the values in (28) and (29), the approximation of R AD Sum is tight, i.e., it coincides with the lower bound in (27). In the same way, by applying (20), we can also obtain the lower bound of R BD Sum in (30), where Λ f,m,k and Ξ f,m,k are the auxiliary approximation variables. When these variables take the values in (31) and (32), the approximation of R BD Sum likewise attains the lower bound in (30).

Footnote 6: Note that the use of the logarithmic approximation makes a relaxation of the highly non-concave sum rate function R AD Sum achieve the lower bound when both of the approximation constants are guaranteed. That is, the lower bound is said to be a tight lower bound.

For the given approximation variables α f,m,n,k , β f,m,n,k , Λ f,m,k , and Ξ f,m,k , we then transform P3 into an approximated problem P4, given in (33). In P4, every rate term is replaced by its lower bound. For example, the minimum data rate constraint (33b) is built from the terms

$$a_{f,m,n,k}\left(\alpha_{f,m,n,k}\log_2 \gamma^{\mathrm{AD}}_{f,m,n,k} + \beta_{f,m,n,k}\right)$$

and requires the resulting rate of user n to be at least R min n , ∀n, while the relaxed association variables satisfy a f,m,n,k ∈ [0, 1], ∀f, ∀m, ∀n, ∀k.

Apparently, the problem P4 is still non-concave. To address this issue, we exploit the exponential-logarithmic transformation to achieve the logarithmic change of variables, i.e., $\tilde{P}^{\mathrm{AP}}_{f,m,n,k} = \ln P^{\mathrm{AP}}_{f,m,n,k}$ for f ∈ F, m ∈ M, n ∈ N , and k ∈ A, and $\tilde{P}^{\mathrm{MBS}}_{f,m,k} = \ln P^{\mathrm{MBS}}_{f,m,k}$ for f ∈ F, m ∈ M, and k ∈ B. Conversely, we have $P^{\mathrm{AP}}_{f,m,n,k} = \exp(\tilde{P}^{\mathrm{AP}}_{f,m,n,k})$ and $P^{\mathrm{MBS}}_{f,m,k} = \exp(\tilde{P}^{\mathrm{MBS}}_{f,m,k})$. By applying this change of variables to the objective and constraint functions, we arrive at the approximate parametric subproblem P5 in (34). Its constraints mirror those of P4; for instance, the MBS power limit becomes $\exp(\tilde{P}^{\mathrm{MBS}}_{f,m,k}) \le P^{\max}$, ∀f, ∀m, ∀k.

Algorithm 1 (main steps at iteration τ):
• Solve subproblem P5 to obtain the optimal solutions $P^{\mathrm{AP}(\tau)}_{f,m,n,k} = \exp(\tilde{P}^{\mathrm{AP}(\tau)}_{f,m,n,k})$ and $P^{\mathrm{MBS}(\tau)}_{f,m,k} = \exp(\tilde{P}^{\mathrm{MBS}(\tau)}_{f,m,k})$.
• Update α (τ+1) f,m,n,k and β (τ+1) f,m,n,k to tighten the bound in (27) according to (28) and (29).
• Update Λ (τ+1) f,m,k and Ξ (τ+1) f,m,k to tighten the bound in (30) according to (31) and (32).
It should be pointed out that the approximate subproblem P5 follows the log-sum-exp function structure after the exponential-logarithmic transformation. Given the fact that the log-sum-exp function is convex [38], we finally convert P1 into a standard convex optimization problem in the logarithmically transformed variables.
A convex problem can be solved by many standard convex optimization methods. In fact, we only maximize a lower bound of the objective function in P5. To eventually solve P5 with the help of the SCA approach, we need to further tighten the bound in (27) by iteratively updating α f,m,n,k in (28) and β f,m,n,k in (29), and meanwhile tighten the bound in (30) by iteratively updating Λ f,m,k in (31) and Ξ f,m,k in (32). After obtaining the optimal solution of P5, we recover the power variables through the exponential transformation, i.e., $P^{\mathrm{AP}}_{f,m,n,k} = \exp(\tilde{P}^{\mathrm{AP}}_{f,m,n,k})$ and $P^{\mathrm{MBS}}_{f,m,k} = \exp(\tilde{P}^{\mathrm{MBS}}_{f,m,k})$, namely, the optimal power allocated by AP m in cluster f to user n on sub-channel k as well as the optimal power allocated by the MBS to AP m on sub-channel k in cluster f .
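The tightening step can be illustrated with the logarithmic lower bound that is standard in the SCA literature, log2(1 + γ) ≥ α log2 γ + β, which is tight at a reference SINR. The sketch below assumes this standard form for a single SINR value; the exact expressions (27) to (32) of the paper are not reproduced here, and the function names are hypothetical.

```python
import math


def log_bound_coefficients(gamma_ref):
    """Coefficients that make alpha*log2(gamma) + beta tight at gamma = gamma_ref.

    Assumed standard form: alpha = g/(1+g), beta = log2(1+g) - alpha*log2(g).
    """
    alpha = gamma_ref / (1.0 + gamma_ref)
    beta = math.log2(1.0 + gamma_ref) - alpha * math.log2(gamma_ref)
    return alpha, beta


def lower_bound(gamma, alpha, beta):
    """Concave-in-log surrogate of log2(1 + gamma)."""
    return alpha * math.log2(gamma) + beta


# Coefficients computed at the SINR of the previous iterate (here 3, i.e. about 4.8 dB).
alpha, beta = log_bound_coefficients(3.0)
for gamma in (0.1, 1.0, 3.0, 10.0, 100.0):
    print(gamma, lower_bound(gamma, alpha, beta), math.log2(1.0 + gamma))
```

Recomputing the coefficients at the SINR obtained in each iteration is what makes the successive bounds progressively tighter around the current solution.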
The detailed procedure of the adopted iterative algorithm via the SCA method to tighten the bounds in (27) and (30) is summarized in Algorithm 1. It is noteworthy that Algorithm 1 is implemented in an iterative way for each AP and the MBS, and is also distributed with guaranteed convergence and low complexity. In each iteration, the approximation variables α f,m,n,k , β f,m,n,k , Λ f,m,k , and Ξ f,m,k are updated. The bounds are thereby improved successively, and the iterative process terminates after a finite number of iterations. So far, we have transformed P1 into a sequence of convex subproblems P5 through the exponential-logarithmic approximation. In the following section, we design an effective algorithm to solve P5, aiming to achieve the joint power allocation, sub-channel assignment, and user association in reasonable time complexity.

Lagrangian Dual Decomposition
Since P5 is a standard convex problem after the SCA process, we can adopt the Lagrangian dual decomposition method to solve it and obtain the optimal sub-channel and power allocation for energy efficient user association. The detailed procedure is given in the following. The Lagrangian function corresponding to P5 is given in (35), where λ is the Lagrange multiplier (i.e., the dual variable) vector associated with constraint (34b) on the minimum data rate requirement for each user, ϕ is the Lagrange multiplier vector for constraint (34c) on the achievable rate between the backhaul connection and the wireless access of each AP, η is the Lagrange multiplier corresponding to constraint (34d) on the maximum transmit power for the MBS, and χ is the Lagrange multiplier vector accounting for constraint (34e) on the maximum transmit power of each AP. The boundary constraints (34f) and (34g) will be absorbed in the Karush-Kuhn-Tucker (KKT) conditions [38]. Thereby, the Lagrange dual function is obtained by maximizing the Lagrangian over the primal variables, and the Lagrangian dual problem in (37) minimizes this dual function over the non-negative multipliers.

We then update the Lagrange dual multipliers in (37) based on the subgradient method to minimize the dual. Let l and L max stand for the iteration index and the maximum number of iterations for the dual multiplier update process, respectively. Concretely, in the (l + 1)-th iteration, for l = 1, 2, · · · , L max , each of the dual multipliers λ n , ϕ m , η, and χ f,m is updated independently with its own step size, denoted by ζ λ , ζ ϕ , ζ η , and ζ χ at the l-th iteration, respectively. Additionally, the step size for each dual multiplier should satisfy the conditions required for convergence of the subgradient method.
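A minimal sketch of one such dual update is given below for the multiplier of a minimum-rate constraint. It assumes the usual projected subgradient form, in which the multiplier moves against the constraint slack and is projected onto the non-negative reals, together with a diminishing step size ζ = ζ0 / sqrt(l). Both the step-size rule and the function names are assumptions of this sketch; the update equations and step-size conditions of the paper are not reproduced here.

```python
import math


def diminishing_step(iteration, base=0.1):
    """Assumed step-size rule: base / sqrt(l), which shrinks to zero but is not summable."""
    return base / math.sqrt(iteration)


def update_multiplier(value, slack, step):
    """Projected subgradient step for a constraint written as slack >= 0.

    For a minimum-rate constraint, slack = achieved_rate - required_rate.
    A violated constraint (negative slack) increases the multiplier.
    """
    return max(0.0, value - step * slack)


# Toy run in Mbit/s: the user needs 30 but only gets 25, so lambda_n grows.
lam = 1.0
for l in range(1, 4):
    lam = update_multiplier(lam, slack=25.0 - 30.0, step=diminishing_step(l))
    print(l, lam)
```

The projection keeps every multiplier non-negative, which is required for the dual variables of inequality constraints.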

Optimal Solution for Joint Resource Allocation and User Association
We are now ready to enumerate the KKT conditions. Let us use P * AP f,m,n,k , P * MBS f,m,k , a * f,m,n,k , and b * f,m,k to represent the optimal solutions to P5, respectively. According to the KKT conditions, the optimal solutions P * AP f,m,n,k and P * MBS f,m,k to P5 are obtained by taking the partial derivatives of the Lagrangian function L (· · · ) in (35) with respect to P AP f,m,n,k and P MBS f,m,k , respectively, and setting them to zero. After some necessary algebraic manipulations, we then obtain the optimal power allocated by AP m in cluster f to user n on sub-channel k, and the optimal power allocated by the MBS to AP m on sub-channel k in cluster f , which are given in (48) and (49), respectively. Note that (48) and (49) do not provide closed-form expressions of the optimal power allocation values. However, the existence and uniqueness of the optimal power allocation P * AP f,m,n,k and P * MBS f,m,k are guaranteed according to [37]. Due to the space limitation, the rigorous proof of existence and uniqueness is omitted here, and readers can refer to [37] for a more detailed description. Besides, the update of the optimal power allocation can be made locally by each AP and the MBS, respectively, via iteratively updating dual multipliers λ n , ϕ m , η, and χ f,m .
Meanwhile, according to the KKT conditions, the optimal solutions a * f,m,n,k and b * f,m,k to P5 follow from the partial derivatives of the Lagrangian function L (· · · ) in (35) with respect to a f,m,n,k and b f,m,k , which are given in (50) and (51), respectively. The derivative in (50) depends on the multipliers λ n and ϕ m , the approximation variables α f,m,n,k and β f,m,n,k , the channel term H f,m,n,k , and the optimal power P * AP f,m,n,k . Therefore, sub-channel k * is assigned to user n by AP m in cluster f by maximizing ∂L(· · · )/∂a * f,m,n,k in (50) over the sub-channels, such that we have a * f,m,n,k * = 1, as expressed in (52). Similarly, sub-channel k * is assigned to AP m in cluster f by the MBS by maximizing ∂L(· · · )/∂b * f,m,k in (51), such that we obtain b * f,m,k * = 1, as specified in (53). From (52) and (53), an assignment of 1 to either a * f,m,n,k or b * f,m,k not only achieves the optimal sub-channel allocation to each user or each AP, but also determines the user association index, namely, the association relation for the user-AP or the AP-MBS.
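The assignment rule described above amounts to an arg-max over sub-channels. The sketch below assumes that the values of the derivative in (50) have already been evaluated and stored in a hypothetical table `marginal_benefit[n][k]` for one AP; the table contents are made-up numbers used only to show the selection step.

```python
def assign_subchannels(marginal_benefit):
    """Pick, for each user (row), the sub-channel (column) with the largest value.

    Returns the chosen indices k* and the 0/1 indicator rows with a[n][k*] = 1.
    """
    best = [max(range(len(row)), key=row.__getitem__) for row in marginal_benefit]
    indicators = [
        [1 if k == k_star else 0 for k in range(len(row))]
        for row, k_star in zip(marginal_benefit, best)
    ]
    return best, indicators


# Two users, three sub-channels, illustrative derivative values.
scores = [
    [0.2, 1.5, 0.3],
    [2.0, 0.1, 0.4],
]
k_star, a_star = assign_subchannels(scores)
print(k_star)
print(a_star)
```

The same routine applies to the AP-MBS association by replacing the table with the values of the derivative in (51).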
So far, we have devised Algorithm 1 to generate the updated approximation variables used for tightening the bounds in (27) and (30), and have also given a solution for the joint resource allocation and user association problem by incorporating the approximation variables as well as the iteratively updated dual multipliers. Building on the Lagrangian dual decomposition, we still need an effective algorithm that coordinates the execution of power allocation and sub-channel assignment and ensures fast convergence of the optimal power update. As a result, we present a distributed iterative algorithm to realize the joint optimization of power allocation, sub-channel assignment, and user association simultaneously, which is sketched in Algorithm 2.
In Algorithm 2, the Lagrange multipliers are first set to fixed values after initialization. Then, the approximation variables are obtained by using Algorithm 1, and the algorithm enters the iterative process. In each iteration, each user and each AP can distributively update the corresponding user association index by using the assigned sub-channels. Based on the results of the optimal sub-channel assignment and user association, each AP and the MBS can also update their transmit powers in a distributed manner. By alternately updating the sub-channel assignment and user association as well as the power allocation, the iteration process is terminated when the convergence of the optimal power update is guaranteed or the maximum number of iterations is reached.

Algorithm 2 (Steps 9 to 13):
9: Calculate sub-channel k * for the AP-MBS association b * f,m,k * according to (53).
10: Use sub-channel k * in Step 9 to update b * f,m,k * .
11: for n = 1 to N do
12: Calculate sub-channel k * for the user-AP association a * f,m,n,k * according to (52).
13: Use sub-channel k * in Step 12 to update a * f,m,n,k * .

... to be able to achieve good performance with reasonable training time [39,40,41]. However, it is out of the scope of this work and will be a topic for our further study.

Remark 2:
For the proposed framework, we adopt the SCA method to obtain the feasible solutions to P5 by iteratively tightening the lower bounds of the achievable sum rate functions. Note that the SCA method has been proved to provide the global optimum in most cases according to [37]. Meanwhile, the sum-of-ratios algorithm is mainly applied to reformulate P2 into an equivalent parametric subtractive structure. Moreover, when the number of sub-channels is larger than that of users, the Lagrangian relaxation has been proved to be near optimal for relaxing binary variables to continuous real ones [33]. Consequently, the proposed iterative algorithm provides a near optimal solution to P1.

Computational Complexity Analysis
In this subsection, we analyze the computational complexity of the proposed iterative algorithm. The computational complexity of Algorithm 2 mainly resides in the determination of the updated approximation variables to tighten the bounds in (27) and (30), as well as sub-channel assignment, user association, and power allocation by iteratively updating the Lagrange multipliers.
The calculation of the updated approximation variables is implemented by Algorithm 1 as mentioned above. For Algorithm 1, in the step of obtaining P AP f,m,n,k and P MBS f,m,k , we denote by κ the number of elementary steps needed for solving P5 at each iteration. Then the complexity of this step is O (κN ). In the step of updating α f,m,n,k in (28) N f )). Therefore, the sum complexity of sub-channel assignment, user association, and power allocation in each iteration can be expressed as: Let ∆ 2 denote the number of iterations needed for the algorithm convergence. Then the total complexity of sub-channel assignment, user association, and power allocation is derived as To summarize, the overall computational complexity of the proposed algorithm can be calculated as . Consequently, the proposed algorithm coordinates the execution of sub-channel assignment and power allocation with a low-order polynomial time complexity.
Remark 3: Different from the user-centric access framework in NOMA-based UDNs for both access and backhaul downlink proposed by [7], we link beamforming with wireless backhaul to control the inter-cluster interference in the macrocell intelligently, while still applying NOMA for wireless access. In addition to the different wireless techniques used for backhaul downlink, we primarily utilize the sum-of-ratios decoupling and SCA methods to transform the complex system EE maximization problem. In contrast, the work in [7] adopts many-to-one two-sided matching and difference-of-convex programming theories to convert a similar system EE maximization problem and to find feasible solutions with low complexity. From the solution perspective, due to the different frameworks, our proposed algorithm differs from the matching algorithm and iterative resource allocation algorithm adopted in [7], which may result in different computational complexities. As listed in Table 1, the brief comparison of our proposed framework with the work in [7] is summarized from six aspects, and the detailed comparison is omitted here due to the page limit.

Simulation Results
In this section, we conduct simulation experiments to evaluate the performance of our proposed resource allocation optimization framework, and to gain insights into how the various system parameters affect the achievable EE in a user-centric UDN integrating access downlink via NOMA and backhaul downlink via beamforming. The performance of the proposed algorithm in our framework is compared with three conventional baseline schemes, which can be summarized as follows: • Equal-power based allocation scheme: As a classic power control strategy originally used in multicarrier systems, it can uniformly distribute power over all the sub-carriers to asymptotically maximize the sum network utility [42]. For performance comparison, we adopt the idea of the equalpower based allocation strategy to allocate transmit power for the users in wireless access and the APs in wireless backhaul. Based on this strategy, the power is equally allocated by every AP to each associated user on the corresponding sub-channel according to AP's maximum power P max m , and the power is also equally allocated by the MBS to all the APs on their corresponding sub-channels according to MBS's maximum power P max .
• Distance-based association scheme: Different from the baseline power allocation scheme mentioned above, we here resort to the distance-based association approach described in [43] as another baseline scheme for performance comparison. In this scheme, each user associates with the nearest AP in the cluster on the corresponding sub-channel for wireless access in a distributed manner. That is, user association at each user is determined by the distance metric between the user and the AP. Here, we do not exploit this distance metric to update the AP-MBS association relationship. Instead, the AP-MBS association is still obtained by iteratively calculating sub-channel k * according to (53) in Algorithm 2.
• Max-SINR association scheme: In conventional heterogeneous cellular networks, the max-SINR association method always associates UEs with the AP that can offer the highest received SINR and allocates the radio resources accordingly [44]. Here, we use the max-SINR idea as a baseline scheme in the UDN scenario to obtain the user association based on the SINR level between the user and the AP. With this method, each user attempts to attach to the AP that provides the highest SINR by comparing the SINR between the user and the APs, without calculating sub-channel k * for the user-AP association a * f,m,n,k * according to (52).
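The three baseline rules listed above can be sketched in a few lines. The snippet below is a toy illustration under stated assumptions: positions are 2-D coordinates in meters, the SINR of a candidate AP treats the received power from all other APs as interference, and every function name and number is hypothetical rather than taken from [42], [43], or [44].

```python
import math


def equal_power_split(p_max_w, num_links):
    """Equal-power baseline: share the maximum power evenly among the served links."""
    return [p_max_w / num_links] * num_links


def nearest_ap(user_xy, ap_positions):
    """Distance-based baseline: index of the AP closest to the user."""
    return min(range(len(ap_positions)),
               key=lambda m: math.dist(user_xy, ap_positions[m]))


def max_sinr_ap(received_powers_w, noise_w):
    """Max-SINR baseline: index of the AP with the highest SINR, plus all SINR values."""
    total = sum(received_powers_w)
    sinr = [p / (total - p + noise_w) for p in received_powers_w]
    return max(range(len(sinr)), key=sinr.__getitem__), sinr


aps = [(0.0, 0.0), (10.0, 0.0), (3.0, 4.0)]
print(equal_power_split(1.6, 4))
print(nearest_ap((2.0, 3.0), aps))
print(max_sinr_ap([1e-9, 4e-9, 2e-9], noise_w=1e-12)[0])
```

With this interference model the max-SINR rule selects the AP with the strongest received power, which need not be the nearest one once fading is taken into account.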
Based on the above descriptions, as shown in Table 2, we provide a brief comparison of the proposed algorithm with these baseline schemes from two points: (i) implementation of functions for resource allocation, and (ii) overall computational complexity. We remark that the baseline schemes are not fully tailored to our developed framework, and they only achieve partial functions for resource allocation, which results in lower complexity compared to our proposed algorithm.
Throughout the experiments, simulation results are obtained with the following default system parameters. For our considered user-centric UDN scenario, the locations of the users are randomly generated with equal probability in a circular macrocell area with radius r = 100 m centered at the MBS. A large number of APs are also deployed within this area according to an independent homogeneous PPP Φ ρ1 to provide wireless access service for those users. Specifically, the densities of the APs and the users are ρ 1 = 31.85M AP/km 2 and ρ 2 = 31.85N user/km 2 , respectively 7 . We set the minimum distance between the APs to be 2.5 m, and the minimum distance between the users to be 0. For the sake of generality, each user in every cluster is assumed to be simultaneously associated with at most M f = 50 APs on one or more sub-channel(s), and the MBS is assumed to be simultaneously associated with at most φ f,k = 15 APs on each sub-channel in every cluster.
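The deployment described above can be sketched as follows. The snippet draws node locations uniformly in a disc of radius 100 m, which matches a homogeneous PPP once the number of points is fixed, and converts a node count into a density per km². The function names, the seed, and the choice of 2000 users are assumptions of this sketch, and the 2.5 m minimum AP spacing is not enforced.

```python
import math
import random


def sample_uniform_in_disc(count, radius_m, rng):
    """Draw `count` points uniformly over a disc centered at the origin (the MBS)."""
    points = []
    for _ in range(count):
        # The square root makes the density uniform over the area, not over the radius.
        r = radius_m * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


def density_per_km2(count, radius_m):
    """Number of nodes per square kilometer in a disc of the given radius."""
    area_km2 = math.pi * (radius_m / 1000.0) ** 2
    return count / area_km2


rng = random.Random(0)  # fixed seed so the sketch is reproducible
users = sample_uniform_in_disc(2000, 100.0, rng)
unit_density = density_per_km2(1, 100.0)
print(len(users), round(unit_density, 2))
```

A single node in this disc corresponds to roughly 31.8 nodes per km², which is close to the coefficient 31.85 used in the densities above.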
In our simulations, the total number of sub-channels is K = 5 × 10 3 with δ = 500 for access downlink and K − δ = 4.5 × 10 3 for backhaul downlink to meet the resource management requirement for ultra-densely deployed nodes. The carrier center frequency is set to 2 GHz and the bandwidth of each sub-channel is set to 180 kHz. For the access downlink via NOMA, we assume that each sub-channel is assigned to at most N f,k = 10 users in every cluster to reduce the complexity of SIC decoding. In every generalized APG, each user can be simultaneously served by at most M f,k = 20 APs on each sub-channel. The path loss between the AP and the user in every cluster is obtained by a quasi-static block fading model with the small scale Rayleigh fading channel gain distributed as g f,m,n,k ∼ CN (0, 1).
For the backhaul downlink via beamforming, the small scale Rayleigh fading channel coefficient vector from the MBS to the AP in every cluster is assumed to follow the complex Gaussian model h f,m,k ∼ CN (0, I Q ). The beamforming vector for each AP on every sub-channel in every cluster is generated based on the channel coefficient vector between the MBS and that AP [45]. We assume that the number of the transmit antennas for beamforming in the antenna array of the MBS is equal to the number of APs on each sub-channel in every cluster for simplicity of simulations. Without loss of generality, the path loss exponents with respect to both wireless access and backhaul downlink are set as the same value, i.e., ϑ 1 = ϑ 2 = 2. Unless otherwise stated, we set the noise powers at each user and each AP on the corresponding sub-channels to be the same, with σ 2 n,k | k∈A = σ 2 m,k | k∈B = N 0 , where the AWGN power spectral density is initialized by N 0 = −174 dBm/Hz.

Before validating the system performance through the above simulation settings, we first provide insight into the convergence behavior of the proposed algorithm. Fig. 3 displays the convergence process of the proposed algorithm in terms of the EE with different numbers of the APs M and the users N , using the four typical deployment scenarios generated in Fig. 2. It can be observed that the EE achieved by the proposed algorithm increases consistently and converges rapidly, in fewer than 14 iterations, for different values of M and N . In addition, the proposed algorithm achieves the best performance for M = 4000 and N = 4000. That is because the overall EE performance of the system is not superior when M/N is small or especially less than 1. This behavior indicates that a sufficient number of APs is required to host a comparable number of users and to guarantee better wireless access performance from a user-centric perspective, i.e., M/N ≥ 1.
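The power-related settings of this section are quoted in logarithmic units, so a short conversion sketch may help when reproducing them. It assumes that the noise power on one sub-channel equals the power spectral density multiplied by the 180 kHz sub-channel bandwidth, and it converts the limits of 46 dBm and 32 dBm that appear in the figure legends into Watts. The function names are hypothetical.

```python
import math


def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to Watt."""
    return 10.0 ** ((p_dbm - 30.0) / 10.0)


def noise_power_dbm(psd_dbm_per_hz, bandwidth_hz):
    """Noise power in dBm over a band, assuming a flat power spectral density."""
    return psd_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)


noise_dbm = noise_power_dbm(-174.0, 180e3)  # about -121.4 dBm per sub-channel
noise_w = dbm_to_watt(noise_dbm)
p_max_mbs_w = dbm_to_watt(46.0)             # about 39.8 W
p_max_ap_w = dbm_to_watt(32.0)              # about 1.58 W
print(noise_dbm, noise_w, p_max_mbs_w, p_max_ap_w)
```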
It is immediately seen that the EE of the system, whether using the proposed algorithm or the baseline schemes, ...

[Figure legend: Equal-power, Distance-based, Max-SINR, and Proposed algorithm, each with R n min = 30 Mbit/s, P max = 46 dBm, P m max = 32 dBm.]

The baseline schemes only realize an optimization of a single criterion without a joint consideration of power, sub-channel, and user association. This result further provides a hint to choose appropriate joint optimization mechanisms to further improve the system performance.
In Fig. 5, we show the comparison between the proposed algorithm and the baseline schemes in terms of the system EE against the number of the users for two different numbers of the APs, i.e., M = 2000 and M = 4000, respectively. From Fig. 5, it is evident that the simulated system EE markedly increases as the number of the users grows, i.e., with higher user density. The reason for this is that larger densities of the users obtain more EE gains in spite of more resource competition and higher interference. As a consequence, the performance gains in the EE increase gradually as more users join the system. Moreover, it can also be seen from this figure that our proposed algorithm greatly outperforms the baseline schemes in terms of the system EE for both M = 2000 and M = 4000. This is due to the fact that the proposed algorithm fully takes the joint optimization of power allocation, sub-channel assignment, and user association into account and thereby achieves good performance. As can be seen from the result, the EE of the system for M = 4000 is always much higher than that for M = 2000, both for the proposed algorithm and for the baseline schemes. This behavior is explained as follows: more APs, i.e., a higher AP density, can host a larger number of users under the same system configuration, which reduces resource competition among the users and further enhances the system performance. This result shows the importance of selecting the density of the APs.

[Fig. 5 legend: Equal-power, Distance-based, Max-SINR, and Proposed algorithm, each with R n min = 30 Mbit/s, P max = 46 dBm, P m max = 32 dBm.]
Finally, in Fig. 6, we analyze the system EE performance for different values of AP's maximum power.

[Fig. 6: vertical axis Energy Efficiency (bit/Joule); legend: Equal-power, Distance-based, Max-SINR, and Proposed algorithm, each with M = 2000, N = 2000, R n min = 25 Mbit/s, P max = 46 dBm.]

The EE behaves differently below and above an AP maximum power of 3.5 W. This can be explained as follows: the transmit power allocated by the AP to each user is more likely to be updated when the AP's maximum power takes a small value below 3.5 W, thus resulting in the lower system EE. However, with the increase of AP's maximum power, the power of each user is properly allocated by the AP, thus satisfying the constraint of AP's maximum power. Furthermore, when AP's maximum power is sufficiently large, e.g., more than 3.5 W, the possibility of updating the power for each user is very low, which results in nearly fixed values for the overall EE of the system. These observations demonstrate the benefit of the proposed algorithm in achieving the maximum EE and provide insightful guidelines for designing practical user-centric UDNs.

Conclusion
In this paper, we proposed a resource allocation framework for energy efficient user association in downlink user-centric UDNs that closely integrate wireless access via NOMA and wireless backhaul via beamforming. The framework was aimed at the realization of the maximization of overall system-level EE by jointly optimizing user association index, sub-channel assignment, and transmit power allocation.
The aforementioned design problem was a large-scale non-convex mixed-integer nonlinear programming problem and thus difficult to solve with affordable computational complexity, especially when the numbers of densely distributed users and APs were large. Therefore, we reformulated the problem through the necessary variable relaxation and sum-of-ratios decoupling, and then converted this highly non-convex problem into a sequence of convex subproblems via the SCA method. On this basis, a distributed iterative algorithm was further developed to achieve the joint optimization of power allocation, sub-channel assignment, and user association simultaneously. Simulation results demonstrated the convergence of this algorithm, and also showed that it improves the system-wide EE compared with the baseline schemes, indicating its potential for practical design.

Declaration of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.