Energy-Spectrum Efficient Content Distribution in Fog-RAN Using Rate-Splitting, Common Message Decoding, and 3D-Resource Matching

Multi-objective resource allocation is studied for an edge-caching enabled fog-radio access network (F-RAN). Specifically, joint maximization of the energy-efficiency (EE) and spectrum-efficiency (SE), together with interference management, is investigated for distributing contents from the cache-enabled fog access points (F-APs) and cloud base station (CBS) to the user devices (UDs). In our envisioned system, the UDs are grouped into multiple non-overlapping device-clusters based on their locations. A transmission strategy based on rate-splitting with common message decoding is applied to enable the UDs of each device-cluster to receive data from a suitably selected F-AP and the CBS over the same radio resource blocks. To maximize system EE and SE jointly, a multi-objective optimization problem (MOOP) is formulated and solved in three stages. First, by employing the $\epsilon$-constraint method, the MOOP is converted to an EE-SE trade-off optimization problem. Then, by leveraging iterative function evaluation based power control and generalized 3D-resource matching, the EE-SE trade-off optimization problem is solved, and a novel resource allocation algorithm is proposed to obtain a near-optimal Pareto-front for the proposed MOOP. To reduce the complexity of obtaining the near-optimal Pareto-front, a sub-optimal resource allocation algorithm is proposed as well. Finally, a low-complexity algorithm is devised to select a suitable operating EE-SE pair from the obtained Pareto-front. The conducted simulations demonstrate that the proposed resource allocation schemes achieve substantial improvement of system EE and SE over the benchmark schemes.


I. INTRODUCTION
LEVERAGING centralized cloud processing based resource allocation, distributed signal processing, and popular content caching at the network edge, fog-radio access network (F-RAN) presents a revolutionary paradigm to satisfy the beyond 5G (B5G) performance requirements [1]. In edge-caching (EC) enabled F-RAN, the content distribution phase is crucial for meeting the throughput and diverse latency requirements of the user devices (UDs). Particularly, EC enabled F-RAN jointly exploits both the cache-enabled fog access points (F-APs) and the centralized cloud base station (CBS) to deliver contents to the UDs [2]. Hence, well-designed resource allocation plays a vital role in improving the content distribution performance of EC enabled F-RAN.
Because of the proliferation of mobile devices and various applications in the B5G network, there is a dire need for a solution that efficiently delivers high-volume and latency-sensitive enhanced mobile broadband (eMBB) data, generated from 3D video streaming and extended reality, to the UDs. Using joint edge-cloud processing, EC enabled F-RAN presents a promising architecture to deliver eMBB data to the UDs with reduced latency [3]. However, the advantage of F-RAN enabled eMBB data delivery is confronted by the increased interference in the dense network and the scarcity of the spectrum resources. In particular, interference severely reduces the spectrum-efficiency (SE), which is detrimental to eMBB data delivery. The conventional approach of interference management, which schedules the active links over orthogonal spectrum resources, strictly limits the number of supported UDs in EC enabled F-RAN, and thus, it is incapable of satisfying the data demand of the overly congested B5G network. A promising way to improve the spectrum resource utilization is to group the UDs into multiple device-clusters where the members of each cluster are scheduled to receive data over the same spectrum resources. However, clustering inevitably increases the interference in the network and requires a complex interference management scheme. In addition, the number of F-APs in F-RAN needs to be increased to satisfy the enhanced data-rate requirements of a large number of UDs. Hence, from a network operation point of view, maximization of the system energy-efficiency (EE) is also imperative to ensure sustainable operation of F-RAN. Indeed, the B5G network is expected to achieve a 10-fold enhancement of EE over the 5G network [4]. Accordingly, F-RAN enabled eMBB data delivery calls for a novel solution that maximizes system EE and SE jointly and provides efficient interference management with improved spectrum resource utilization. However, developing such a solution is non-trivial.
This is because, on one hand, joint maximization of EE and SE requires maximizing two conflicting objectives, and on the other hand, interference management in a network with multiple device-clusters is a well-known NP-hard problem. To address these challenges, we focus on designing a novel resource allocation scheme that jointly maximizes system EE and SE and manages interference for the content distribution phase of EC enabled F-RAN.

A. Related Works
EC significantly reduces the backhaul energy consumption and leads to improved EE [5]. In an EC enabled system, the enhancement of EE is achieved by optimizing both the content placement and content distribution phases [6]. A large number of recent works have investigated on-device caching and improvement of the system EE by capitalizing on the proximity gain of an optimized device-to-device (D2D) assisted content distribution phase. Using game theory, the authors in [8] designed an optimized caching strategy and incentive mechanism to encourage the devices to participate in the EC system. The authors in [9] proposed an optimized proactive caching strategy and transmission power control to offload the contents from the centralized content server via D2D links while reducing the energy consumption of the offloading process. The authors in [10] proposed a novel D2D-clustering based content distribution framework for improving the EE of the network. In [11], novel unicast and multicast based D2D-assisted content distribution schemes were proposed for improving the EE of the network. Using tools from stochastic geometry, the authors in [12] obtained the optimal caching strategy for a D2D-clustering enabled network. Besides device-caching and D2D assisted content distribution, multi-level EC was also utilized to improve the EE of the system. In [13]-[15], the authors studied multi-level EC enabled wireless networks where mode selection, transmission power, and file combination were optimized to reduce the energy consumption of the content distribution phase. Although the aforementioned works demonstrated the advantages of EC for improving system EE, these works did not explore interference management for delivering the cached contents. In most of these works, interference was treated as noise, and consequently, the resultant EE is sub-optimal. Hence, it is imperative to investigate the benefit of interference management for improving the EE of EC enabled systems.
Resource allocation is envisaged to play a pivotal role in interference management in EC enabled F-RAN. In the contemporary literature, non-orthogonal multiple access (NOMA) has gained prominence for multiplexing multiple UDs over the same spectrum resource(s), and consequently, NOMA has also been applied to manage interference in F-RAN. By optimizing power control and spectrum resource-UD matching, the authors in [16]-[18] developed several novel resource allocation schemes to manage interference and maximize the SE of NOMA-based F-RAN. The existing literature also focused on improving the EE of F-RAN through resource allocation and interference management. The authors in [19] developed a cross-tier interference aware pricing mechanism for two-tier F-RAN, together with a game theoretic resource allocation scheme. Considering the cost of content caching at the F-APs, the authors in [20] developed optimized beamforming vectors and UD/F-AP associations to maximize a cost-aware EE of F-RAN. Optimized beamforming vectors and UD/F-AP associations were also developed to maximize the EE of F-RAN while considering outdated channel information at the cloud [21]. Note that the aforementioned works concentrated on optimizing either the SE or the EE of F-RAN individually. However, optimization of only SE is insufficient to characterize energy saving in the network, and optimization of only EE does not necessarily lead to an optimized SE. Hence, to ensure both green network operation and improved SE, as required by the B5G network, optimization of the trade-off between the EE and SE of F-RAN is inherently important.
Rate-splitting with common message decoding (RS-CMD) has recently emerged as a robust and resource-efficient transmission strategy for mitigating interference [22]. RS-CMD enables decoding and canceling a part of the interference while treating the remaining interference as noise, and thus, it appears as a more generalized multi-user transmission strategy than NOMA [23]. The problem of SE optimization for RS-CMD systems was extensively studied for various wireless networking scenarios, such as the multi-user broadcast system [24], cloud-radio access network [25], cooperative network with user relaying [26], and aerial network [27]. Moreover, both EE maximization and EE-SE trade-off optimization for RS-CMD systems were studied in [28]-[30]. Although the aforementioned works applied RS-CMD in multi-antenna settings, RS-CMD is also advantageous for interference management in single-antenna networks [31]. Motivated by the success of RS-CMD in interference management, in this work, RS-CMD is employed for the content distribution phase of EC enabled F-RAN. Note that, for an RS-CMD integrated system, the aforementioned works investigated a downlink scenario with a single transmitter and multiple receivers and did not consider the problem of association between receivers and transmitters. However, the association between UDs and F-APs is crucial in EC enabled F-RAN, as mentioned in [3], [17]. Consequently, in this work, we optimize the EE-SE trade-off of RS-CMD integrated F-RAN while taking the problem of association between UDs and F-APs into account.

B. Contributions and Paper Organization
In this work, a novel cluster-based resource allocation framework is developed for EC enabled F-RAN. In our envisioned system, the UDs are clustered based on their locations. We employ RS-CMD based transmission strategy to enable UDs of each device-cluster to receive data from a suitably selected cache-enabled F-AP and CBS over the same radio resource blocks (RRBs). As a result, our envisioned system enhances utilization of the limited RRBs in the content distribution phase of EC enabled F-RAN. However, the considered strategy leads to increased interference that affects both system EE and SE. To this end, our developed resource allocation framework aims to efficiently manage interference in the content distribution phase of EC enabled F-RAN while jointly maximizing system EE and SE. To the best of the authors' knowledge, this is the first work that optimizes the trade-off between system EE and SE for the RS-CMD integrated and EC enabled F-RAN. The specific contributions of this work are summarized as follows.
• A multi-objective optimization problem (MOOP) is formulated to maximize the EE and SE of the system jointly. Notably, we optimize the power allocation of the F-APs and CBS, the association between F-APs and device-clusters, and the allocation of the RRBs among the device-clusters. To obtain the near-optimal Pareto-front of the proposed MOOP, it is converted to an EE-SE trade-off optimization problem by applying the $\epsilon$-constraint method. Since the EE-SE trade-off optimization problem is NP-hard, a three-stage solution is devised.
• At the first stage, we determine the maximum achievable system EE and SE by decomposing the resource allocation into two sub-problems. The first sub-problem obtains near-optimal power allocations of the CBS and F-APs by applying an iterative function evaluation (IFE) approach. The second sub-problem obtains a Pareto-efficient matching of RRBs and F-APs with the device-clusters by solving a generalized 3D-resource matching problem. Based on a block-alternating ascent method, two convergent algorithms of polynomial computational complexity are proposed to determine the maximum achievable EE and SE of the system.
• At the second stage, we propose a novel iterative rate-splitting and resource-matching (I-RSRM) algorithm to solve the EE-SE trade-off optimization problem iteratively. I-RSRM obtains the near-optimal Pareto-front of the proposed MOOP by iteratively adjusting the trade-off between EE and SE. To reduce the complexity of achieving the near-optimal Pareto-front of the proposed MOOP, a sub-optimal resource allocation algorithm, referred to as decomposed rate-splitting and resource-matching (D-RSRM), is proposed as well.
• At the final stage, a low-complexity algorithm is developed to select an operating EE-SE pair from the solutions obtained by the I-RSRM and D-RSRM algorithms. The developed algorithm provides a suitable trade-off between system EE and SE while adjusting a single parameter.
Extensive simulations are conducted to verify the superiority of the proposed near-optimal and sub-optimal resource allocation algorithms over several benchmark schemes.
The rest of the manuscript is organized as follows. Section II provides the overall system model and problem formulation. The maximum achievable EE and SE of the system are obtained in Section III. EE-SE trade-off optimization and the proposed algorithms are discussed in Section IV. Simulation results and concluding remarks are presented in Sections V and VI, respectively.

Fig. 1. Downlink transmission in F-RAN with 2 cache-enabled F-APs, 2 RRBs, and 8 UDs.

A. System Overview
Consider a downlink F-RAN with one CBS, L single-antenna F-APs, N orthogonal RRBs, and M single-antenna UDs. Fig. 1 illustrates an example of such a network. Here, the F-APs are connected with the CBS via fronthaul links. As per the contemporary F-RAN literature [16], [17], the CBS proactively places the data of the UDs in the cache-memory of the F-APs offline (i.e., before the communication takes place). As such, during online operation, the F-APs can directly transmit data to the UDs without requiring the data to be delivered from the CBS over the fronthaul links. The considered system is a standard F-RAN architecture [2], and it benefits from both the cloud processing empowered resource allocation and the local signal processing/encoding operations of the F-APs. For simplicity, we assume that each F-AP caches identical contents. Let $\mathcal{N} = \{1, 2, \cdots, N\}$, $\mathcal{L} = \{1, 2, \cdots, L\}$, and $\mathcal{M} = \{1, 2, \cdots, M\}$ be the sets of available orthogonal RRBs, F-APs, and UDs, respectively.
Based on the locations of the UDs, multiple non-overlapping device-clusters are formed such that the mean-square distance between the clustered UDs and the cluster center is minimized. Let the M UDs be grouped into K device-clusters, denoted by $\mathcal{S}_1, \mathcal{S}_2, \cdots, \mathcal{S}_K$. Since the F-APs are cost-efficient devices, they can cache only a subset of the UDs' data in their storage. To this end, each device-cluster is further decomposed into two non-overlapping sets of cache-hit and cache-miss UDs. The requested data of the cache-hit UDs are available at the F-APs, and hence, they obtain data by directly accessing the local cache. In contrast, the requested data of the cache-miss UDs are not available at the F-APs, and as a result, they can obtain the data only from the CBS. Note that our proposed resource allocation framework is valid for any device-clustering algorithm. Hence, in the ensuing analysis, we consider that the device-clusters are given, and we focus on the resource allocations among the device-clusters.
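The clustering criterion above (minimum mean-square distance to the cluster center) is the objective that Lloyd's k-means iteration targets. The following is a minimal sketch under that assumption; the text does not state which clustering algorithm is used, and the function name `cluster_uds`, the 2-D coordinates, and the parameter values are illustrative only.

```python
import random

def cluster_uds(positions, k, iters=50, seed=0):
    """Lloyd's k-means on 2-D UD positions; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(positions, k)
    labels = [0] * len(positions)
    for _ in range(iters):
        # Assignment step: each UD joins its nearest cluster center.
        labels = [
            min(range(k),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
            for x, y in positions
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(positions, labels) if lab == c]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical UD coordinates forming two well-separated groups.
uds = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1),
       (5.2, 4.9), (4.9, 5.3), (0.3, 0.2), (5.1, 5.0)]
labels, centers = cluster_uds(uds, k=2)
```

Because the resource allocation framework takes the device-clusters as given, any other clustering routine could replace this step without affecting the subsequent analysis.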
In a given device-cluster, the cache-hit and cache-miss UDs receive data from a certain F-AP and the CBS, respectively, over the same RRBs. Hence, the system EE and SE depend on the mitigation of the following types of interference in the content distribution phase: (i) inter device-cluster interference, (ii) interference between the F-AP and CBS in each device-cluster, and (iii) intra device-cluster interference, i.e., interference among the cache-hit UDs and interference among the cache-miss UDs in each device-cluster. To combat inter device-cluster interference, each device-cluster is allocated certain orthogonal RRBs. To combat interference between the F-AP and CBS in a given device-cluster, joint F-AP and CBS power allocation over the shared RRBs is leveraged. Finally, to combat intra device-cluster interference, 1-layer RS-CMD is exploited for both the cache-hit and cache-miss UDs in each device-cluster. Along the lines of [3], [17], each device-cluster is allowed to be associated with at most one F-AP. Meanwhile, an F-AP can serve multiple device-clusters over orthogonal RRBs.
For analytical tractability, the following assumptions are made.
A1: The described system runs on a slotted-time basis where the overall time duration is divided into equal and non-overlapping time-slots (TSs). We assume a block fading channel where the channels remain constant in a given TS and can vary independently across different TSs. Data transmission from all the F-APs at each TS is synchronized by the CBS.
A2: The UDs can accurately perform successive interference cancellation (SIC), and the locations of the UDs remain fixed for a number of TSs.
A3: The F-APs and the CBS have accurate knowledge of the local and global channel state information, respectively, of both the cache-hit and cache-miss UDs in the system.
A4: Using the cloud processing, all the resource allocation tasks are executed at the CBS in a centralized manner.
Note that the aforementioned assumptions are common in the contemporary literature of F-RAN [2], [16], [17]. Because of these assumptions, the proposed work provides the maximum theoretical performance gain for EC enabled F-RAN by jointly optimizing the power allocation of the F-APs and CBS, the association between F-APs and device-clusters, and the allocation of the RRBs among the device-clusters.

B. RS-CMD Based Transmission Strategy
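We describe the RS-CMD based transmission strategy for the j-th device-cluster, $\mathcal{S}_j$. The cache-hit and cache-miss UDs of the j-th device-cluster are denoted by $\mathcal{S}_j^{(L)}$ and $\mathcal{S}_j^{(H)}$, respectively, such that $|\mathcal{S}_j^{(L)}| + |\mathcal{S}_j^{(H)}| = |\mathcal{S}_j|$, where $|\cdot|$ refers to the cardinality of a set. We consider that the l-th F-AP and the n-th RRB are assigned to the j-th device-cluster. Hence, the UDs in the $\mathcal{S}_j^{(L)}$ and $\mathcal{S}_j^{(H)}$ sets receive data from the l-th F-AP and the CBS, respectively, over the n-th RRB. The l-th F-AP encodes the data of the cache-hit UDs into a common message, denoted by $s_c$, and $|\mathcal{S}_j^{(L)}|$ private messages, denoted by $\{s_e\}$. The signal transmitted by the l-th F-AP over the n-th RRB is expressed as
$$X_{l,n} = \sqrt{P^{C,j}_{l,n}}\, s_c + \sum_{e \in \mathcal{S}_j^{(L)}} \sqrt{P^{p,j}_{l,e,n}}\, s_e,$$
where $P^{C,j}_{l,n}$ and $\{P^{p,j}_{l,e,n}\}$ denote the transmission powers assigned for the common and private messages for the cache-hit UDs in the j-th device-cluster, respectively. Similarly, the CBS encodes the data of the cache-miss UDs into a common message, denoted by $\tilde{s}_c$, and $|\mathcal{S}_j^{(H)}|$ private messages, denoted by $\{\tilde{s}_t\}$. The signal transmitted by the CBS over the n-th RRB is expressed as
$$\tilde{X}_n = \sqrt{Q^{C,j}_{n}}\, \tilde{s}_c + \sum_{t \in \mathcal{S}_j^{(H)}} \sqrt{Q^{p,j}_{t,n}}\, \tilde{s}_t,$$
where $Q^{C,j}_{n}$ and $\{Q^{p,j}_{t,n}\}$ denote the transmission powers assigned for the common and private messages for the cache-miss UDs in the j-th device-cluster, respectively. The received signals at the UDs are expressed as
$$y_e = h^{(l)}_{e,n} X_{l,n} + h^{(0)}_{e,n} \tilde{X}_n + n, \ \forall e \in \mathcal{S}_j^{(L)}, \qquad y_t = h^{(l)}_{t,n} X_{l,n} + h^{(0)}_{t,n} \tilde{X}_n + n, \ \forall t \in \mathcal{S}_j^{(H)},$$
where $h^{(l)}_{e,n}$ (resp. $h^{(l)}_{t,n}$) is the channel gain between the l-th F-AP and the e-th UD (resp. the t-th UD) over the n-th RRB; $h^{(0)}_{e,n}$ (resp. $h^{(0)}_{t,n}$) is the channel gain between the CBS and the e-th UD (resp. the t-th UD) over the n-th RRB; and n is the additive white Gaussian noise with variance $\sigma^2$.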
The UDs in the $\mathcal{S}_j^{(L)}$ and $\mathcal{S}_j^{(H)}$ sets should be able to decode their corresponding common messages. For the common messages over the n-th RRB, the l-th F-AP and the CBS target the achievable data rates of the UDs indexed by $k_j$ and $\tilde{k}_j$, respectively. The common message data rates for the cache-hit and cache-miss UDs are expressed as $R^{C,j}_{l,n} = \log_2\left(1 + \gamma^{C,j}_{l,n}\right)$ and $R^{C,j}_{n} = \log_2\left(1 + \gamma^{C,j}_{n}\right)$, respectively. Here, $\gamma^{C,j}_{l,n}$ and $\gamma^{C,j}_{n}$ are defined in (2) and (3), respectively.
After decoding the common messages, both the cache-hit and cache-miss UDs decode their own private messages while canceling the interference from the decoded common messages by applying SIC and treating the private messages of the other UDs in the system as noise. The cache-hit (cache-miss) UDs also treat the interference from the common message of the cache-miss (cache-hit) UDs as noise. Thus, the scheduled data rates for the private messages of the e-th cache-hit and t-th cache-miss UDs, $\forall e \in \mathcal{S}_j^{(L)}$ and $\forall t \in \mathcal{S}_j^{(H)}$, are obtained as $R^{p,j}_{l,e,n} = \log_2\left(1 + \gamma^{p,j}_{l,e,n}\right)$ and $R^{p,j}_{t,n} = \log_2\left(1 + \gamma^{p,j}_{t,n}\right)$, respectively. Here, $\gamma^{p,j}_{l,e,n}$ and $\gamma^{p,j}_{t,n}$ are given in (4) and (5). Using (2)-(5), we obtain the total scheduled data rates for the cache-hit and cache-miss UDs in the j-th device-cluster by adding the corresponding common and private message data rates. Therefore, the overall throughput of the j-th device-cluster, given that the l-th F-AP and the n-th RRB are assigned to the j-th device-cluster, is obtained as the sum of these two total scheduled data rates.
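The decoding order described above can be illustrated numerically. The sketch below is not a reproduction of (2)-(5), which are not available in this text; it assumes power-domain channel gains, assumes that the common-message rate is set by the UD with the weakest common-message SINR, and lumps all signals of the other transmitter into a single noise-like term. The function name `rs_cmd_rates` and all parameter names are hypothetical.

```python
import math

def rs_cmd_rates(g_own, g_cross, p_common, p_private, cross_power, noise=1.0):
    """Common and private rates (bit/s/Hz) of one UD group under 1-layer RS-CMD.

    g_own[i]    : power gain from the serving transmitter to UD i (assumed)
    g_cross[i]  : power gain from the other transmitter (F-AP or CBS) to UD i
    p_common    : power of the group's common message
    p_private   : private-message powers, one per UD
    cross_power : total power of the other transmitter, treated as noise
    """
    total_private = sum(p_private)
    common_sinrs = []
    private_rates = []
    for i, (g, gx) in enumerate(zip(g_own, g_cross)):
        cross = gx * cross_power
        # Common message: decoded first, so all private messages act as noise.
        common_sinrs.append(g * p_common / (noise + g * total_private + cross))
        # Private message: common part removed by SIC, other privates remain.
        interference = g * (total_private - p_private[i]) + cross
        private_rates.append(
            math.log2(1.0 + g * p_private[i] / (noise + interference)))
    # Assumption: every UD must decode the common message, so the weakest UD
    # determines the common-message rate.
    common_rate = math.log2(1.0 + min(common_sinrs))
    return common_rate, private_rates

common, private = rs_cmd_rates([2.0, 1.0], [0.1, 0.2], 1.0, [0.5, 0.5], 1.0)
group_rate = common + sum(private)  # total scheduled rate of this UD group
```

Calling the function once for the cache-hit UDs (served by the F-AP) and once for the cache-miss UDs (served by the CBS), and adding the two results, gives the cluster throughput under these assumptions.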

C. Problem Formulation
We introduce two binary variables, $a_{l,j} \in \{0, 1\}$ and $b_{n,l,j} \in \{0, 1\}$, such that $a_{l,j} = 1$ if the l-th F-AP is associated with the j-th device-cluster and $a_{l,j} = 0$ otherwise; and $b_{n,l,j} = 1$ if the n-th RRB is assigned to both the l-th F-AP and the j-th device-cluster and $b_{n,l,j} = 0$ otherwise. Hence, the total power consumption of the l-th F-AP, $P_l$, is obtained from the common and private message powers, $P^{C,j}_{l,n}$ and $\{P^{p,j}_{l,e,n}\}$, over its assigned device-clusters and RRBs, where $P_c$ is the constant circuitry power consumption and $\varepsilon$ is the power amplifier efficiency. Meanwhile, the total power consumption of the CBS, $P_{CBS}$, is obtained from the powers $Q^{C,j}_{n}$ and $\{Q^{p,j}_{t,n}\}$. Thereby, the system EE and SE are obtained as functions of the resource allocation variables $(P, Q, a, b)$. We formulate an optimization problem, denoted by P0, to maximize both the system EE and SE by jointly performing transmission power allocation of the F-APs and CBS, association between the F-APs and device-clusters, and allocation of the RRBs among the device-clusters.
In P0, C1 implies that each device-cluster will be associated with only one F-AP whereas a given F-AP can be associated with at most $N_R$ device-clusters; C2 implies the orthogonal allocation of the RRBs among the device-clusters; C3 implies that the n-th RRB can be assigned to both the l-th F-AP and the j-th device-cluster if and only if the l-th F-AP is associated with the j-th device-cluster; and C4 provides the transmission power allocation constraints for the F-APs and CBS, with $P^{FAP}_{\max}$ and $P^{CBS}_{\max}$ as the maximum transmission powers of an F-AP and the CBS, respectively.
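The assignment constraints C1-C3 can be checked mechanically for a candidate pair of binary arrays. The sketch below encodes one reading of the verbal description above (the formal statement of P0 is not reproduced in this text): C1 is taken as exactly one F-AP per device-cluster, and C2 as at most one (F-AP, device-cluster) pair per RRB. The function name and the array layout `a[l][j]`, `b[n][l][j]` are illustrative choices.

```python
def is_feasible_assignment(a, b, n_r):
    """Check C1-C3 for 0/1 arrays a[l][j] and b[n][l][j]."""
    num_faps, num_clusters, num_rrbs = len(a), len(a[0]), len(b)
    # C1: one F-AP per device-cluster, at most n_r device-clusters per F-AP.
    for j in range(num_clusters):
        if sum(a[l][j] for l in range(num_faps)) != 1:
            return False
    for l in range(num_faps):
        if sum(a[l]) > n_r:
            return False
    # C2: each RRB is used by at most one (F-AP, device-cluster) pair.
    for n in range(num_rrbs):
        used = sum(b[n][l][j]
                   for l in range(num_faps) for j in range(num_clusters))
        if used > 1:
            return False
    # C3: RRB n may go to (l, j) only if F-AP l is associated with cluster j.
    return all(b[n][l][j] <= a[l][j]
               for n in range(num_rrbs)
               for l in range(num_faps)
               for j in range(num_clusters))

# Two F-APs, two device-clusters, two RRBs: F-AP 0 serves cluster 0 on RRB 0,
# and F-AP 1 serves cluster 1 on RRB 1.
a = [[1, 0], [0, 1]]
b = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
feasible = is_feasible_assignment(a, b, n_r=1)
```

The power constraint C4 is continuous and is handled separately by the power allocation step.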
P0 is a non-convex MOOP. The $\epsilon$-constraint method can obtain a Pareto-optimal solution for a non-convex MOOP [32]. Consequently, we employ the $\epsilon$-constraint method to obtain a Pareto-optimal solution to P0. By introducing a lower-bound for the system EE, denoted by $\epsilon$, a single-objective formulation of P0 is obtained as P1, in which the system SE is maximized subject to the EE lower-bound and the constraints of P0. Note that P1 is an EE-SE trade-off optimization problem since each solution to P1 provides a trade-off between system EE and SE. The following lemma justifies the efficiency of the problem formulation P1.
Lemma 1: P1 provides a weakly Pareto-optimal solution to P0.
Proof: The proof is omitted, and it can be found in [34, Appendix A].
It is obvious that different values of $\epsilon$ will lead to different solutions to P1, and as per Lemma 1, each of these solutions will be weakly Pareto-optimal to P0. Hence, to solve P0 optimally, we need to obtain the overall set of the Pareto-optimal solutions by solving P1 for different values of $\epsilon$ and select the solution providing a suitable EE-SE trade-off. However, to do so, the maximum achievable EE and SE of the system are required [35]. In what follows, we first obtain the maximum achievable EE and SE of the system, and afterwards, we solve P1 by varying $\epsilon$.
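The role of $\epsilon$ can be seen on a toy instance. The sketch below assumes a finite list of feasible allocations, each summarized by a hypothetical (EE, SE) pair, and sweeps the EE lower-bound between the smallest and largest achievable EE. It only illustrates the $\epsilon$-constraint idea; it is not the resource allocation algorithm of this work, and the tie-break on EE is an added assumption.

```python
def epsilon_constraint_front(candidates, num_levels=5):
    """candidates: list of (ee, se) pairs, one per feasible allocation."""
    ee_max = max(ee for ee, _ in candidates)
    ee_min = min(ee for ee, _ in candidates)
    front = []
    for i in range(num_levels):
        # EE lower-bound for this sweep point, capped at the maximum EE.
        eps = min(ee_min + (ee_max - ee_min) * i / (num_levels - 1), ee_max)
        feasible = [c for c in candidates if c[0] >= eps]
        # Single-objective problem: maximize SE subject to EE >= eps.
        # Ties in SE are broken in favor of the larger EE.
        best = max(feasible, key=lambda c: (c[1], c[0]))
        if best not in front:
            front.append(best)
    return front

# Hypothetical (EE, SE) values of six feasible allocations.
points = [(1.0, 9.0), (2.0, 8.0), (3.0, 5.0), (2.5, 4.0), (4.0, 2.0), (1.5, 7.0)]
front = epsilon_constraint_front(points)
```

The sweep needs the maximum achievable EE to bound the range of $\epsilon$, which is why the EE-optimal and SE-optimal operating points are determined first.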
III. EE-OPTIMAL AND SE-OPTIMAL SOLUTIONS
To determine the maximum achievable EE and SE of the system by exploiting the considered resource allocation strategy, we formulate the following two optimization problems.
According to the well-known Dinkelbach theorem, if $(P, Q, a, b)$ is a solution to P2.1 and $\eta$ is the maximum EE of the system, the condition in (10) must be satisfied.
Hence, solving P2.1 is equivalent to finding a resource allocation tuple such that the objective function of the parameterized optimization problem P2.3 approaches zero.
However, in P2.3, $\eta$ also depends on the resource allocation variables, as seen from (10). Hence, we first solve P2.3 for a given $\eta$, and then, we update $\eta$ such that (10) is satisfied. Moreover, the solutions to P2.2 and P2.3 coincide when $\eta = 0$. Thereby, we focus on solving P2.3. Since P2.3 is NP-hard, we propose a modular approach to solve P2.3 with reasonable computational complexity. Particularly, we decompose P2.3 into two sub-problems for performing power control and resource assignment in an iterative manner. The detailed analysis is given as follows.
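The alternation between solving the parameterized problem for a fixed $\eta$ and updating $\eta$ can be sketched on a single-link toy problem. The example below assumes one transmitter with rate $\log_2(1 + g p)$ and power consumption $P_c + p/\varepsilon$, which is a simplification chosen for illustration and not the system model of this work; the function name and the parameter values are hypothetical.

```python
import math

def dinkelbach_ee(gain, p_circuit, amp_eff, p_max, tol=1e-9, max_iter=100):
    """Maximize log2(1 + gain*p) / (p_circuit + p/amp_eff) over 0 <= p <= p_max."""
    def rate(p):
        return math.log2(1.0 + gain * p)

    def power(p):
        return p_circuit + p / amp_eff

    p = p_max
    eta = rate(p) / power(p)
    for _ in range(max_iter):
        # Inner problem for fixed eta: maximize rate(p) - eta * power(p).
        # The objective is concave, so the clipped stationary point is optimal.
        p = amp_eff / (eta * math.log(2.0)) - 1.0 / gain
        p = min(max(p, 0.0), p_max)
        if abs(rate(p) - eta * power(p)) < tol:  # parameterized objective ~ 0
            break
        eta = rate(p) / power(p)                 # Dinkelbach update of eta
    return p, eta

p_star, eta_star = dinkelbach_ee(gain=4.0, p_circuit=1.0, amp_eff=0.5, p_max=10.0)
```

At termination the parameterized objective is numerically zero, which is the stopping condition stated above; setting $\eta = 0$ in the inner problem would instead push the power to its maximum, i.e., the rate-maximizing solution.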

A. CBS and F-AP Power Allocation
Considering that the assignment variables $\{a, b\}$ are given, P2.3 is simplified to the power allocation problem P2.4. Although P2.4 is NP-hard, a high-quality approximate solution can be obtained by applying the Lagrangian relaxation technique [33]. Therefore, to solve P2.4, we formulate a Lagrangian dual-optimization problem and solve it by applying the well-known Karush-Kuhn-Tucker (KKT) conditions. The Lagrangian function corresponding to P2.4 is expressed as (13). Here, $\{\lambda_l\}$ and $\nu$ are the Lagrangian multipliers corresponding to the C3 and C4 constraints, respectively. Using Lagrangian dual-decomposition, the power allocation problem for the j-th device-cluster, $\forall j = 1, 2, \cdots, K$, is expressed as P2.5, where $L_j$ is defined as (15). To apply the KKT conditions, P2.5 is decomposed into inner and outer layer problems. The inner and outer layer problems iteratively update the power allocation variables and the Lagrangian multipliers, respectively.
1) Power Allocation Update: Let the Lagrangian multipliers be given. Capitalizing on the IFE based power control method in [15], the near-optimal power allocations of the j-th device-cluster, $\forall j = 1, 2, \cdots, K$, are obtained in the following proposition.
Proposition 1: Let $P^{C,j}_{l,n}$, $P^{p,j}_{l,e,n}$, $Q^{C,j}_{n}$, $Q^{p,j}_{t,n}$ be the given power allocations in the k-th iteration. A converged power allocation is obtained by updating the power levels at the (k + 1)-th iteration, $\forall k$, according to (16)-(19). Here, the values of $\gamma^{C,j}_{l,n}$, $\gamma^{C,j}_{n}$, $\gamma^{p,j}_{l,e,n}$, and $\gamma^{p,j}_{t,n}$ are obtained by plugging $P^{C,j}_{l,n}$, $P^{p,j}_{l,e,n}$, $Q^{C,j}_{n}$, $Q^{p,j}_{t,n}$ into (2)-(5).
Proof: The proof is omitted, and it can be found in [34, Appendix B].
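The inner/outer layer structure described above can be sketched on a much simpler stand-in problem. The example below assumes interference-free parallel channels with a single sum-power constraint, so that the inner power update has a closed form derived from the KKT stationarity condition; it does not implement the updates (16)-(19), which are not reproduced in this text. The function names, the step-size rule `step0 / sqrt(k)`, and all numerical values are assumptions, and `eta` must be positive.

```python
import math

def power_update(gains, eta, lam):
    """Inner layer: maximize sum(log2(1 + g*p)) - (eta + lam) * sum(p), p >= 0."""
    level = 1.0 / ((eta + lam) * math.log(2.0))
    return [max(0.0, level - 1.0 / g) for g in gains]

def dual_power_allocation(gains, eta, p_max, iters=2000, step0=0.5):
    """Outer layer: projected sub-gradient update of the multiplier lam."""
    lam = 0.0
    for k in range(1, iters + 1):
        p = power_update(gains, eta, lam)
        step = step0 / math.sqrt(k)  # diminishing step size
        # The sub-gradient is the violation of the sum-power constraint.
        lam = max(0.0, lam + step * (sum(p) - p_max))
    return power_update(gains, eta, lam), lam

powers, lam = dual_power_allocation([2.0, 1.0, 0.5], eta=0.1, p_max=3.0)
```

In this toy instance the multiplier settles where the power budget is met with equality, and stronger channels receive more power; in the system above, separate multipliers play the same role for each F-AP and for the CBS.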
2) Lagrangian Multiplier Update: Using (16)-(19), the power consumptions of the l-th F-AP and the CBS are calculated as $P_l$ and $P_{CBS}$, respectively. Leveraging the well-known sub-gradient method, at the (k + 1)-th iteration, the Lagrangian multipliers are updated according to (20a) and (20b), where $\{\zeta_1(k), \zeta_2(k)\}$ are the step sizes at the k-th iteration, which are selected to ensure convergence.
3) Algorithm Development: We propose Algorithm 1 to obtain the power allocations in the j-th device-cluster. For a given device-cluster, Algorithm 1 requires $O(LN)$ computational complexity to obtain a converged solution to P2.4. To demonstrate the efficiency of Algorithm 1, we introduce the following proposition.
Algorithm 1 Power Allocation Algorithm for the j-th Device-Cluster
1: Input: Assignment variables $\{a_{l,j}\}$ and $\{b_{n,l,j}\}$, and EE $\eta$.
3: Initialize the Lagrangian multipliers $\lambda_l$, $\forall l \in \mathcal{L}$, and $\nu$, and the iteration index k.
4: repeat
5: Update $P^{C,j}_{l,n}$, $P^{p,j}_{l,e,n}$, $Q^{C,j}_{n}$, and $Q^{p,j}_{t,n}$ by using (16)-(19).
6: Update $\{\lambda_l\}$ and $\nu$ by using (20a) and (20b), respectively; k = k + 1.
7: until convergence.

Here, $U^{j}_{l,n}$ is a function of the power allocations obtained from Algorithm 1, and it is given by (22). To solve P2.6, we describe the problem of assigning F-APs and RRBs among the device-clusters as a 3D-resource matching problem. The rules of matching and the preference-orders of the device-clusters, F-APs, and RRBs are defined as follows.
Definition 1: A 3D-matching is defined as a function $\Phi(\cdot, \cdot)$ that maps elements of the sets $\mathcal{S} \times \mathcal{L}$, $\mathcal{S} \times \mathcal{N}$, and $\mathcal{L} \times \mathcal{N}$ into the elements of the sets $\mathcal{N}$, $\mathcal{L}$, and $\mathcal{S}$, respectively. For all $\{j, l, n\} \in \{\mathcal{S}, \mathcal{L}, \mathcal{N}\}$, such a mapping satisfies the following conditions.
Condition 1) implies that if the n-th and $\tilde{n}$-th RRBs are matched with the j-th device-cluster, then both the $(j, n)$ and $(j, \tilde{n})$ pairs will be matched with the same F-AP. Condition 2) implies that if the n-th and $\tilde{n}$-th RRBs are matched with the l-th and $\tilde{l}$-th F-APs, respectively, then the $(l, n)$ and $(\tilde{l}, \tilde{n})$ pairs must be matched with different device-clusters. Condition 3) implies that the $(l, n)$ pair can be matched with at most one device-cluster, and the total number of device-clusters matched with the l-th F-AP over all the RRBs cannot exceed $N_R$. Condition 4) implies that two different device-clusters will always be matched with orthogonal sets of RRBs. Finally, Conditions 5) and 6) imply that the j-th device-cluster, the l-th F-AP, and the n-th RRB will be matched if and only if any two of these elements are matched with the third element.
Definition 2: To maximize the objective function value of P2.6, the j-th device-cluster prefers the $(l, n)$ pair over the $(l', n')$ pair if the following condition is satisfied:
$U^{j}_{l,n} > U^{j}_{l',n'}$, $\forall l, l' \in \mathcal{L}$, $\forall n, n' \in \mathcal{N}$. (23)
The l-th F-AP prefers the $(j, n)$ pair over the $(j', n')$ pair if the following condition is satisfied:
$U^{j}_{l,n} > U^{j'}_{l,n'}$, $\forall j, j' \in \mathcal{S}$, $\forall n, n' \in \mathcal{N}$. (24)
The n-th RRB prefers the $(j, l)$ pair over the $(j', l')$ pair if the following condition is satisfied:
$U^{j}_{l,n} > U^{j'}_{l',n}$, $\forall j, j' \in \mathcal{S}$, $\forall l, l' \in \mathcal{L}$. (25)
In this matching problem, as the players are matched with their preferred pairs according to (23)-(25), either the system SE is increased or the power consumption of the system is reduced. However, for this matching problem, finding a stable 3D-matching is NP-hard and computationally intractable [36]. Consequently, we develop a heuristic algorithm to obtain a matching among the device-clusters, F-APs, and RRBs. The overall matching has the following three phases.
F-AP Matching (FM) Phase: In the FM phase, the device-clusters are matched with suitable F-APs such that a single device-cluster is matched with only one F-AP and any F-AP is matched with at most $N_R$ device-clusters. In this phase, two sets, namely, $UM_D$ and $UM_R$, are maintained for the unmatched device-clusters and the F-APs capable of accepting device-clusters, respectively. At each iteration of the FM phase, an eligibility score, $D_j$, is calculated $\forall j \in UM_D$, and it quantifies the eligibility of each unmatched device-cluster to be matched with its most preferred F-AP in the $UM_R$ set. The unmatched device-cluster having the maximum eligibility score is matched with its most preferred F-AP in the $UM_R$ set based on (23), and such a device-cluster is removed from the $UM_D$ set. The $UM_R$ set is updated as well. The FM phase is continued until all the device-clusters are matched with certain F-APs.
RRB Matching (RM) Phase: In the RM phase, the remaining RRBs, which are not matched with any device-cluster, are allocated among the device-clusters. The RM phase is a two-sided many-to-one matching problem where the device-clusters and F-APs matched in the FM phase constitute one side, and the unmatched RRBs constitute the other side. Each unmatched RRB is matched with its most preferred device-cluster and F-AP pair according to (25). The RM phase is continued until all the RRBs are matched.
RRB Swapping (RS) Phase: In the RS phase, a set of RRBs is swapped between any two device-clusters provided that the overall utility of both device-clusters is improved. To explain the RS operation, we assume that $\Phi(j, l) = n$ and $\Phi(\tilde{j}, \tilde{l}) = \tilde{n}$. If $U^{j}_{l,\tilde{n}} + U^{\tilde{j}}_{\tilde{l},n} > U^{j}_{l,n} + U^{\tilde{j}}_{\tilde{l},\tilde{n}}$ is satisfied, the $(n, \tilde{n})$-th RRBs are swapped between the j-th and $\tilde{j}$-th device-clusters. The RS phase is continued until there is no swappable device-cluster pair in the system.
The overall steps of the proposed 3D-resource matching are summarized in Algorithm 2. In Algorithm 2, $L^*_j$ denotes the F-AP assigned to the j-th device-cluster, $N^*_j$ denotes the set of RRBs assigned to the j-th device-cluster, $K_l$ is the number of device-clusters associated with the l-th F-AP, and $UM_N$ is the set of the unmatched RRBs.
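The three phases can be sketched compactly in code. The example below is a simplified illustration rather than Algorithm 2 itself: the eligibility score $D_j$ is not reproduced in this text, so the FM phase here simply picks the unmatched device-cluster with the largest available utility, and the swap rule is taken as a strict increase of the summed utility of the two device-clusters. The function name `match_3d` and the table layout `U[j][l][n]` are hypothetical, and the sketch requires at least as many RRBs and F-AP slots as device-clusters.

```python
def match_3d(U, n_r):
    """U[j][l][n]: utility of device-cluster j served by F-AP l on RRB n."""
    K, L, N = len(U), len(U[0]), len(U[0][0])
    fap = {}                          # device-cluster -> F-AP
    rrbs = {j: [] for j in range(K)}  # device-cluster -> list of RRBs
    load = [0] * L                    # device-clusters per F-AP
    free = set(range(N))              # unmatched RRBs

    # FM phase (stand-in rule): match the unmatched cluster with the largest
    # utility to that F-AP/RRB pair, respecting the per-F-AP quota n_r.
    unmatched = set(range(K))
    while unmatched:
        options = [(U[j][l][n], j, l, n) for j in unmatched
                   for l in range(L) if load[l] < n_r for n in free]
        _, j, l, n = max(options)
        fap[j] = l
        rrbs[j].append(n)
        load[l] += 1
        free.discard(n)
        unmatched.discard(j)

    # RM phase: each leftover RRB goes to its preferred (cluster, F-AP) pair.
    for n in sorted(free):
        j = max(range(K), key=lambda c: U[c][fap[c]][n])
        rrbs[j].append(n)

    # RS phase: swap RRBs between two clusters while their summed utility grows.
    improved = True
    while improved:
        improved = False
        for j in range(K):
            for i in range(j + 1, K):
                for x in range(len(rrbs[j])):
                    for y in range(len(rrbs[i])):
                        n, m = rrbs[j][x], rrbs[i][y]
                        gain = (U[j][fap[j]][m] + U[i][fap[i]][n]
                                - U[j][fap[j]][n] - U[i][fap[i]][m])
                        if gain > 1e-12:
                            rrbs[j][x], rrbs[i][y] = m, n
                            improved = True
    return fap, rrbs

# Hypothetical utilities: 2 device-clusters, 2 F-APs, 3 RRBs, quota n_r = 1.
U = [[[3, 1, 2], [1, 1, 1]],
     [[2, 2, 5], [1, 4, 1]]]
fap, rrbs = match_3d(U, n_r=1)
```

Each swap strictly increases the total utility over a finite set of assignments, so the RS loop in this sketch terminates.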
2) Property Analysis of Algorithm 2: Although the proposed Algorithm 2 is heuristic, it provides a single-sided Pareto-efficient matching for the device-clusters. To this end, we first introduce the following definitions. Consequently, despite being heuristic, Algorithm 2 is efficient. We obtain the solution to P2.6 from $\Phi^*$ as follows. Particularly, $\forall j \in \mathcal{S}$, $a_{l,j} = 1$ and $b_{n,l,j} = 1$ if $\Phi^*(j, n) = l$, $\Phi^*(l, n) = j$, and $\Phi^*(j, l) = n$, and $a_{l,j} = 0$ and $b_{n,l,j} = 0$ otherwise. The following proposition establishes the near-optimality of the derived solutions.
Proposition 4: For the given power allocations, Algorithm 2 provides a near-optimal solution to P2.6.
Proof: The proof is omitted, and it can be found in [34, Appendix E].
The computational complexity of Algorithm 2 is explained as follows. In each iteration of the FM phase in Algorithm 2, a single device-cluster is matched with an F-AP and RRB pair. In each iteration of the FM phase, the metric {D_j} is calculated for all the unmatched device-clusters, and the set UM_R is computed for the F-APs. Therefore, the required computation of the FM phase is [...]. For the RM phase, the required number of computations is (N − |S|)|S|. Finally, to perform the RS phase, each device-cluster has to visit the remaining |S| − 1 device-clusters. Therefore, in the worst case, the total number of computations required to execute Steps 14 and 15 of Algorithm 2 is |S|(|S| − 1)N. By combining the required computations of the FM, RM, and RS phases and considering the fact that |S| = K, the worst-case computational complexity of Algorithm 2 is obtained as [...]

Algorithm 2 (partial listing)
[...] Determine UM_R = {l ∈ L | K_l < N_R}.
5: For each j ∈ UM_D, determine [...]
8: end while (End of FM phase)
9: repeat ∀n ∈ UM_N (Beginning of RM phase)
10: [...]
11: until UM_N = ∅ (End of RM phase)
12: repeat ∀j ∈ S and ∀j̃ ∈ S \ {j} (Start of RS phase)
13: [...]
14: if ∃n ∈ N*_j such that the j-th and j̃-th device-clusters are swappable over the (n, ñ)-th RRBs then
15: [...]
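For a sense of scale, the RM and RS operation counts stated in the complexity analysis of Algorithm 2 can be evaluated numerically. The sketch below only evaluates the two expressions (N − |S|)|S| and |S|(|S| − 1)N; the function name is hypothetical, and the FM term is left out because its expression is not available here. The example values (7 device-clusters, 36 RRBs) are those of the simulation setup in Section V.

```python
def matching_op_counts(num_clusters, num_rrbs):
    """Worst-case RM and RS operation counts stated for Algorithm 2."""
    s, n = num_clusters, num_rrbs
    rm_ops = (n - s) * s          # (N - |S|)|S|
    rs_ops = s * (s - 1) * n      # |S|(|S| - 1)N
    return rm_ops, rs_ops

print(matching_op_counts(7, 36))  # (203, 1512)
```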

C. Algorithms for Maximizing EE and SE
Capitalizing on the IFE-based power control and 3D-resource matching, we develop Algorithm 3 and Algorithm 4 to determine the maximum achievable EE and SE of the system, respectively. Both algorithms iteratively update the solutions to P2.4 and P2.6 by adopting a block-alternating ascent method. Since both P2.4 and P2.6 are near-optimally solved, Algorithm 3 and Algorithm 4 converge, as confirmed by the following proposition.
Proposition 5: Algorithm 3 and Algorithm 4 converge to near-optimal system EE and SE, respectively.

Algorithms 3 and 4 (partial listing)
[...] Update the assignment variables, {a, b}, by using Algorithm 2 while plugging the updated power allocation variables of the previous step into (22), d = d + 1.
5: until SE(P, Q, a, b) converges or the maximum number of iterations is reached.

Proof: The proof is omitted, and it can be found in [34, Appendix F].
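The block-alternating structure shared by Algorithm 3 and Algorithm 4 can be sketched generically. The code below is a toy illustration, not the paper's algorithms: the objective, the two update rules, and all names are hypothetical. The continuous block stands in for the power update and the discrete block for the assignment update. In this toy each block is maximized exactly, so the objective is non-decreasing across iterations.

```python
def block_alternating_ascent(objective, update_x, update_a, x0, a0,
                             tol=1e-9, max_iter=100):
    """Alternate two block updates until the objective stops improving."""
    x, a = x0, a0
    history = [objective(x, a)]
    for _ in range(max_iter):
        x = update_x(a)   # block 1: e.g. power update for a fixed assignment
        a = update_a(x)   # block 2: e.g. assignment update for fixed power
        history.append(objective(x, a))
        if history[-1] - history[-2] <= tol:
            break
    return x, a, history

# Toy problem: choose x in [0, 1] and an option a in {0, 1}.
centers = [0.2, 0.8]
gains = [1.0, 1.5]

def toy_objective(x, a):
    return gains[a] - (x - centers[a]) ** 2

x_opt, a_opt, hist = block_alternating_ascent(
    toy_objective,
    update_x=lambda a: centers[a],                                  # exact maximizer in x
    update_a=lambda x: max(range(2), key=lambda k: toy_objective(x, k)),
    x0=0.0, a0=0,
)
```

Starting from x = 0 and a = 0, the iterates move to option a = 1 and x = 0.8, and the loop stops once an iteration no longer improves the objective.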
The computational complexity of Algorithm 3 and Algorithm 4 is explained as follows. Let the convergence of Algorithm 3 and Algorithm 4 requires Δ

A. Near-Optimal Algorithm Design
We obtain the Pareto-front of P0 by solving P1. P1 can be solved by applying a Lagrangian optimization technique. Let μ be the Lagrangian multiplier corresponding to constraint C5 in P1. The partial Lagrangian of P1 is written as [...] For a given value of ϵ, a local optimal solution to P1 can be obtained as [...] Note that P3.1 and P2.3 have a similar structure except for some scalar multipliers. Hence, for given values of μ and ϵ, Algorithm 1 and Algorithm 2 are applied to obtain {P*(ϵ), Q*(ϵ)} and {a*(ϵ), b*(ϵ)}, respectively. To avoid redundancy, the detailed analysis is omitted. Meanwhile, at the optimality of P3.1, μ* > 0 is satisfied; otherwise, P3.1 would reduce to an SE maximization problem. According to the complementary slackness condition, the optimal value μ* must be chosen such that constraint C5 becomes active. Since P3.1 provides a (local) optimal solution to P1, such a solution is also weakly Pareto optimal for P0. The overall set of (weakly) Pareto optimal solutions to P0 is defined as the Pareto-front of P0. To obtain the Pareto-front of P0, we introduce the following proposition.
Proof: The proof is omitted, and it can be found in [34, Appendix G].
Based on Proposition 6, we propose the I-RSRM algorithm, whose overall steps are summarized in Algorithm 5. The total number of solutions obtained by the I-RSRM algorithm is N_F, and each of these solutions provides a certain trade-off between the EE and SE of the system. However, since P0 is a non-convex and NP-hard multi-objective optimization problem, the global optimality of the Pareto-front obtained by the I-RSRM algorithm is not guaranteed. Nevertheless, by selecting a small step size Δ and increasing N_F, I-RSRM can obtain a good approximation of the optimal Pareto-front. Accordingly, I-RSRM provides a near-optimal Pareto-front for the multi-objective resource optimization problem, P0.
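The ϵ-constraint sweep behind the Pareto-front construction can be illustrated on a toy single-link problem. This is a sketch under stated assumptions, not the I-RSRM algorithm: SE is taken as log2(1 + g·p), EE as SE/(p + P_c), the EE lower bound ϵ is swept between the two corner points, and each sub-problem (maximize SE subject to EE ≥ ϵ) is solved by a grid search instead of the Lagrangian method. The channel gain `g`, the grid size, and all function names are hypothetical.

```python
import numpy as np

def epsilon_constraint_front(g=10.0, p_c=0.2, p_max=2.0, n_f=8, grid=2001):
    """Toy Pareto-front of (SE, EE) via an epsilon-constraint sweep."""
    p = np.linspace(1e-6, p_max, grid)       # candidate transmit powers
    se = np.log2(1.0 + g * p)                # toy spectral efficiency
    ee = se / (p + p_c)                      # toy energy efficiency
    eps_lo = ee[np.argmax(se)]               # EE at the SE-optimal corner
    eps_hi = ee.max()                        # EE at the EE-optimal corner
    front = []
    for eps in np.linspace(eps_lo, eps_hi, n_f):
        feasible = ee >= eps - 1e-12         # EE constraint for this epsilon
        i = np.flatnonzero(feasible)[np.argmax(se[feasible])]
        front.append((float(se[i]), float(ee[i])))
    return front

for se_val, ee_val in epsilon_constraint_front():
    print(f"SE = {se_val:.3f}, EE = {ee_val:.3f}")
```

Each recorded point trades some SE for EE: as ϵ grows, the feasible set shrinks, so the best achievable SE cannot increase while the EE of the selected point rises toward its maximum.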

B. Low-Complexity Algorithm Design
Despite its near-optimality, the I-RSRM algorithm requires a significantly large number of iterations, and the number of iterations increases with the size of the network. Consequently, the convergence rate of the I-RSRM algorithm may not be affordable in a large-scale network. To reduce the number of iterations, we propose the D-RSRM algorithm, whose overall steps are summarized in Algorithm 6. Unlike I-RSRM, D-RSRM decouples the power allocation from the resource assignment. Particularly, D-RSRM initially determines the assignment of the F-APs and RRBs among the device-clusters by considering a uniform power allocation, and keeps these assignments fixed for the remaining steps. As a result, the total number of iterations is substantially reduced in D-RSRM. Similar to I-RSRM, D-RSRM provides a set of resource allocation variables where each solution has a certain EE-SE trade-off. However, the Pareto-front obtained by D-RSRM is sub-optimal.

C. Operating EE-SE Pair Selection Algorithm
In this section, we explain the procedure of selecting a suitable operating EE-SE pair from the Pareto-front obtained by the proposed I-RSRM and D-RSRM algorithms. Note that the decision of the system to trade off a certain portion of its optimal EE (or optimal SE) can be perceived as a cost paid by the system to achieve a certain profit in terms of SE (or EE) improvement. Assume that the system allows a maximum loss of α percent of its optimal EE or SE. Thus, the system can trade off at most α percent of either the optimal EE or the optimal SE. Here, α is a system-defined EE-SE trade-off control parameter. We need to decide whether it is beneficial to trade off α percent of the optimal EE (resp. optimal SE) for a gain in the achievable SE (resp. achievable EE) of the system. To this end, we denote by EE** and SE** the achievable EE and SE given that the system is willing to trade off α percent of the optimal SE and EE, respectively. Let EE* and SE* be the maximum EE and SE in the available Pareto-front, respectively. To determine EE** and SE**, we formulate two optimization problems, given by (28) and (29).

Algorithm 5 I-RSRM Algorithm (partial listing)
1: [...] (P_EE, Q_EE, a_EE, b_EE), respectively, by using Algorithm 1 and Algorithm 2.
2: Initialize: ϵ^(1) [...]
[...] Initialize μ and the assignment variables {a, b}.
5: repeat
6: Update the power allocation variables using Algorithm 1 while replacing η ε and 1/ln 2 in (16) [...]
[...] Update μ using a bi-section method such that constraint C5 in P1 becomes active.
9: until L^(2) converges or the maximum number of iterations is reached
10: Record the resource allocations [...]. Update ϵ^(m+1) = ϵ^(m) + Δ, m = m + 1.
11: end while
12: Output: Set of Pareto-optimal resource allocations {P*(ϵ^(m)), Q*(ϵ^(m)), a*(ϵ^(m)), b*(ϵ^(m))}, ∀m = 1, 2, ..., N_F.
In both (28) and (29), the resource allocation variables are taken from the output of the proposed I-RSRM (D-RSRM) algorithm. Hence, both (28) and (29) [...] In (30a) and (30b), {w_1, w_2, w_3} is a set of network-determined weight factors, and χ_EE and χ_SE are defined as, respectively, [...] where EE_{SE=SE*} is the achievable EE when the system operates with its maximum SE, SE*, and SE_{EE=EE*} is the achievable SE when the system operates with its maximum EE, EE*. Here, χ_EE provides the percentage gain of EE when the system allows a maximum loss of α percent of its optimal SE, and χ_SE provides the percentage gain of SE when the system allows a maximum loss of α percent of its optimal EE. If BOT_1 > BOT_2 holds, it is more beneficial to trade off α percent of the optimal SE, and the operating EE-SE pair of the system is selected as (EE**, (1 − α/100) SE*). Conversely, if BOT_2 > BOT_1 holds, it is more beneficial to trade off α percent of the optimal EE, and the operating EE-SE pair of the system is selected as ((1 − α/100) EE*, SE**). The overall steps of selecting a suitable EE-SE pair and the corresponding resource allocation variables are summarized in Algorithm 7. Note that the optimality of the output of Algorithm 7 depends on the optimality of the available Pareto-front. Accordingly, by increasing the number of solutions of the I-RSRM and D-RSRM algorithms, the near-optimality of both the obtained Pareto-front and the output of Algorithm 7 can be improved.

Algorithm 6 D-RSRM Algorithm (partial listing)
[...] By plugging (a, b) into Algorithms 3 and 4, determine the EE-optimal and SE-optimal power allocations, respectively. Denote the SE-optimal and EE-optimal resource allocations by (P_SE, Q_SE, a, b) and (P_EE, Q_EE, a, b), respectively.
4: Initialize: ϵ^(1) = EE(P_SE, Q_SE, a, b), iteration index m = 1, step size Δ.
5: while EE(P_EE, Q_EE, a, b) > ϵ^(m) do
6: Initialize μ.
[...] Update μ using a bi-section method such that constraint C5 in P1 becomes active.
[...]
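The quantities used in the pair selection can be sketched on a discrete Pareto-front. The code below rests on assumptions that are not confirmed by the text: (28) is read as "maximize EE subject to SE ≥ (1 − α/100) SE*", (29) as "maximize SE subject to EE ≥ (1 − α/100) EE*", and χ_EE and χ_SE as relative gains in percent over the corresponding corner points. The BOT_1 and BOT_2 scores are not computed because (30a) and (30b) are not available here. All names and the example front are hypothetical.

```python
def tradeoff_candidates(front, alpha):
    """front: list of (SE, EE) points; alpha: allowed loss in percent."""
    se_star = max(se for se, _ in front)
    ee_star = max(ee for _, ee in front)
    keep = 1.0 - alpha / 100.0
    # Assumed reading of (28): best EE while keeping SE >= (1 - alpha/100) SE*.
    ee_ss = max(ee for se, ee in front if se >= keep * se_star)
    # Assumed reading of (29): best SE while keeping EE >= (1 - alpha/100) EE*.
    se_ss = max(se for se, ee in front if ee >= keep * ee_star)
    # Corner points: EE at maximum SE, and SE at maximum EE.
    ee_at_se_star = max(ee for se, ee in front if se == se_star)
    se_at_ee_star = max(se for se, ee in front if ee == ee_star)
    # Assumed definition: relative gains in percent.
    chi_ee = 100.0 * (ee_ss - ee_at_se_star) / ee_at_se_star
    chi_se = 100.0 * (se_ss - se_at_ee_star) / se_at_ee_star
    return {"EE**": ee_ss, "SE**": se_ss, "chi_EE": chi_ee, "chi_SE": chi_se}

example_front = [(4.0, 2.0), (3.5, 3.0), (3.0, 3.6), (2.0, 4.0)]
result = tradeoff_candidates(example_front, alpha=30)
```

For this example front and α = 30, the candidate pairs are (EE**, 0.7·SE*) = (3.6, 2.8) and (0.7·EE*, SE**) = (2.8, 3.5); which one is selected depends on the BOT comparison.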

D. Implementation and Computational Complexity Analysis
Implementation: Leveraging the reliable fronthaul links between the CBS and F-APs and the computation resources of the CBS, both the I-RSRM and D-RSRM algorithms are executed at the CBS. In each TS, the CBS first collects the global CSI of the network. Then, the CBS obtains the Pareto-front of the system EE and SE by executing the proposed I-RSRM/D-RSRM algorithm. Subsequently, by executing Algorithm 7, the CBS determines the operating EE-SE pair and the corresponding resource allocations. Finally, the CBS coordinates the resource allocation decisions with the F-APs via the fronthaul links. Note that the fronthaul links are mainly used for coordination and scheduling, and hence, the burden on the fronthaul is relieved. In this regard, the proposed implementation is in agreement with the conventional F-RAN architecture, which exploits cloud resources for complex resource allocation tasks and the local caches at the F-APs for fast data delivery to the UDs.

Algorithm 7 Operating EE-SE Pair Selection Algorithm (partial listing)
1: Input: Pareto-front achieved by the I-RSRM/D-RSRM algorithm and the EE-SE trade-off control parameter α.
2: Calculate EE** and SE** by solving (28) and (29) [...]

Complexity of I-RSRM:
The computational complexity of the I-RSRM algorithm depends on the number of required Pareto-optimal points. Using the reported computational complexity of Algorithms 3 and 4, the computational complexity of obtaining the two corner points of the Pareto-front is O(Δ [...]). We assume that Δ^(2)_max iterations are required for the convergence of Steps 5-9 of Algorithm 5. Hence, to obtain a single Pareto-optimal point, I-RSRM requires [...] iterations. Recall that I-RSRM obtains N_F points of the Pareto-front. Therefore, the overall computational complexity of the proposed I-RSRM [...] max KLN + K²N. Evidently, D-RSRM brings a substantial reduction of the computational complexity, especially for systems with a large number of F-APs and RRBs.

V. SIMULATION RESULTS
In this section, the performance of the proposed resource allocation algorithms is evaluated by simulations. We consider a 300 m × 300 m square F-RAN cell with one CBS in the center, 28 UDs, 10 F-APs, and 36 RRBs. The UDs are grouped into 7 device-clusters, where cluster-1 to cluster-7 contain 5, 3, 3, 5, 4, 3, and 5 UDs, respectively. The F-APs are deployed on a circle of 35 m radius. The minimum distance from the F-APs to the UDs and the minimum distance from the CBS to the UDs are set to 40 m and 75 m, respectively. An example of the considered simulation setting is provided in [34, Fig. 2]. [...] [37]. Similar to [38], we consider P^{F-AP}_max = P^{CBS}_max = 33 dBm, P_c = 0.2 Watt, ε = 0.7, and σ² = −174 dBm. Finally, for selecting a suitable EE-SE pair by using Algorithm 7, we consider w_i = 1, ∀i ∈ {1, 2, 3}, in (30a) and (30b). We consider the following benchmark schemes. For all the benchmark schemes, the operating EE-SE pair is selected by employing the method described in Section IV-C.
• Downlink multicasting: The downlink multicasting approach entirely eliminates the intra device-cluster interference, i.e., the interference among the cache-hit UDs and the interference among the cache-miss UDs of each device-cluster. To apply multicasting in a given device-cluster, the F-AP and the CBS first combine the messages of the cache-hit UDs and the cache-miss UDs, respectively. Then, the F-AP and the CBS transmit the combined messages over the RRBs using two rates such that the messages transmitted from the F-AP and the CBS are decodable by all the cache-hit and cache-miss UDs, respectively. To mitigate the interference between the F-AP and the CBS in the device-clusters, the transmission powers of both the F-AP and the CBS are jointly optimized by using Algorithm 1. To mitigate the inter device-cluster interference, Algorithm 2 is used to obtain the associations among device-clusters, F-APs, and RRBs.
• Unicast and treat interference as noise (Unicast/TIN): Unicast/TIN eliminates the interference between the F-AP and the CBS of the considered system model by allowing each device-cluster to be supported solely via the CBS. However, this approach is subject to both inter device-cluster and intra device-cluster interference. To eliminate the inter device-cluster interference, a many-to-one matching algorithm is applied to allocate orthogonal RRBs among the device-clusters. To mitigate intra device-cluster [...]
• I-RSRM/IRA: The only difference between I-RSRM and I-RSRM/IRA is that, instead of 3D-resource matching, I-RSRM/IRA applies an alternating optimization technique, given in Algorithm 3 of [39], to determine the associations among device-clusters, F-APs, and RRBs.
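As a quick sanity check on the simulation parameters listed at the start of this section, the power values can be converted from dBm to Watt and the device-cluster sizes can be summed. The helper name `dbm_to_watt` is hypothetical; the numbers are those stated in the setup.

```python
def dbm_to_watt(dbm):
    """Convert a power level from dBm to Watt."""
    return 10.0 ** (dbm / 10.0) / 1000.0

cluster_sizes = [5, 3, 3, 5, 4, 3, 5]   # UDs in cluster-1 ... cluster-7
p_max_w = dbm_to_watt(33.0)             # maximum F-AP / CBS transmit power
noise_w = dbm_to_watt(-174.0)           # stated noise power level

print(sum(cluster_sizes))               # 28 UDs in total
print(f"{p_max_w:.3f} W, {noise_w:.3e} W")
```

The maximum transmit power of 33 dBm is roughly 2 W, so the circuit power P_c = 0.2 Watt is about one tenth of the transmit power budget.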

A. Effect of EE-SE Trade-off Control Parameter
Figs. 2a and 2b illustrate the SE and EE of the proposed I-RSRM algorithm, respectively, for different values of the EE-SE trade-off control parameter α. For all the considered device-clusters in Figs. 2a and 2b, when α takes a small or large value, such as α = 10% or α = 60%, a large system EE is achieved, and when α takes a moderate value, such as α = 30%, a large system SE is achieved. Moreover, for a given range of α, the system EE and SE exhibit opposite behavior, i.e., an increase of the system SE results in a decrease of the system EE and vice versa. Such an observation is intuitive, as the EE and SE of the system cannot be maximized simultaneously. An explanation of why a large EE is achieved for both small and large values of α is provided as follows.
• Small α: When α → 0, the solutions to (28) and (29) approach the corner points of the developed Pareto-front. Particularly, the EE obtained by trading off α percent of the optimal SE approaches the achievable EE of the system operating with maximum SE, and the SE obtained by trading off α percent of the optimal EE approaches the achievable SE of the system operating with maximum EE. Essentially, as α → 0, EE** → EE_{SE=SE*} and SE** → SE_{EE=EE*}, and consequently, χ_EE → 0 and χ_SE → 0. In the considered simulation setup, the relative EE gap between the two corner points of the developed Pareto-front is much larger than the relative SE gap. Hence, for small α, (EE* − EE**)/(SE* − SE**) [...] 1 and BOT_2 > BOT_1 are usually satisfied. Recall that when BOT_2 > BOT_1 holds, the achievable EE and SE of the system are given by (1 − α/100) EE* and SE**, respectively. Since α is small, the system EE approaches the optimal EE, and owing to the conflicting behavior of EE and SE, the SE becomes small. Accordingly, when α is small, the considered system example obtains a large EE and a small SE.
• Large α: When α is large, the SE and EE constraints of (28) and (29) [...] and BOT_2 → χ SE − α + (α/100) 2 1−α/100, respectively. Since the relative EE gap between the two corner points of the developed Pareto-front is much larger than the relative SE gap, χ_EE > χ_SE and BOT_1 > BOT_2 are satisfied. Recall that when BOT_1 > BOT_2 holds, the achievable EE of the system is given by EE**, and, as mentioned earlier, for a large α, EE** approaches the optimal system EE. Consequently, the considered system example also achieves a large EE as α becomes large.
Meanwhile, between the aforementioned two extreme cases, there exists a certain α for which the system chooses a strategy that obtains a near-optimal SE. Fig. 2a illustrates that when α takes a moderate value, e.g., α = 30%, the considered system example achieves a larger SE for all the device-clusters than when α = 10% or α = 60%. Note that α is a system-defined control parameter. To determine a suitable value of α, one can first conduct an offline simulation in which the system EE and SE are evaluated for different values of α. Subsequently, by inspecting these results, a suitable value of α is selected so that the required design criteria are satisfied.

B. EE and SE Comparison Among the Proposed and Baseline Schemes
Figs. 3a and 3b plot the achievable SE and EE of the proposed schemes for different numbers of device-clusters while considering α = 30% and α = 60%, respectively. The SE and EE of the EE-optimal and SE-optimal schemes are also depicted in Figs. 3a and 3b, respectively. The performance of the EE-optimal and SE-optimal schemes is obtained from Algorithm 3 and Algorithm 4, respectively. The SE-optimal and EE-optimal schemes provide the two corner points of the Pareto-front developed by the I-RSRM algorithm. Figs. 3a and 3b show that, for all the device-clusters, the relative EE gap between the EE-optimal and SE-optimal schemes is much wider than the relative SE gap between them. This confirms the previously mentioned fact that the relative EE gap between the two corner points of the developed Pareto-front is much larger than the relative SE gap. Fig. 3a shows that I-RSRM with α = 30% achieves approximately 70% of the optimal SE, which is intuitively expected. Particularly, by trading off 30% of the optimal SE, I-RSRM greatly improves the achievable system EE. For instance, for K = 5 device-clusters, I-RSRM with α = 30% improves the system EE by 6.09 times compared to the SE-optimal scheme. Moreover, I-RSRM with α = 30% achieves improved SE compared to the EE-optimal scheme as well. In Fig. 3a, for K = 5 device-clusters, I-RSRM with α = 30% achieves 28.25% improved SE compared to the EE-optimal scheme. Finally, Fig. 3b shows that, for all the device-clusters, I-RSRM with α = 60% approaches the EE-optimal scheme, which is consistent with Fig. 2b. We observe that the conventional EE-optimal and SE-optimal schemes do not provide SE and EE guarantees simultaneously. In contrast, by tuning α, the proposed I-RSRM algorithm not only improves the trade-off between EE and SE but can also approach the solutions provided by the EE-optimal and SE-optimal schemes. Hence, the proposed I-RSRM algorithm is more flexible than the conventional EE-optimal and SE-optimal schemes.
Figs. 3a and 3b also confirm the substantial SE and EE gain achieved by the proposed I-RSRM algorithm. Note that both the downlink multicasting and unicast/TIN approaches provide a certain form of interference management for the considered system model, and consequently, they are viable benchmark schemes for the proposed resource allocation algorithms. Moreover, both of these approaches can be obtained as special cases of the proposed I-RSRM algorithm. Consequently, I-RSRM outperforms both the downlink multicasting and unicast/TIN approaches. Fig. 3a shows that, for K = 5 device-clusters, I-RSRM achieves 22.36% and 23.65% improved SE compared to the downlink multicasting and unicast/TIN approaches, respectively. Similarly, Fig. 3b shows that, for K = 5 device-clusters, I-RSRM achieves 88.76% and 121.93% improved system EE compared to the downlink multicasting and unicast/TIN approaches, respectively. In the downlink multicasting approach, to avoid intra device-cluster interference, only the minimum throughput of the cache-hit and cache-miss UDs in each device-cluster is targeted. The achievable throughput of such an approach is substantially reduced as the channel condition of the UDs with minimum throughput deteriorates. Note that both downlink multicasting and I-RSRM use the same algorithm for 3D-resource matching. Hence, compared to downlink multicasting, I-RSRM achieves its performance gain by exploiting an optimized RS-CMD technique. On the other hand, compared to unicast/TIN, I-RSRM achieves its performance gain by exploiting the simultaneous transmission from both the F-APs and the CBS and optimized power allocations to effectively suppress interference in the network. Overall, although the proposed I-RSRM algorithm allows more interference in the system than both the downlink multicasting and unicast/TIN approaches, owing to an optimized resource allocation, I-RSRM achieves considerable EE and SE gain over both approaches. Fig. 3a also shows that the proposed low-complexity D-RSRM algorithm achieves 95% to 97% of the SE of the I-RSRM algorithm. However, Fig. 3b shows that as the number of device-clusters in the network increases, the EE gap between the proposed I-RSRM and D-RSRM algorithms increases noticeably.
Figs. 4a and 4b illustrate the variation of the SE and EE of the proposed I-RSRM algorithm for different numbers of F-APs, respectively. As the number of F-APs in the network increases, the diversity gain obtained from the opportunistic assignment between device-clusters and F-APs is enhanced. This leads to the improvement of both the SE and EE of the system. As expected, for every number of F-APs, the SE (EE) of the proposed I-RSRM algorithm lies between the SE (EE) of the SE-optimal and EE-optimal schemes. Moreover, Figs. 4a and 4b show that I-RSRM achieves its maximum SE and EE for α = 30% and α = 60%, respectively, which is consistent with the observations made from Figs. 2a and 2b. For L = 4 F-APs, compared to the SE-optimal scheme, I-RSRM with α = 10%, 30%, and 60% achieves 61.10%, 69.92%, and 53.99% of the optimal SE, respectively, and improves the achievable EE by 8.24, 6.49, and 9.01 times, respectively. Similarly, for L = 4 F-APs, compared to the EE-optimal scheme, I-RSRM with α = 60%, 30%, and 10% achieves 99.92%, 72%, and 91.31% of the optimal EE, respectively, and improves the achievable SE by 1.004, 1.3, and 1.14 times, respectively. Indeed, the proposed I-RSRM algorithm improves the trade-off between EE and SE of both the EE-optimal and SE-optimal schemes.
Figs. 5a and 5b plot the SE and EE of the proposed D-RSRM algorithm for different numbers of F-APs, respectively. As expected, both the SE and EE of the D-RSRM algorithm improve as the number of F-APs increases. However, compared to I-RSRM, D-RSRM experiences a higher SE and EE loss relative to the SE-optimal and EE-optimal schemes, respectively. For example, for L = 4 F-APs, I-RSRM can achieve up to 69.92% of the optimal SE and 99.92% of the optimal EE of the system. However, for the same system setting, D-RSRM can achieve up to 67.55% of the optimal SE and 84.49% of the optimal EE of the system. In addition, compared to I-RSRM, D-RSRM has inferior EE-SE trade-off performance. For example, for L = 4 F-APs, I-RSRM can achieve up to 99.92% of the EE of the EE-optimal scheme while improving the SE by 1.004 times. However, for the same system setting, D-RSRM can achieve up to 84.49% of the EE of the EE-optimal scheme while incurring an 11.68% loss of SE. From the aforementioned discussion, it is evident that the gap between the maximum achievable SE of the I-RSRM and D-RSRM algorithms is small. Moreover, D-RSRM has much lower computational complexity than I-RSRM. Hence, D-RSRM is a suitable alternative to I-RSRM for a large system, provided that the system prioritizes SE. However, when the system prioritizes the improvement of the trade-off between EE and SE, I-RSRM is preferred.
Figs. 6a and 6b compare the SE and EE of the I-RSRM, D-RSRM, and I-RSRM/IRA algorithms for different numbers of RRBs. As expected, as the number of RRBs increases, both the SE and EE of all the considered schemes in Figs. 6a and 6b increase. Note that, unlike I-RSRM and D-RSRM, I-RSRM/IRA uses the alternating optimization technique given in [39, Algorithm 3] to obtain the assignments of RRBs and F-APs among the device-clusters. However, despite its convergence, such an alternating optimization technique does not guarantee a near-optimal solution to P2.6. Consequently, I-RSRM/IRA obtains an inferior matching among the device-clusters, F-APs, and RRBs, and such an inferior matching leads to a noticeable SE/EE loss compared to the proposed algorithms, especially for a large number of RRBs. From Fig. 6a, at N = 30 RRBs, the proposed I-RSRM and D-RSRM algorithms with α = 30% improve the achievable SE of I-RSRM/IRA by 22.69% and 16.17%, respectively. From Fig. 6b, at N = 30 RRBs, the proposed I-RSRM and D-RSRM algorithms with α = 60% improve the achievable EE of I-RSRM/IRA by 63.78% and 38.98%, respectively. Such an observation demonstrates that our proposed 3D-resource matching is highly efficient in improving both the SE and EE of the system.

C. EE (resp. SE) Loss-Rate Vs. SE (resp. EE) Improvement
Figs. 7a and 7b show the EE and SE loss-rate versus the SE and EE improvement for the proposed and benchmark schemes, respectively. Each point in Figs. 7a and 7b provides the achievable EE and SE obtained by trading off a certain percentage of the optimal SE and EE, respectively. We make the following three observations. First, as the SE loss-factor and EE loss-factor become asymptotically large, the achievable EE and SE of the I-RSRM algorithm approach the EE and SE obtained by the EE-optimal and SE-optimal schemes, respectively. Second, for the I-RSRM algorithm, the improvement of EE obtained by trading off the optimal SE is much larger than the improvement of SE obtained by trading off the optimal EE. For instance, when the SE loss-factor is increased from 20% to 50%, the EE of the I-RSRM algorithm is improved by 137.59%, as shown in Fig. 7a. In contrast, when the EE loss-factor is increased from 20% to 50%, the SE of the I-RSRM algorithm is improved by 14.57%, as shown in Fig. 7b. Finally, we observe that for a given SE and EE loss-factor, the proposed I-RSRM and D-RSRM algorithms achieve substantially larger EE and SE gains than the benchmark schemes. For instance, from Fig. 7a, when the system is willing to trade off 30% of the optimal SE, I-RSRM (resp. D-RSRM) achieves 131.45%, 131.67%, and 134.61% (resp. 72.26%, 72.43%, and 74.61%) improved EE compared to the downlink multicasting, unicast/TIN, and I-RSRM/IRA schemes, respectively. Similarly, from Fig. 7b, when the system is willing to trade off 30% of the optimal EE, I-RSRM (resp. D-RSRM) achieves 37.32%, 36.36%, and 38.76% (resp. 24.03%, 23.37%, and 25.33%) improved SE compared to the downlink multicasting, unicast/TIN, and I-RSRM/IRA schemes, respectively. These observations suggest that, compared to the benchmark schemes, both of the proposed I-RSRM and D-RSRM algorithms significantly improve the trade-off between the EE and SE of the system.

VI. CONCLUSION
We proposed cluster-based resource allocations to simultaneously maximize the system EE and SE of the content distribution phase of an RS-CMD integrated and EC enabled F-RAN. By adopting the ϵ-constraint method, an EE-SE trade-off optimization was conducted while jointly performing the transmission power allocation of the CBS and F-APs, the assignment of the F-APs among the device-clusters, and the scheduling of the RRBs among the device-clusters. An IFE-based power control method and 3D-resource matching were developed. Based on these analyses, near-optimal and sub-optimal algorithms were proposed to obtain the Pareto-front of the EE-SE trade-off optimization problem and to select a suitable operating EE-SE pair. Simulation results confirmed the following two observations: (i) by tuning a suitable EE-SE trade-off control parameter, the proposed near-optimal algorithm substantially improves the trade-off between EE and SE compared to the conventional EE-optimal and SE-optimal schemes, and (ii) the proposed near-optimal and sub-optimal algorithms yield significant EE and SE gains compared to the benchmark schemes.