AI-Enhanced Load Balancing in Federated Edge Computing Networks: Challenges and Solutions

Abstract—In federated edge computing networks, edge computing (EC) nodes from multiple service providers (SPs) are installed to serve the same customer base, and each SP intends to maximize its economic utility. Therefore, to address research problems like EC node placement, job request allocation, and load balancing, we need to involve economic aspects along with network engineering aspects. This implies that we need to propose novel system models and formulate problems in a significantly different way than the existing ones. Thus, we are motivated to discuss the primary aspects of these new problem formulations and propose a novel economic and game-theoretic model for load balancing among federated EC nodes in this paper. In this game formulation, instead of focusing on latency minimization, under-loaded EC nodes intend to maximize their economic utilities by receiving extra workload and incentives from their overloaded neighbors while satisfying the expected latency target. Furthermore, we design a centralized control mechanism, tailor-made for ultra-reliable and low-latency (uRLL) applications, for implementing this load balancing framework by incorporating artificial intelligence (AI)-enhanced traffic prediction algorithms.


I. INTRODUCTION
Edge computing technologies like multi-access edge computing, fog computing, and cloudlet computing act as a bridge between remote cloud servers and mobile edge devices by providing access to computational and memory resources with very low latency [1]. These EC nodes are mainly placed across the access and metro network segments of the Internet and create large-scale peer-to-peer (P2P) distributed systems. The placement of EC nodes is driven by several key factors like the underlying access network technology and architecture, available network bandwidth, and the density of mobile devices and their demand for computational and memory resources [2]. In EC networks, the job requests originating from mobile devices are assigned to the available EC nodes, which is referred to as the job request allocation problem. Nonetheless, certain EC nodes may get overloaded if too many job requests are assigned to them. Then some of their job requests need to be offloaded to their under-loaded neighbors, which is known as the load balancing problem [3]. A load balancing framework among EC nodes can be implemented via a centralized protocol where an oracle node makes the load balancing decisions. Such frameworks can provide quick and efficient solutions but cannot handle very large systems. An alternative approach is to implement P2P or decentralized frameworks, which are efficient in handling large systems and robust against EC node failures and network disruptions. Nonetheless, in spite of the several advantages of P2P or decentralized frameworks, a major challenge in implementing them is the exchange of exhaustive control packets among EC nodes. These issues can be overcome by incorporating AI-based techniques in the design of network management protocols [4].
We critically observe that most of the recent works that address the load balancing problem among EC nodes do not consider the coexistence of EC nodes from multiple SPs. When neighboring EC nodes from the same as well as different SPs coexist over the same network, we call this network setup a federated EC network. Due to the lack of frameworks or software for controlling and federating resources among EC nodes from multiple SPs, the authors of [5] identify the need for new schemes for the federation of EC nodes. In particular, new pricing models for the federation of resources among EC nodes and policies for resource sharing among EC nodes appear to be highly essential. When a group of SPs is willing to cooperate to maximize their utilities, cooperative game-theoretic models can be proposed where the SPs form stable coalitions [6]. However, if the SPs are completely non-cooperative, we can propose load balancing mechanisms based on non-cooperative game-theoretic models [7].
In this article, we propose a novel economic and game-theoretic model for load balancing among federated EC nodes from different SPs. In this model, the EC nodes are interested in maximizing their economic utilities, rather than minimizing the overall latency. The primary motivation behind this approach is our observation that mobile users do not pay any extra price to the SPs for executing job requests faster than the requested quality-of-service (QoS) latency. This implies that for job requests with a QoS latency target of 5 ms, user satisfaction is the same for an overall latency of 1 ms, 4 ms, or 5 ms. Therefore, any under-loaded EC node always finds it beneficial to receive some extra workload from its overloaded neighbors while satisfying the expected QoS latency, because the associated incentives help to maximize its individual utility. Furthermore, we show that this model can be implemented through an AI-enhanced centralized framework, suitable for uRLL applications, where a neutral mediator supervises the market competition among EC nodes from different SPs. To satisfy the stringent latency restrictions of uRLL applications, each EC node sends its predicted job request arrival rates to the mediator, so that the load balancing decisions are made before the actual job requests arrive and the latency overhead of computing the load balancing strategies is avoided.

II. CHARACTERISTICS OF THE LOAD BALANCING PROBLEM AMONG FEDERATED EC NODES

In practice, the EC nodes are distributed over a large geographic area and the incoming job requests vary over time. Hence, load balancing among EC nodes is an important research challenge. In Fig. 1, we show the fundamental aspects of load balancing in federated EC networks. The first important aspect is the economic interaction among EC nodes from the same as well as different, i.e., heterogeneous, SPs. The economic interaction happens in terms of a few parameters like the payment received by the EC nodes from users, the penalties paid by the EC nodes to the users for failing to meet QoS requirements, and the mutual incentives exchanged for P2P computation offloading. In a federated environment, an overloaded EC node pays an additional incentive for offloading its extra workload to an under-loaded neighbor from a different SP. These payment parameters can be imposed by a neutral central authority like a government or mediator. They can also be dynamically determined through suitable economic or bargaining policies.
The second aspect is support for heterogeneous QoS requirements, because the job requests arriving at EC nodes may have different bandwidth demands for data transmission, CPU cycles for job processing, and maximum tolerances for end-to-end (E2E) latency. Such diverse requirements can be handled through network slicing in 5G and beyond networks. A very intriguing research challenge is the strategic partitioning of processors in EC nodes for P2P load balancing. The third aspect is truthful information exchange among nodes. Network parameters like job request arrival rates, job request service rates, and QoS latency targets of the incoming job requests are private information of the EC nodes. Therefore, the EC nodes from different SPs may reveal false information to gain some profit from the market. Hence, to enforce truthful information revelation, we need to design incentive-compatible mechanisms for centralized frameworks or reinforcement learning based decentralized frameworks.
In recent times, we observe a growing interest in applying game theory to network problems like load balancing, as it provides a wide variety of mathematical models for strategic interaction among rational and independent agents [8]. Especially in federated EC networks, game-theoretic models appear to be highly essential as EC nodes from different SPs may cooperate or compete with each other. For example, the authors of [9] proposed a collaborative load balancing algorithm for latency improvement based on the Nash bargaining solution for a cooperative game among fog computing nodes.
On the other hand, the authors of [10] proposed a distributed non-cooperative load balancing game in small cell networks among neighboring cloudlets that try to minimize their end-to-end latency subject to explicit energy and latency constraints. This model is very efficient under moderate load conditions, but it performs poorly under very high load conditions because some of the cloudlets may violate the latency constraints and the Nash equilibrium (NE) solution becomes infeasible. Moreover, to the best of our knowledge, most of the existing frameworks were not designed for federated EC networks, which we addressed in [11], [12]. In that game, an overloaded cloudlet paid incentives for offloading job requests only to a cloudlet from a different SP. Otherwise, incentives were not required and the game-theoretic framework reduced to a more suitable distributed optimization framework.

III. SYSTEM MODEL
In this section, we discuss the job request arrival and service processes at the set of N EC nodes C = {1, 2, ..., N}, where N ≥ 2. We also design an AI-enhanced control mechanism to implement this load balancing framework.
(a) Job Request Service Process: Usually, each EC node contains a finite number of processors, and we assume that the job processing capability of each processor is the same. The average service rate of EC node-i, i ∈ C, with a single processor is denoted by µ_i (jobs/s) and with n_i processors by µ_ii = n_i µ_i. We model the processing at each EC node as an M/M/1 queuing system, based on the assumption that incoming job requests are maximally parallelized [10].
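As a quick numerical sketch of this M/M/1 service model (the parameter values below are illustrative, not taken from the paper), the mean sojourn time of a job at an EC node with aggregate service rate µ_ii and arrival rate λ_i is 1/(µ_ii − λ_i):

```python
def mm1_sojourn_time(mu_ii: float, lam_i: float) -> float:
    """Mean time a job spends in an M/M/1 queue (waiting + service), in
    seconds, for aggregate service rate mu_ii and arrival rate lam_i."""
    if lam_i >= mu_ii:
        raise ValueError("unstable queue: arrival rate must stay below service rate")
    return 1.0 / (mu_ii - lam_i)

# Example: 10 processors of 100 jobs/s each => mu_ii = 1000 jobs/s.
n_i, mu_i = 10, 100.0
mu_ii = n_i * mu_i
print(round(mm1_sojourn_time(mu_ii, 700.0) * 1e3, 2))  # → 3.33 ms at moderate load
print(round(mm1_sojourn_time(mu_ii, 970.0) * 1e3, 2))  # → 33.33 ms near saturation
```

The sharp growth of the sojourn time near saturation is what makes offloading attractive for heavily loaded nodes.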
(b) Job Request Arrival Process: The average job request arrival rate to EC node-i is denoted by λ_i. Each EC node decides, through some internal scheduling algorithm (beyond the scope of this paper), to process a major share of the incoming job requests internally and offload the excess workload to a neighboring EC node. The average job request arrival rate varies over time, and we assume that each λ_i is independently and uniformly distributed over the support Λ_i = [0, λ_i^max], ∀i ∈ C. Although the job request arrival process to EC nodes is non-stationary in practice, it displays some pseudo-stationary characteristics, as the mean job request arrival rate varies gradually.
(c) QoS Latency Requirements: In this paper, rather than individual job requests, we consider a batch of incoming job requests to EC nodes. Hence, the computational and latency requirements of all the incoming job requests to EC node-i are denoted by the consolidated tuple (µ_i, λ_i, D_Q). We assume that all job requests belong to a similar type of low-latency application and the QoS target latency D_Q is the same for all of them. We consider a more generalized model with multiple classes of job requests with heterogeneous D_Q values as our future work. This implies that the duration of each timeslot is also equal to D_Q, and if any highly overloaded EC node fails to process some of its incoming job requests within D_Q, it drops those job requests and pays a penalty for that. Note that an M/M/1 queue with the aggregated processing rate of all the processors provides only an upper bound for the processing latency of an EC node.
(d) User Mobility Model: We assume that mobile users do not leave the coverage area of an EC node within 1-10 ms. Hence, we can consider a quasi-static mobility model for mobile users. This implies that mobile users are almost stationary with respect to the corresponding EC node during the computation offloading period, but may move away later.

A. AI-enhanced Centralized Control Mechanism
In a centralized framework, the federated EC nodes are non-cooperative and rational utility maximizers, but the mediator is neutral and supervises the interactions among the federated EC nodes only to ensure a fair market competition, as shown in Fig. 2. The EC nodes predict their future job request arrival rates, denoted as λ̂ = (λ̂_1, λ̂_2, ..., λ̂_N) ∈ Λ, the average round-trip data transmission latency between the mobile devices and the corresponding EC node-i, denoted by t_ui, and the inter-EC node round-trip data transmission latency, denoted by t_ij, ∀i, j ≠ i ∈ C. We assume that the EC nodes share this information truthfully, as the mediator can impose an incentive-compatible mechanism [12]. We further assume that the incoming job request traffic is bursty in nature and has short-term dependency characteristics. Hence, we use an auto-regressive integrated moving average (ARIMA)-based traffic prediction algorithm [13]. Each EC node makes an initial prediction by observing the moving average over a finite set of historical data samples and sends its predicted information to the mediator two time-slots in advance.
The accuracy of the predicted average job request arrival rate can be improved further by observing the error iteratively in subsequent time-slots. Based on the revealed information, the mediator computes the NE load balancing strategies for each EC node. The fundamental stages of the overall control design are summarized below:
(a) Each EC node-i continuously observes a finite set of historical data samples and uses an ARIMA-based algorithm at the (n − 1)-th time-slot to predict the average incoming job request arrival rate λ̂_i^(n+1) for the (n + 1)-th time-slot.
(b) Each EC node also estimates the parameters t_ui and t_ij by using the given stochastic parameters of the wireless and optical interfaces between mobile devices and EC nodes.
(c) Each EC node communicates its latest predictions of λ̂_i^(n+1), t_ui, and t_ij to the centralized computational facility installed by the mediator.
(d) The mediator employs a gradient projection algorithm to compute the NE computation offloading strategies and sends them back to the EC nodes while the n-th time-slot is ongoing.
(e) Thus, the overloaded EC nodes offload a fraction of their total incoming workload to their under-loaded neighbors when the (n + 1)-th time-slot actually begins.
(f) During the (n + 1)-th time-slot, each EC node calculates the prediction error on the job request arrival rate to aid the learning algorithm in improving the accuracy in subsequent time-slots.
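The predict-then-correct loop of stages (a) and (f) can be sketched as follows. For brevity, this uses a plain moving average with an error-feedback term as a stand-in for the full ARIMA model; the class name, window size, and sample values are all our own illustrative choices:

```python
from collections import deque

class ArrivalRatePredictor:
    """Simplified stand-in for the ARIMA-based predictor: a moving average
    over recent samples, corrected by a fraction of the last observed error."""
    def __init__(self, window: int = 5, alpha: float = 0.5):
        self.history = deque(maxlen=window)  # finite set of historical samples
        self.alpha = alpha                   # weight of the error-feedback term
        self.last_error = 0.0

    def predict(self) -> float:
        """Stage (a): predict the next arrival rate before jobs arrive."""
        if not self.history:
            return 0.0
        baseline = sum(self.history) / len(self.history)
        return max(0.0, baseline + self.alpha * self.last_error)

    def observe(self, actual: float) -> None:
        """Stage (f): record the prediction error to refine later predictions."""
        self.last_error = (actual - self.predict()) if self.history else 0.0
        self.history.append(actual)

predictor = ArrivalRatePredictor()
for actual in [700, 720, 710, 705, 715]:  # hypothetical jobs/s samples
    lam_hat = predictor.predict()         # sent to the mediator in advance
    predictor.observe(actual)             # learn from the realized arrivals
print(round(predictor.predict(), 1))      # → 713.1, near the recent mean
```

In the actual framework, each EC node would run its predictor locally and forward λ̂_i^(n+1) to the mediator two time-slots ahead, as described above.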
IV. ECONOMIC AND NON-COOPERATIVE LOAD BALANCING GAME AMONG EC NODES

In the federated EC node deployment scenario, the complete job request offloading strategy space of all EC nodes is defined as a matrix, where each entry ϕ_ij denotes the fraction of job requests that EC node-i offloads to its neighboring EC node-j. Each EC node receives a linearly proportional price (Ω_1) per workload from the connected mobile devices. Each EC node pays a linearly proportional price per workload (Ω_2) for offloading job requests to a neighboring EC node from a different SP and, likewise, receives a linearly proportional price for executing its neighbors' offloaded jobs. The EC nodes can also cooperate or bargain among themselves to decide the value of Ω_2, which leads to a cooperative or bargaining game-theoretic model and is part of our future work. Each EC node-i has to pay this price to EC node-j only when it belongs to a different SP. In addition, if any EC node violates the QoS target latency D_Q, it pays a linear penalty price with a proportionality cost factor (Ω_3) [10].
The objective of each federated EC node is to maximize its individual utility. We define the utility function as U_i^N(ϕ_i, ϕ_{-i}) = (revenue earned from mobile users) + γ_ji × (incentives earned, if job requests are received from neighbors) − γ_ij × (incentives paid, if job requests are offloaded to neighbors) − (penalty, if the QoS latency target is violated), where γ_ij = 1 if the neighboring EC nodes belong to different SPs, else γ_ij = 0. Note that we consider the utility function only under the condition of stable operation, i.e., µ_ii − (1 − Σ_{j≠i} ϕ_ij) λ_i − Σ_{j≠i} ϕ_ji λ_j > 0, ∀i, j ≠ i ∈ C, and a necessary condition Ω_2 ≥ Ω_3 (max{t_ui} + 1/max{µ_ii} + max{t_ij} − D_Q), ∀i, j ≠ i ∈ C, is satisfied such that the feasibility of a pure strategy Nash equilibrium (NE) solution is ensured [11]. Now, we can calculate each of the terms present in the utility as follows:
• revenue earned from mobile users = Ω_1 × (total workload from associated mobile users),
• incentives earned for receiving or paid for offloading job requests = Ω_2 × (amount of workload received or offloaded),
• penalty if the QoS latency is violated = Ω_3 × max{0, (latency of the job requests − D_Q)}.

Fig. 3: Variation of the utility function of EC node-i against ϕ_ji.
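To make the bookkeeping concrete, the following sketch evaluates this utility for a single node with one neighbor. The function name, argument layout, and the use of the M/M/1 sojourn time for the latency term are our own simplifications, not the exact formulation of [11]:

```python
def utility(mu_ii, lam_i, phi_out, phi_in_load, t_ui,
            same_sp, omega1, omega2, omega3, d_q):
    """Simplified utility of one EC node: revenue + incentives earned
    - incentives paid - latency penalty (single-neighbor sketch)."""
    kept_load = (1.0 - phi_out) * lam_i          # workload processed locally
    total_load = kept_load + phi_in_load         # plus jobs received from a neighbor
    assert total_load < mu_ii, "stable operation required"
    latency = t_ui + 1.0 / (mu_ii - total_load)  # E2E latency via M/M/1 sojourn time
    gamma = 0 if same_sp else 1                  # incentives flow only across SPs
    revenue = omega1 * lam_i
    earned = gamma * omega2 * phi_in_load        # paid by the offloading neighbor
    paid = gamma * omega2 * phi_out * lam_i      # paid to the receiving neighbor
    penalty = omega3 * max(0.0, latency - d_q)
    return revenue + earned - paid - penalty

# Hypothetical values mirroring the magnitudes used in the examples below.
u = utility(mu_ii=1000.0, lam_i=700.0, phi_out=0.0, phi_in_load=72.3,
            t_ui=0.002, same_sp=False, omega1=5e4, omega2=3e4,
            omega3=9e4, d_q=0.010)
print(u > 0)  # under-loaded node gains by accepting cross-SP workload
```

Setting `same_sp=True` zeroes both incentive terms, which captures why offloading within the same SP requires no payment.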

A. Computation of Pure Strategy NE Solution
The value of U_i^N(ϕ_i, ϕ_{-i}) is maximum when the end-to-end latency of the incoming job requests is equal to D_Q, because the latency penalty is zero. Thus, any overloaded EC node intends to offload its extra workload to avoid the latency penalty and keep its utility at the maximal point. Similarly, any under-loaded EC node is interested in receiving extra workload and the associated incentives without exceeding D_Q, to push its utility towards the maximal point, as shown in Fig. 3. To incorporate this property, we introduce an additional set of shared constraints ϕ_ij ≤ ψ_ij, where ϕ_ij is decided by an overloaded EC node, but ψ_ij is decided by the corresponding under-loaded EC node. Therefore, we identify this game formulation as a generalized Nash equilibrium problem. We can label an EC node as overloaded if T_i = t_ui + 1/(µ_ii − λ_i) > D_Q. Based on the load conditions in various scenarios, the utility and constraints of different EC nodes should be organized and the pure strategy NE can be computed through an optimization problem formulation, as shown in [12]. Now, we explain the load balancing game through some basic examples with illustrations in Fig. 4. We consider a set of three EC nodes C = {1, 2, 3}, where EC node-1 and 2 belong to SP-A and EC node-3 belongs to SP-B, with µ_11 = µ_22 = µ_33 = 1000 jobs/s, D_Q = 10 ms, t_u1 = t_u2 = t_u3 = 2 ms, t_12 = t_21 = 0.5 ms, t_23 = t_32 = 0.7 ms, and t_31 = t_13 = 0.9 ms. In addition, we arbitrarily choose Ω_1 = 5 × 10^4, Ω_2 = 3 × 10^4, and Ω_3 = 9 × 10^4, such that the necessary game design condition is satisfied. (a) All EC nodes are under-loaded: In the first scenario, we consider job request arrival rates for which T_i < D_Q at every EC node; since no EC node is overloaded, no workload is exchanged at the NE. (b) One under-loaded and two overloaded EC nodes: As shown in Fig. 4a, the NE strategies in this scenario reveal the following:
• If an under-loaded EC node has sufficient capacity, it accepts the entire extra workload from all of its overloaded neighbors.
• If an under-loaded EC node does not have sufficient capacity, it accepts the maximum possible workload only from its neighboring EC nodes with the same SP.
(c) Two under-loaded and one overloaded EC nodes: This is an alternative scenario to the previous one, where we consider λ_1 = 970 jobs/s, λ_2 = 820 jobs/s, and λ_3 = 700 jobs/s. Therefore, the end-to-end latency of job requests arriving at EC node-1 is T_1 = 35.33 ms > 10 ms, at EC node-2 is T_2 = 7.55 ms < 10 ms, and at EC node-3 is T_3 = 5.33 ms < 10 ms. This implies that EC node-1 is overloaded but EC node-2 and 3 are under-loaded. Thus, EC node-1 offloads its excess workload to EC node-2 and 3, as shown in Fig. 4b. The pure strategy NE solution is ϕ*_21 = ϕ*_23 = ϕ*_31 = ϕ*_32 = 0, ϕ*_12 = 0.0235, ϕ*_13 = 0.0745, i.e., EC node-2 receives job requests at a rate ϕ*_12 λ_1 = 0.0235 × 970 = 22.71 jobs/s from EC node-1 but does not receive any incentives, as they belong to the same SP. On the contrary, EC node-3 receives job requests at a rate ϕ*_13 λ_1 = 0.0745 × 970 = 72.29 jobs/s and receives an Ω_2-proportional incentive of 37167.95 cost units. Note that the share of workload distributed between EC node-2 and 3 depends on their respective workloads, and EC node-2 alone cannot process all the extra workload from EC node-1. If we had, say, λ_1 = 970 jobs/s, λ_2 = 750 jobs/s, and λ_3 = 700 jobs/s, then the NE solution would be ϕ*_12 = 0.0979, ϕ*_13 = 0, as this helps EC node-1 avoid paying extra incentives. Again, if EC node-2 and 3 could not accept the total workload (ϕ*_12 + ϕ*_13) λ_1, say with λ_1 = 970 jobs/s, λ_2 = 850 jobs/s, and λ_3 = 840 jobs/s, then they would have to partially accept the job requests such that D_Q is just satisfied. In that case, the NE solution would be ϕ*_12 = 0.0172, ϕ*_13 = 0.0198.
• If an overloaded EC node has two under-loaded neighbors from the same and different SPs who can receive the extra workload, then it prefers to offload to the neighbor from the same SP.
• An overloaded EC node offloads the maximum possible workload to its neighbor from the same SP first and then to those from different SPs.
(d) All EC nodes are overloaded: In this scenario, we consider λ_1 = 910 jobs/s, λ_2 = 978 jobs/s, and λ_3 = 953 jobs/s. Therefore, the end-to-end latency of job requests arriving at EC node-1 is T_1 = 13.11 ms > 10 ms, at EC node-2 is T_2 = 47.45 ms > 10 ms, and at EC node-3 is T_3 = 23.27 ms > 10 ms. Clearly, all EC nodes are overloaded and fail to satisfy the QoS target latencies of their individual job requests, and they cannot improve the latency performance by mutual job offloading. The above numerical examples help us to understand the fundamental insights on load balancing strategies among non-cooperative EC nodes in a federated environment. In particular, they highlight the fact that the EC nodes prefer to receive or offload job requests from or to EC nodes with the same SP. Different combinations of the aforementioned characteristics apply to all possible scenarios with N ≥ 3 EC nodes.
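The end-to-end latency figures quoted in these scenarios follow from the M/M/1-based expression T_i = t_ui + 1/(µ_ii − λ_i). A short script (our own check, using the parameter values from the examples) reproduces them:

```python
def e2e_latency_ms(mu_ii, lam, t_u_ms=2.0):
    """End-to-end latency in ms: round-trip access latency t_u plus the
    M/M/1 sojourn time 1/(mu_ii - lam) converted from seconds to ms."""
    return t_u_ms + 1e3 / (mu_ii - lam)

MU, D_Q = 1000.0, 10.0  # service rate (jobs/s) and QoS target (ms)

# Scenario (c): EC node-1 (970 jobs/s) is overloaded; nodes 2 and 3 are not.
print([round(e2e_latency_ms(MU, lam), 2) for lam in (970.0, 820.0, 700.0)])

# Scenario (d): every node exceeds D_Q, so mutual offloading cannot help.
print([e2e_latency_ms(MU, lam) > D_Q for lam in (910.0, 978.0, 953.0)])

# Boundary case from scenario (c): with lam_2 = 850 and lam_3 = 840, the NE
# fractions phi_12 = 0.0172 and phi_13 = 0.0198 load each receiver just up to
# D_Q (access latency + inter-node hop + M/M/1 sojourn time of the receiver).
lam_1 = 970.0
for lam_j, phi, t_ij_ms in ((850.0, 0.0172, 0.5), (840.0, 0.0198, 0.9)):
    total = lam_j + phi * lam_1
    print(round(2.0 + t_ij_ms + 1e3 / (MU - total), 2))  # ≈ 10.0 ms in both cases
```

The boundary check illustrates the shared-constraint behavior: the under-loaded receivers admit exactly as much workload as keeps their end-to-end latency at D_Q.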

B. AI-enhanced Framework Evaluation
In Fig. 5, we show the impact of the prediction accuracy of the incoming job request arrival rates on the NE utility values of the federated EC nodes. For the job request arrival rate prediction, we use the moving-average-based ARIMA algorithm and consider that the job request arrival rates to each EC node remain stationary for 30 seconds. We consider the same three EC nodes as in Sec. IV-A, with µ_ii = 1000 jobs/s and λ_i varying within 0-1000 jobs/s. From the plot, we observe that whenever the actual job request arrival rates of the EC nodes change, the NE utility values of the EC nodes based on the predicted job request arrival rates are slightly erroneous. However, within a few time-slots, each EC node can accurately predict the actual job request arrival rate. Thereafter, the NE utilities of the EC nodes with predicted job request arrival rates match the NE utilities with actual job request arrival rates.

V. CONCLUSIONS AND FUTURE RESEARCH
In this paper, we have proposed a novel economic and game-theoretic model for load balancing among non-cooperative federated EC nodes. We have implemented this model as a centralized framework with AI-enhanced traffic prediction methods for uRLL applications. We strongly believe that this work will encourage more future research in several related directions, e.g., truthful incentive-compatible mechanism design, implementation of decentralized load balancing frameworks, and computation of load balancing strategies using reinforcement learning.