A Decentralized Economic Load-Balancing Framework among Federated Cloudlets


Abstract-Edge computing servers like cloudlets from different service providers, distributed across access networks, compensate for the scarce computational and storage resources of mobile devices. However, the dynamically varying computational requirements of associated mobile devices make cloudlets either overloaded or under-loaded. Hence, load balancing among neighboring cloudlets emerges as an essential research problem. In particular, the load balancing problem among federated cloudlets from the same as well as different service providers for low-latency applications needs significant attention. Thus, in this paper, we propose a decentralized load balancing framework among federated cloudlets for low-latency applications that focuses on a latency bound rather than latency minimization. In this framework, we employ dynamic processor slicing for handling heterogeneous classes of job requests. We propose a continuous-action reinforcement learning automata-based algorithm that enables cloudlets to independently compute load balancing strategies in a completely distributed network setting without any exhaustive control message exchange. To capture the economic interaction among federated cloudlets, we model this load balancing problem as an economic and non-cooperative game and, by exploiting the properties of the game formulation, we achieve faster convergence of the reinforcement learning automata. Furthermore, through extensive simulations, we study the impacts of exploration and exploitation on learning accuracy.
Index Terms-Cloudlets, load balancing, long short-term memory, non-cooperative game theory, reinforcement learning.

I. INTRODUCTION
THE next-generation Internet is expected not only to route data, but also to store and process data generated by a large number of pervasive mobile devices like smartphones and Internet-of-Things (IoT) devices. With the recent emergence of ultra-reliable and low-latency communication (uRLLC) applications such as virtual/augmented reality, automotive, and tele-operation as part of the tactile internet [1], [2], edge computing solutions like multi-access edge computing, fog computing, and cloudlet computing [3] have been proposed to reduce both communication and computation latencies. Edge computing servers like cloudlets are fundamentally a computer or a cluster of computers installed in the proximity of mobile device users and distributed across access networks [4]. As cloudlet computing systems are essentially distributed computing systems, various efforts have been made to design suitable frameworks for allocating job requests from mobile devices to cloudlets while meeting computation and communication constraints [5]-[11]. Although job allocation frameworks allocate job requests to the most favourable cloudlets, due to the dynamic nature of the job request arrival process, cloudlets in different parts of a large network become overloaded and/or under-loaded at different times. Thus, a plethora of research initiatives are present in the literature to achieve efficient load balancing frameworks among neighbouring cloudlets [12]-[17].
Primarily, centralized and decentralized control mechanisms are used in the existing literature to address load balancing problems [18]. In centralized optimization models, a common objective function and a series of constraints are formulated to determine the optimal load balancing strategies for all cloudlets. The authors of [12] formulated a centralized latency-minimization problem and proposed a network-flow-based heuristic algorithm for solving it. On the contrary, in decentralized control models, all distributed nodes exchange their local control information among themselves and determine the load balancing strategies without any central controller node. The authors of [19] made a feasibility study of two decentralized load balancing algorithms. Although such models are robust for large networks, they cause an exhaustive control message exchange and communication burden on the network. To avoid this, reinforcement learning algorithms appear to be a valuable approach, but they can present different complexity and convergence issues in real time [20]. In [21], the authors proposed an efficient reinforcement learning algorithm to find the optimal load balancing decision for fog nodes with unknown reward and transition functions. The authors of [22] proposed a deep recurrent Q-network approach to approximate the optimal joint task offloading and resource allocation for heterogeneous job requests in multi-fog node systems.
Recently, there has been growing interest in applying cooperative and non-cooperative game-theoretical models to various network-related issues, as game theory offers many effective tools for evaluating and studying the relationship between distributed agents in conflict and cooperation [23]. The authors of [14] proposed a distributed non-cooperative load balancing game among the neighboring cloudlets in small cell networks and compared its findings with a centralized load balancing scheme that leverages the Lyapunov-drift technique to maximize the long-term system performance. In this formulation, each cloudlet tries to minimize its end-to-end latency costs under specific energy and latency constraints. This model, therefore, works very well if the network is loaded moderately, but it performs very poorly under very high load conditions because some of the cloudlets start to violate the latency constraints and the Nash equilibrium (NE) solution becomes infeasible. By identifying the estimated latency as the dis-utility function of every cloudlet, the authors of [16] devised a non-cooperative load balancing game where each cloudlet tries to minimize its dis-utility, and proposed an iterative proximal algorithm to compute the NE solution. In this framework, none of the cloudlets is allowed to offload until its incoming job requests reach a certain threshold. Nonetheless, this algorithm tends to assign a large number of job requests to the under-loaded cloudlets and hence, the end-to-end latency overshoots under very high load conditions.
In general, we notice that most of these load balancing frameworks assume that all the cloudlets are from a single service provider (SP). However, in a practical deployment scenario, multiple cloud SPs usually install cloudlets over the same customer base and, to the best of our knowledge, there is no existing cloudlet federation framework for controlling and federating cloudlet resources across multiple SPs [3]. Therefore, in this paper we focus on the load balancing problem among federated cloudlets, defined as a set of neighboring cloudlets from the same as well as different SPs. The network optimization based frameworks proposed in [12]-[14], [16] are inapplicable in such scenarios because cloudlets from different SPs are self-interested agents that are most likely to deviate from a global optimal solution. Moreover, a cloudlet needs to pay incentives for offloading excess job requests to a neighboring cloudlet of a different SP, but no payment is required for offloading to a cloudlet of the same SP. Thus, to capture this multi-party interaction among federated cloudlets, we formulate the load balancing problem as an economic and non-cooperative game that acts as an optimization problem among cloudlets from the same SP and as a non-cooperative game among cloudlets from different SPs.
We critically observe that most of the existing works in the literature focused on minimizing the overall latency while addressing load balancing problems among neighboring cloudlets. However, we argue that keeping the overall latency bounded within the requested quality-of-service (QoS) latency target is more important for economic interaction among federated cloudlets. For example, with a QoS latency target of 10 ms, users do not differentiate among job processing times of 4 ms, 8 ms, or 10 ms. Thus, under-loaded cloudlets will always encourage their neighboring overloaded cloudlets to offload extra job requests to them, because this provides an opportunity to earn extra incentives. Nonetheless, failing to meet the QoS latency target should incur a significant economic penalty on the cloudlets. With this realization, we propose a novel economic utility function for the cloudlets which is maximum when the end-to-end latency is equal to the QoS latency target for each job class and decreases very rapidly if the overall latency exceeds the QoS latency target.
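As a concrete numerical illustration of this latency-bound (rather than latency-minimizing) utility shape, the sketch below uses a flat reward within the QoS target and a steep linear penalty beyond it; the revenue and penalty_rate values are hypothetical stand-ins, not parameters from this paper:

```python
def economic_utility(latency_ms, qos_target_ms, revenue=100.0, penalty_rate=25.0):
    """Illustrative utility: constant at the full revenue while the latency
    stays within the QoS target, then a steep linear penalty beyond it
    (penalty_rate is a hypothetical cost factor, analogous to the
    penalty proportionality factor in the text)."""
    excess = max(0.0, latency_ms - qos_target_ms)
    return revenue - penalty_rate * excess

# Users do not differentiate among 4 ms, 8 ms, or 10 ms under a 10 ms target...
assert economic_utility(4, 10) == economic_utility(8, 10) == economic_utility(10, 10)
# ...but exceeding the target is penalized sharply.
assert economic_utility(12, 10) < economic_utility(10, 10)
```

The flat region is what makes accepting extra offloaded jobs attractive to an under-loaded cloudlet: additional load costs nothing in utility until the latency bound is reached.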
As the existing load balancing frameworks make the load balancing decisions after the actual job requests arrive at the cloudlets, the overhead time of the load balancing algorithms makes these frameworks highly unfit, especially for low-latency applications. To deal with such scenarios, we make the cloudlets predict the job request arrival rates and make the load balancing decisions beforehand, so that immediate processing after actual job request arrival is possible. Furthermore, most of the works assume that job requests demand similar computational resources and do not address the processing of multi-class job requests. However, in practice, job requests from different users are heterogeneous and may demand a diverse set of computational resources. Therefore, in this work we consider a general scenario where incoming job requests are from heterogeneous classes and the cloudlets use a processor slicing technique to dynamically slice their computational resources over time for handling heterogeneous job classes. Both these aspects are aligned with beyond-5G and 6G network visions [24].
To compute the pure-strategy NE load balancing strategies of the federated cloudlets, we propose a decentralized framework in this paper. Although distributed frameworks are more robust than centralized frameworks, all the cloudlets need to exchange extensive control information among themselves [18], [25]. This issue can be resolved by using various artificial intelligence-based schemes to learn network conditions and make load balancing decisions. However, sometimes the job request arrival process may become fast-varying, with low correlation between trained data and real-time data. Hence, to find the NE load balancing strategies, we avoid static supervised learning methods that heavily rely on historical data. Instead, we propose a reinforcement learning automata-based algorithm that ensures quick convergence by using the properties of the underlying game. This empowers the cloudlets to make load balancing decisions independently, with a minimal interchange of control messages.
Note that cloudlets are clusters of computers located at some distance from each other. Therefore, along with the processing latency, the intermediate data transmission latency also plays a crucial role in load balancing problems. Most load balancing frameworks do not pay significant attention to this factor. Nonetheless, depending on the bandwidth availability of the inter-cloudlet links, the computation offloading strategies of the cloudlets may also get affected. Hence, we address this issue by considering a bandwidth constraint in our game formulation. Our primary contributions in this paper are as follows:
(i) We formulate the load balancing problem among federated cloudlets with heterogeneous classes of job requests as a novel economic and non-cooperative game-theoretic problem and prove the existence of an NE of this game formulation.
(ii) We consider practical bandwidth constraints for inter-cloudlet links in our game formulation and design a complete time-slotted model for practical implementation of the proposed load balancing framework.
(iii) We propose a distributed continuous-action reinforcement learning automata-based algorithm such that neighboring cloudlets can independently compute the NE load balancing strategy, and quick convergence is ensured by scaffolding the learning algorithm with the particular characteristics of the underlying load balancing game.
(iv) We study various characteristics of the NE learning algorithm against a realistic setting by using the cluster-usage traces released by Google in 2011. This trace has non-stationary and self-similar characteristics, but we can still achieve a very high degree of accuracy in learning the NE strategies.
(v) Finally, we show that any participating cloudlet can achieve better economic utilities by following our proposed NE load balancing strategies than with recently proposed game-theoretic load balancing frameworks, particularly under highly overloaded conditions.
The rest of this paper is organized as follows. In Section II, the details of the system model are presented. In Section III, a non-cooperative game-theoretic problem among federated cloudlets for computation offloading is formulated. In Section IV, a distributed continuous-action reinforcement learning automata-based algorithm is proposed. In Section V, the proposed load balancing framework is evaluated. Finally, in Section VI, our primary achievements by using the game-theoretic framework are summarized.

II. SYSTEM MODEL
In this section, we discuss the considered system model and the primary assumptions made. We consider a general heterogeneous deployment scenario for federated cloudlets over access networks. The total number of federated cloudlets in the network is N ≥ 2 and C = {1, 2, . . . , N} denotes the set of all federated cloudlets. These cloudlets receive heterogeneous job requests from their respective connected devices. If any cloudlet becomes overloaded, it can offload the extra load to its neighboring under-loaded cloudlets, as shown in Fig. 1.
(a) Job request arrival process: We consider multiple classes of job requests with heterogeneous QoS requirements and denote the average job request arrival rate of class-j ∈ J = {1, 2, . . . , J} to the ith cloudlet by λ_ij. The total job request arrival rate at the ith cloudlet is given by λ_i = Σ_{j=1}^{J} λ_ij. Each cloudlet relies on the access network SPs to successfully deliver the job requests from the associated mobile devices and pays for the bandwidth consumed. However, if the network fails to deliver some of the job requests to the cloudlets due to bandwidth constraints, then the network SPs pay a penalty in proportion to the undelivered job requests. As the average job request arrival rate varies from time instance to time instance, we assume that each λ_ij is independently distributed over the support Λ_ij = [0, λ_ij,max], ∀ i ∈ C. Therefore, the computation job request profile or true type of all the federated cloudlets is represented as λ = (λ_1, λ_2, . . . , λ_N) ∈ Λ = (Λ_1 × Λ_2 × . . . × Λ_N). In practice, the job request arrival process to cloudlets is self-similar and non-stationary [26]. Therefore, the cloudlets predict the incoming job request arrival rates by employing long short-term memory (LSTM) networks [27]. The transmission latency of the incoming job requests and the intermediate transmission latencies with the neighboring cloudlets are also estimated by each cloudlet.
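A full trained LSTM predictor is outside the scope of this text, but the recurrence a cloudlet would roll over its window of past arrival rates can be sketched with a single-unit LSTM cell in plain Python. The weights in W below are illustrative placeholders (not a trained model), so the numeric prediction is meaningless; the sketch only shows the gate mechanics:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x, h, c, W):
    """One LSTM cell step (single feature, single hidden unit). W maps
    each gate name to an illustrative (w_x, w_h, bias) triple."""
    i = sigmoid(W['i'][0] * x + W['i'][1] * h + W['i'][2])    # input gate
    f = sigmoid(W['f'][0] * x + W['f'][1] * h + W['f'][2])    # forget gate
    o = sigmoid(W['o'][0] * x + W['o'][1] * h + W['o'][2])    # output gate
    g = math.tanh(W['g'][0] * x + W['g'][1] * h + W['g'][2])  # candidate state
    c = f * c + i * g
    h = o * math.tanh(c)
    return h, c

def predict_arrival(history, W, scale):
    """Roll the (untrained, illustrative) cell over a normalized window of
    past arrival rates and read a prediction off the final hidden state."""
    h = c = 0.0
    for lam in history:
        h, c = lstm_cell(lam / scale, h, c, W)
    return h * scale
```

In practice the weights would be learned from the historical traces; any deep-learning framework's LSTM layer replaces this hand-rolled cell.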
(b) Job request service process: We assume that processors in the cloudlets have similar job processing capabilities, i.e., the same CPU cycle rate f_j (cycles/s) for each job class-j. We also consider that the job requests from each class-j require some CPU cycles with an average value κ_j. Hence, the average service rate (jobs/s) of incoming job requests of class-j is defined as μ_j = f_j/κ_j. Therefore, (λ_ij, μ_j) indicates a parametric description of the job requests arriving at the ith cloudlet. We consider that each ith cloudlet has in total n_i ∈ Z≥1 processors and dedicates n_ij ∈ R≥1 processors to each job class-j. By using Google cluster-usage traces, the authors of [28] showed that the exponential distribution fits perfectly with the service times of the job requests. In practice, some of the job requests can be completely parallelized and some cannot be parallelized at all. Hence, to guarantee the worst-case performance of the cloudlets, we model the cloudlets as M/M/c queueing systems [29].
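The standard M/M/c quantities used throughout can be computed directly. The following sketch, restricted to integer server counts for simplicity, implements the textbook Erlang-C probability of queueing and the resulting mean response time:

```python
import math

def erlang_c(c, a):
    """Erlang-C formula: probability that an arriving job must queue in an
    M/M/c system with offered load a = lam/mu (requires a < c)."""
    top = a ** c / (math.factorial(c) * (1 - a / c))
    bottom = sum(a ** k / math.factorial(k) for k in range(c)) + top
    return top / bottom

def mmc_mean_response(lam, mu, c):
    """Mean response time (mean queueing wait + mean service time)."""
    a = lam / mu
    assert a < c, "queue must be stable"
    wq = erlang_c(c, a) / (c * mu - lam)
    return wq + 1.0 / mu
```

For c = 1 this collapses to the familiar M/M/1 result 1/(mu - lam); the real-valued processor slices in the text would replace the factorials with Gamma/incomplete-Gamma evaluations.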
(c) QoS latency requirements of job requests: The individual job requests from mobile devices demand a certain number of CPU cycles to process the jobs within a predefined QoS latency target [7]. However, in this paper we consider a batch of incoming job requests rather than individual job requests to cloudlets. Thus, we denote the computational and latency requirements of the job requests of class-j to the ith cloudlet by the consolidated tuple (λ_ij, κ_j, τ_j). To handle the processing of different job classes, we use a processor slicing technique [22] to slice the total n_i processors into slices of n_ij processors for each job class-j. Moreover, each cloudlet uses a time-slotted model to ensure the QoS latency target τ_j for the job requests of each job class-j, as shown in Fig. 2. The duration of each fundamental timeslot is chosen to be equal to the smallest QoS latency target τ_0 = min_j τ_j and, without loss of generality, we assume that the τ_j values are integer multiples of τ_0 (i.e., all m_j = τ_j/τ_0 are integer values). Depending on the stationarity of the incoming job request traffic, we choose a bigger time interval T which is an integer multiple of all of τ_1, τ_2, τ_3, . . . , τ_J. Each cloudlet uses a prediction algorithm a few timeslots before the beginning of interval T to predict the job request arrival rates for all the job classes. Based on these predicted job request arrival rates, the processor slicing and the NE load balancing strategies for each cloudlet are computed. These values remain unchanged over each interval T. As the job requests of class-j arrive within each timeslot, they are marked with an integer (m_j − 1). If some jobs could not be processed within that timeslot, they are rolled over to the next timeslot with the marking decreased by 1. This can continue until those jobs are processed, and the jobs can be deleted after the marking becomes 0. Therefore, the job request arrival queues of the cloudlets can maintain a steady state unless they are extremely overloaded.
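The marking-and-rollover scheme can be sketched as a tiny discrete simulation. The arrival, capacity, and deadline figures below are purely illustrative:

```python
def run_timeslots(arrivals_per_slot, capacity_per_slot, deadline_slots):
    """Sketch of the timeslot marking scheme: each arriving batch is marked
    with (deadline_slots - 1); unprocessed jobs roll over with the marking
    decreased by 1 and are dropped once the marking reaches 0.
    Returns (processed, dropped) totals."""
    queue = []  # list of [marking, remaining_jobs], oldest batches first
    processed = dropped = 0
    for arriving in arrivals_per_slot:
        queue.append([deadline_slots - 1, arriving])
        budget = capacity_per_slot
        for entry in queue:                 # serve oldest batches first
            take = min(entry[1], budget)
            entry[1] -= take
            budget -= take
            processed += take
        nxt = []
        for marking, jobs in queue:
            if jobs == 0:
                continue
            if marking == 0:
                dropped += jobs             # deadline expired
            else:
                nxt.append([marking - 1, jobs])
        queue = nxt
    return processed, dropped
```

With capacity 5 jobs/slot and a two-slot deadline, a moderate load of 4 jobs/slot is fully served, while bursts of 8 jobs/slot eventually force a drop, mirroring the "steady state unless extremely overloaded" observation.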
As the M/M/c queue provides the worst-case processing latency of the cloudlets, we also ensure that all the incoming job requests are processed when the average latency of each cloudlet is ≤ τ_j. Please refer to Appendix-A for more discussions on timeslots and the modelling of cloudlets as M/M/c queues.
(d) User mobility model: We assume that the mobile users cannot move beyond the coverage area of a cloudlet within a few milliseconds, and thus consider a quasi-static mobility model for mobile users. This means that mobile users can be considered almost stationary with respect to the corresponding cloudlets during the computation offloading period, but may move on later [7]. Each cloudlet prioritizes the processing of the incoming job requests internally or offloads them to a neighboring cloudlet to satisfy the QoS latency target through some internal scheduling algorithm (beyond the scope of this paper).

III. ECONOMIC AND NON-COOPERATIVE LOAD BALANCING GAME AMONG CLOUDLETS
In this section, we formulate the load balancing problem among N ≥ 2 neighboring federated cloudlets as a continuous-kernel non-cooperative game. In a practical deployment scenario, overloaded cloudlets intend to offload a fraction of their job requests to their under-loaded neighboring cloudlets. We denote the fraction of class-j job requests the ith cloudlet offloads to its kth neighboring cloudlet by α_ikj. The complete job request offloading strategy space of all cloudlets for each job class-j is defined accordingly over [0, 1].
In a stable market scenario, all the SPs tend to install cloudlets with similar processing capacity (i.e., n_i = n_k, ∀ i, k ∈ C) to meet a standard QoS for the same customer base. Hence, with the effective arrival rate λ̄_ij = (1 − Σ_{k≠i} α_ikj) λ_ij + Σ_{k≠i} α_kij λ_kj, the total processing latency of the class-j job requests at the ith cloudlet is derived as follows:
T_ij(λ̄_ij, n_ij) = E_C(n_ij, λ̄_ij/μ_j)/(n_ij μ_j − λ̄_ij) + 1/μ_j, (1)
where E_C(c, a) is the Erlang-C formula, given by:
E_C(c, a) = [a^c/(c! (1 − a/c))] / [Σ_{m=0}^{c−1} a^m/m! + a^c/(c! (1 − a/c))]. (2)
Note that we use the incomplete Gamma function to evaluate (2) for real-valued n_ij, as it is often used for approximate and exact representation of many mathematical series [30]. Each ith cloudlet makes an optimal processor slicing by observing its load conditions and by solving the following optimization problem:
P: minimize_{n_i1, ..., n_iJ} Σ_{j=1}^{J} λ_ij T_ij(λ_ij, n_ij), subject to Σ_{j=1}^{J} n_ij ≤ n_i, n_ij ≥ 1, ∀ j ∈ J.
We consider that soft processor slicing is available, so each n_ij can take any real value ≥ 1. Hence, P is a continuous convex optimization problem and can be solved by a gradient projection algorithm (refer to Appendix-B).
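The text solves the processor slicing problem with gradient projection over real-valued slices; as a simplified stand-in, the greedy integer sketch below starts from the minimum stable allocation and repeatedly grants the next processor to the class with the largest marginal reduction in aggregate latency (all rates are illustrative):

```python
import math

def erlang_c(c, a):
    """Erlang-C probability of queueing for an M/M/c queue (a = lam/mu < c)."""
    top = a ** c / (math.factorial(c) * (1 - a / c))
    return top / (sum(a ** k / math.factorial(k) for k in range(c)) + top)

def mean_latency(lam, mu, c):
    """Mean response time of an M/M/c queue."""
    return erlang_c(c, lam / mu) / (c * mu - lam) + 1.0 / mu

def slice_processors(lams, mus, total):
    """Greedy integer sketch of problem P: begin at the smallest stable
    per-class allocation, then hand each spare processor to the class whose
    aggregate latency (arrival rate x mean latency) drops the most."""
    n = [math.floor(l / m) + 1 for l, m in zip(lams, mus)]
    assert sum(n) <= total, "not enough processors for stability"
    for _ in range(total - sum(n)):
        gains = [l * (mean_latency(l, m, c) - mean_latency(l, m, c + 1))
                 for l, m, c in zip(lams, mus, n)]
        n[gains.index(max(gains))] += 1
    return n
```

The heavier class naturally receives the larger slice; the continuous solver in the text refines this by allowing fractional n_ij.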

A. Economic and Non-cooperative Game Formulation
In this paper, we consider the most commonly used pricing scheme, i.e., the pay-as-you-go policy, where users pay a fixed price per job request without any long-term commitment [31]. For the total amount of incoming class-j job requests from all the connected mobile devices, each ith cloudlet earns a linearly proportional revenue (Ω_j,1) per workload. Each ith cloudlet also pays a linearly proportional price per workload (Ω_j,2) for offloading job requests to a neighboring kth cloudlet from a different SP and, likewise, receives a linearly proportional price for executing its neighbors' offloaded jobs. The cloudlets can also use cooperative or bargaining strategies among themselves to decide the value of Ω_j,2. We define a parameter δ_ik to distinguish the price for offloading a job request to neighboring cloudlets as follows: δ_ik = 0 if the neighboring kth cloudlet belongs to the same SP as the ith cloudlet, and δ_ik = 1 otherwise. This means that each ith cloudlet pays a price to the kth cloudlet to offload any job requests only when it belongs to another SP, i.e., δ_ik = 1. In addition, each ith cloudlet pays a penalty price with a proportionality cost factor (Ω_j,3) for exceeding the QoS latency target τ_j. Note that if an overloaded cloudlet offloads some job requests to a neighboring cloudlet and the neighbor fails to process them for some reason, then the penalty is actually paid by the neighboring cloudlet. In this work, we consider a linear penalty price similar to the linear latency cost designed in [14]. Therefore, all the federated cloudlets with utility functions U_i(α_i, α_−i), ∀ i ∈ C, where α_−i = (α_1, . . . , α_{i−1}, α_{i+1}, . . . , α_N), in the load balancing game intend to solve the following maximization problem:
maximize_{α_i} U_i(α_i, α_−i) = Σ_{j=1}^{J} [ Ω_j,1 λ_ij + Σ_{k≠i} δ_ki Ω_j,2 α_kij λ_kj − Σ_{k≠i} δ_ik Ω_j,2 α_ikj λ_ij − Ω_j,3 λ_ij max{0, D_ij − τ_j} ], (3)
where D_ij denotes the overall latency of the class-j job requests handled by the ith cloudlet. The first term in (3) denotes the total payment received by the cloudlet from mobile users and is linearly proportional to the average workload.
The second term denotes the payment the ith cloudlet receives from the kth cloudlet for executing its offloaded job requests, and the third term denotes the payment the ith cloudlet makes to the kth cloudlet for offloading job requests. The fourth term denotes the penalty the ith cloudlet pays if the overall latency (sum of transmission, processing, and queuing latencies) exceeds τ_j against the total incoming class-j job requests; otherwise, no penalty is applied. We denote the average round-trip data transmission latency between mobile devices and the corresponding ith cloudlet by d_i and the inter-cloudlet round-trip data transmission latency by d_ik, ∀ i, k ≠ i ∈ C. We also consider that overloaded cloudlets may face a network bandwidth constraint while offloading job requests, expressed as
Σ_{j=1}^{J} α_ikj λ_ij b_j ≤ B_ik, (4)
where b_j denotes the average number of bits per job request of class-j and B_ik denotes the bandwidth available in the link between the ith and kth cloudlets. Each cloudlet needs to pay a price to the network SPs for the bandwidth consumed by the job requests offloaded to neighboring cloudlets, but this price is paid separately and over a longer period of time. We assume that the cloudlets operate under the condition of stable operation, i.e., {(1 − Σ_{k≠i} α_ikj) λ_ij + Σ_{k≠i} α_kij λ_kj}/(n_ij μ_j) < 1, ∀ i, k ≠ i ∈ C, ∀ j ∈ J. The utility of each cloudlet in this load balancing game is an affine function when the total latency is within τ_j; otherwise, it becomes a non-linear function whose maximum value is achieved when the total end-to-end latency equals τ_j. Hence, the cloudlets are always interested in gaining some incentives by receiving extra job requests from neighboring cloudlets without exceeding τ_j, but the utility starts to decrease beyond this point. Moreover, the individual rationality of each federated cloudlet is maintained due to the default utility, U_i^0, ∀ i ∈ C.
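The four terms of the utility can be sketched per job class as follows. Here lam is the incoming rate, recv and sent are the offloaded workloads received and sent, delta plays the role of the same-SP indicator, and all Omega prices are hypothetical stand-ins:

```python
def cloudlet_utility(lam, recv, sent, latency, qos,
                     omega1=1.0, omega2=0.6, omega3=5.0, delta=1.0):
    """Illustrative per-class utility with the four terms described in the
    text: revenue from mobile users, incentive received for executing a
    neighbor's offloaded workload, incentive paid for offloading
    (delta = 0 within the same SP), and a linear penalty applied only when
    the overall latency exceeds the QoS target. Omega values are
    hypothetical, not taken from this paper."""
    revenue = omega1 * lam
    earned = delta * omega2 * recv
    paid = delta * omega2 * sent
    penalty = omega3 * lam * max(0.0, latency - qos)
    return revenue + earned - paid - penalty
```

Note the two properties argued in the text: the utility is unchanged anywhere below the latency target, and transfer payments vanish between cloudlets of the same SP (delta = 0).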
Furthermore, due to the utility function (3) and constraints (4), which do not provide an explicit latency bound on the participating cloudlets, even highly overloaded cloudlets can participate in the game and can offload some of their job requests to the relatively under-loaded neighboring cloudlets. This leads to a utility higher than the utility achieved without participating in the game. Note that under such network load conditions, the game formulation in [14], which has an explicit delay bound on participating cloudlets, becomes infeasible and a valid NE solution cannot be computed. We prefer to investigate the NE of the game Γ because none of the federated cloudlets finds it beneficial to deviate unilaterally from the NE computation offloading strategy α* = (α*_1, α*_2, . . . , α*_N).
TABLE I: List of primary notations
N — Total number of federated cloudlets in the network
J — Total number of different job classes generated from mobile devices
μ_j — Average service rate of job class-j
λ_ij — Average job request arrival rate of job class-j to the ith cloudlet
λ̂_ij — Revealed average job request arrival rate of job class-j to the ith cloudlet
τ_j — QoS latency target of job class-j
n_i — Total number of processors installed in the ith cloudlet
n_ij — Number of processors for job class-j at the ith cloudlet
α_ikj — The fraction of class-j jobs the ith cloudlet offloads to the kth cloudlet
Ω_j,2 — Incentive paid for offloading class-j jobs by the ith to the kth cloudlet
Ω_j,3 — Penalty paid to the market regulator for not processing received jobs
d_i — Data transmission latency between mobile devices and the ith cloudlet
d_ik — Data transmission latency between the ith and kth cloudlets

B. Analysis of the Pure-Strategy Nash Equilibrium
We observe that the utility functions U_i(α_i, α_−i), ∀ i ∈ C, are non-differentiable in nature due to the presence of the max{0, ·} function. Hence, we cannot derive the best response functions of the cloudlets by directly differentiating the utility functions [32]. At first, we identify whether the cloudlets are under-loaded or overloaded and organize their utilities accordingly. As different job classes are processed independently of each other through processor slicing, each cloudlet can simultaneously be under-loaded for one job class while overloaded for another. After this, we proceed to analyze the pure-strategy NE load balancing strategies. Intuitively, three cases can arise in practice and the pure-strategy NE solutions are described as follows:
Case-1: All the cloudlets are under-loaded, i.e., they have sufficient computational resources to meet the QoS latency target τ_j. Hence, the unique NE solution is α*_ikj = α*_kij = 0. This implies that the cloudlets achieve their maximum utilities as the total revenue earned without offloading any job requests to each other.
Case-2: All the cloudlets are overloaded and they cannot reduce their individual latencies by offloading any job requests to each other. Thus, it is obvious that the unique NE strategy for the cloudlets is not to offload any job requests to each other, i.e., α*_ikj = α*_kij = 0.
Case-3: d_i + T_ij(λ_ij, n_ij) ≥ τ_j and d_k + T_kj(λ_kj, n_kj) < τ_j, ∀ i ∈ C_o, k ∈ C_u, j ∈ J, where C_o ⊂ C is the set of overloaded cloudlets and C_u = C \ C_o is the set of under-loaded cloudlets. We can mark the under-loaded and overloaded cloudlets by observing these conditions, such that |C_o| + |C_u| = N. Thus, the kth under-loaded cloudlets do not need to offload anything, i.e., α*_kij = 0, but the ith overloaded cloudlets need to offload their excess job requests to the kth cloudlets to meet the QoS latency target τ_j, as long as the under-loaded cloudlets do not exceed τ_j. Hence, the NE solution satisfies α*_ikj > 0 and α*_kij = 0. As we cannot solve this game analytically, we verify this solution by a suitable algorithmic solution. As long as the under-loaded kth cloudlets can process the entire extra load from the ith cloudlets, they accept the entire workload. However, when the under-loaded cloudlets cannot process the entire extra load, they allow the ith cloudlets to offload job requests only partially, such that they do not exceed their QoS latency target τ_j. In this case, we realize that the NE load balancing strategies are not entirely controlled by the overloaded cloudlets but also by the under-loaded cloudlets. Hence, we introduce a new set of decision variables for each kth under-loaded cloudlet denoting the fractions of job requests accepted from the overloaded cloudlets, i.e., β_kj = (β_1kj, β_2kj, . . . , β_Nkj), with the jointly shared equality constraints β_ikj = α_ikj, ∀ i, k ≠ i ∈ C, j ∈ J. This new set of constraints plays an essential role in the evaluation of NE load balancing strategies among federated cloudlets.
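The acceptance logic of Case-3 can be sketched with a per-cloudlet capacity proxy (the rate a cloudlet can serve within its QoS target) standing in for the full queueing model; the proxy and the greedy pairing order are simplifications, not the algorithm of this paper:

```python
def case3_offloads(loads, caps):
    """Toy sketch of Case-3: cloudlets whose load exceeds a capacity proxy
    caps[i] offload their excess to under-loaded cloudlets, which accept
    extra jobs only up to their own spare capacity -- mirroring the shared
    beta = alpha constraints. Returns a matrix x[i][k] of offloaded rates."""
    n = len(loads)
    excess = [max(0.0, l - c) for l, c in zip(loads, caps)]
    spare = [max(0.0, c - l) for l, c in zip(loads, caps)]
    x = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            if i == k or excess[i] == 0:
                continue
            moved = min(excess[i], spare[k])  # receiver caps the acceptance
            x[i][k] = moved
            excess[i] -= moved
            spare[k] -= moved
    return x
```

When the total excess exceeds the total spare capacity, the overloaded cloudlet can only offload partially, exactly the situation that motivates the receiver-side decision variables above.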

IV. DECENTRALIZED LOAD BALANCING FRAMEWORK
In a distributed cloudlet network, the incoming class-j job request arrival rate to each cloudlet i ∈ C is known only to that cloudlet and to no other entity in the network. We assume that the federated cloudlets are non-cooperative and rational utility maximizers and that they do not share this private information with each other. Therefore, a reinforcement learning automata-based algorithm helps the cloudlets to independently make load balancing decisions only from their private information, and to aid this decision-making process, we use the economic and non-cooperative game formulated in Section III. Moreover, some particular characteristics of the underlying game formulation are used to reduce the search space of the reinforcement learning algorithm and greatly improve its convergence rate. The fundamental stages of the overall control design are summarized below:
(a) A load-predictive learning algorithm is executed by each cloudlet a few timeslots before the end of every time interval T, using historical data to predict the incoming job request arrival rates of the next time interval T. Each cloudlet uses its predicted job request arrival rates to perform processor slicing for that interval.
(b) Each cloudlet also estimates the transmission latency of the incoming job requests by using the given stochastic parameters of the access network between mobile devices and cloudlets, as well as the intermediate transmission latencies with its neighboring cloudlets.
(c) Using this learned information, each cloudlet offloads a random amount of its job requests to the neighboring cloudlets according to the latest probability distribution over its strategy space and observes the utility and rewards received. Based on the reward values received in the tth and (t−1)th time-slots, each cloudlet updates the probability distribution over its strategy space for the (t+1)th time-slot.

A. Distributed Reinforcement Learning Algorithm
In this subsection, we design a continuous-action reinforcement learning automata-based algorithm for learning the NE of the continuous-kernel non-cooperative load balancing game formulated in Section III. At first, we define the mixed strategy of each ith cloudlet as a continuous probability density function (PDF) f_i(α_i) over its pure-strategy space Φ_i [33]. Therefore, the probability of randomly choosing an action within a close neighborhood of α_i by the ith cloudlet can be determined from the corresponding PDF f_i(α_i). The complete mixed strategy of all the cloudlets is defined as F := f_1 × . . . × f_N over the complete pure-strategy space Φ. When each ith cloudlet chooses an action α_i, i.e., offloads job requests to its neighboring cloudlets, the environment responds with a random reward R_i(α_i, α_−i) ∈ [0, 1], defined as a normalized version of the utility in (3). Note that constraint (4) is also considered while evaluating the reward. In the load balancing game, with a continuous-action reinforcement learning automata-based algorithm, each ith cloudlet starts with a uniform probability distribution as its mixed strategy over its individual pure-strategy action space and keeps updating the PDF based on the received rewards in the following time-slots to ultimately find the pure-strategy NE [34]. After exploring an action α_i^(t) ∈ Φ_i during the tth time-slot, the PDFs are updated for the (t+1)th time-slot by the following update rule:
f_i^(t+1)(α_i) = γ^(t) [ f_i^(t)(α_i) + Θ (R_i^(t) − R_i^(t−1)) exp(−(α_i − α_i^(t))²/(2σ²)) ], (6)
where Θ is the learning rate parameter, σ is the spreading rate parameter, and γ^(t) is a normalization factor so that ∫_{−∞}^{+∞} f_i^(t+1)(α_i) dα_i = 1, for any t. Note that our proposed reinforcement learning automaton (6) operates as a gradient bandit algorithm, based on the idea of stochastic gradient ascent algorithms [35]. Moreover, the term (R_i^(t) − R_i^(t−1)) used in this model makes the algorithm highly robust in tracking a non-stationary job request arrival process.
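A discretized version of this style of update rule can be sketched as follows, assuming a Gaussian spreading kernel and a uniform initial mixed strategy over offload fractions in [0, 1]; theta and sigma values are illustrative:

```python
import math

def carla_update(pdf, grid, action_idx, reward, prev_reward,
                 theta=0.3, sigma=0.05):
    """Discretized sketch of the update rule: the neighborhood of the
    explored action is reinforced through a Gaussian spreading kernel
    scaled by the reward difference, clipped at zero, and the PDF is then
    renormalized so it still integrates to 1 (the gamma factor)."""
    a_t = grid[action_idx]
    step = grid[1] - grid[0]
    new = [max(f + theta * (reward - prev_reward)
               * math.exp(-(a - a_t) ** 2 / (2 * sigma ** 2)), 0.0)
           for a, f in zip(grid, pdf)]
    z = sum(new) * step                 # normalization factor
    return [f / z for f in new]

# start from a uniform mixed strategy over offload fractions in [0, 1]
grid = [i / 100 for i in range(101)]
uniform = [1.0] * 101
updated = carla_update(uniform, grid, 40, reward=0.9, prev_reward=0.2)
```

An improving reward difference raises the density around the explored action, while a deteriorating one suppresses it, so the PDF's peak drifts toward the best-responding strategy over successive time-slots.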
The PDFs are continuously updated by the cloudlets, based on their private information and the rewards received at every time-slot, to learn the pure-strategy NE of the non-cooperative load balancing game. The space complexity of this algorithm is O((|C| × |J|)² × L), where |C| is the number of cloudlets, |J| is the number of job classes, and L is the length of memory required for storing the discrete version of f_i(φ_i) [36].
Theorem 4.1. The continuous-action reinforcement learning automata-based algorithm with update rule (6) converges to the pure-strategy Nash equilibrium of the non-cooperative load balancing game.
Please refer to Appendix-D for the proof. Although the convergence of the proposed algorithm is guaranteed, we can speed up its convergence rate several times by scaffolding our understanding of the underlying load balancing game. We observe that whenever some cloudlet is under-loaded, i.e., its total processing and transmission latency is below the QoS target latency τ, its NE strategy is not to offload any job requests to its neighboring cloudlets. Thus, all cloudlets can update their PDFs accordingly, without exploring many job request offloading strategies, as long as the under-load condition persists. In addition, during an overload condition, i.e., when the total latency reaches or exceeds τ, each cloudlet offloads only a portion of the received job requests and tries to shift the peak of the PDF f_i(φ_i) around the pure-strategy NE solution φ_i*, such that it can serve the rest of the job requests by itself. Hence, the corresponding search spaces can be reduced accordingly. As each cloudlet is unaware of the load condition of its neighboring cloudlets, some of the offloaded jobs may get dropped, because an under-loaded cloudlet processes offloaded jobs only as long as its QoS target latency is satisfied. Therefore, when the offloaded jobs are partially processed, each overloaded i-th cloudlet needs to compute the fraction of processed job requests, denoted by φ̂_ij: the i-th cloudlet offloads φ_ij job requests, but the j-th cloudlet processes only φ̂_ij jobs, where 0 ≤ φ̂_ij < φ_ij. Note that cloudlets do not receive incentives for dropping offloaded job requests and hence, rational cloudlets do not drop job requests intentionally. In such cases, the i-th cloudlet updates its PDF by using φ̂_ij in (6) instead of φ_ij. Furthermore, when the j-th cloudlet is under-loaded but receives job requests from multiple cloudlets and can process them only partially, i.e., its own latency is below τ but its latency with all the received job requests would exceed τ, the residual job processing capacity of the j-th cloudlet is distributed according to the ratio of the offered amounts φ_ij. The proposed algorithm is summarized in Algorithm 1.
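The proportional distribution of an under-loaded cloudlet's residual capacity among its offering neighbors can be sketched as follows; `split_residual` is an illustrative helper, not the paper's notation:

```python
def split_residual(offers, capacity):
    """Distribute cloudlet j's residual processing capacity among the
    offloaded job batches in proportion to the offered amounts, as in
    the partial-processing rule described above (illustrative sketch).

    offers:   mapping neighbor -> number of offloaded jobs
    capacity: residual jobs cloudlet j can still process within the
              QoS target latency
    """
    total = sum(offers.values())
    if total <= capacity:                 # everything fits: process all offers
        return dict(offers)
    scale = capacity / total              # proportional admission ratio
    return {i: amount * scale for i, amount in offers.items()}
```

Each overloaded neighbor i then feeds the admitted amount back into its PDF update in place of the amount it originally offered.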

V. RESULTS AND DISCUSSIONS
In this section, we investigate various fundamental properties of the proposed load balancing strategy through numerical evaluations. For this purpose, we consider a set of 10 neighboring federated cloudlets. At first, we consider a single job class to compare system performance with a few existing frameworks; the average processing rate of each cloudlet with multiple processors is c_iμ_i = 1000 jobs/s and the incoming job request rate to each cloudlet varies within 0-1500 jobs/s. We consider the duration of each timeslot as t = 5 msec, the average latency between mobile users and cloudlets as 2 msec, intermediate transmission latencies between neighboring cloudlets within 0.5-1 msec, job request sizes within 100-200 KB [37], and available bandwidth within 0.5-1 Gbps [1]. The optimal values of the proportionality price factors Ω_i,1, Ω_i,2, Ω_i,3, and Ω_i,4 can be determined by studying the market equilibrium conditions for providing cloud-based services [38]. In actual practice, the proper price factors are sometimes also determined by applying multiple-criteria decision-making theory [39]. However, in this work we arbitrarily choose Ω_i,1 = 5 × 10³, Ω_i,2 = 3 × 10⁴, and Ω_i,3 = 9 × 10⁴ cost/unit load, such that feasible solutions corresponding to the aforementioned network scenario can be obtained.
In Fig. 3(a), we compare the average end-to-end latency of all the participating cloudlets against the job request arrival rate with our currently proposed framework and the frameworks proposed in [14] (labeled as "ref. game-1") and [16] (labeled as "ref. game-2"), respectively. To make a fair comparison, we consider only a single class of job requests with QoS latency target τ_1 = 10 msec (as [14] and [16] deal with a single job class), the difference in job request arrival rates between the under- and overloaded cloudlets is within 0-200 jobs/s, and the service rates of all the cloudlets are c_iμ_i = 1000 jobs/s. The characteristics of our proposed game-theoretic load balancing framework are intuitively explained in Appendix-C. We see that all the frameworks yield almost similar average latency under low and medium load conditions and the overall latency is within the QoS target latency (observe the red dashed line). Hence, the users do not experience any degradation in QoS and latency minimization does not have any significant impact.
Nonetheless, under high load conditions, when all the cloudlets become sufficiently overloaded, ref. game-1 becomes infeasible as some of the cloudlets start to violate explicit latency constraints, and ref. game-2 tends to overload the under-loaded cloudlets by uncontrolled offloading of job requests. However, our framework still allows the overloaded cloudlets to strategically offload (partially offload) some job requests to relatively under-loaded cloudlets that can still accept some job requests while meeting the QoS latency target τ_1 = 10 msec. Therefore, the overall latency performance becomes comparatively better than ref. game-1 and ref. game-2. Note that neither ref. game-1 nor ref. game-2 considers a bandwidth constraint on the inter-cloudlet links, which is considered in our framework. Therefore, if the bandwidth is insufficient on the inter-cloudlet links, the performance of ref. game-1 and ref. game-2 will be much worse. However, for the sake of a fair comparison, we have considered that the bandwidth of the inter-cloudlet links is sufficient for offloading the excess job requests of overloaded cloudlets. Next, Fig. 3(b) shows a comparison among the average utility values of all the participating cloudlets against the job request arrival rate. As economic incentives are not involved in the ref. game-1 and ref. game-2 frameworks, we first find their respective NE load balancing strategies and then compute the inter-cloudlet payment for offloading job requests and the penalties for violating the QoS latency target using our considered parameters. This brings all the load balancing frameworks onto a common ground in federated cloudlet scenarios. Under low load conditions, all the load balancing frameworks yield similar utilities because none of the cloudlets offloads any job requests. To observe the performance of our proposed load balancing game with some real-world traces, we consider the cluster usage trace released by Google in 2011 [40].
We have used a dataset with a total size of nearly 10⁶ samples for this purpose. We identify two different classes of incoming job requests, with τ_1 = 10 msec and τ_2 = 20 msec, and fit the PDFs of inter-arrival times with exponential PDFs (refer to Appendix-A). We perform an event-driven simulation on OMNeT++ with these traces among three cloudlets. Each of these cloudlets has c_i = 10 processors and the number of processors dedicated to each of the two job classes (c_ik) is decided by solving problem P. The job request service rates are c_iμ_i = 1000 jobs/s. Each overloaded cloudlet randomly offloads a fraction of its incoming job requests to its under-loaded neighbors during all intermediate timeslots within the interval. Every time a job request arrives, each overloaded i-th cloudlet randomly offloads job requests to its j-th neighboring cloudlet as per φ_ij*, and the job processing simulation is implemented in the OMNeT++ environment as described in Section II.
In Fig. 4(a), we show three subplots to compare the actual and simulated utilities of each of the cloudlets. The actual utilities refer to the final steady-state utility values obtained by the reinforcement learning algorithm for each set of job request arrival rates of all the cloudlets. Firstly, we plot the actual utilities with the actual job request arrival rates. Secondly, we plot the simulated utilities with the actual job request arrivals, but with the NE solution computed by using the predicted job request arrival rates. Hence, the NE solution is erroneous and the simulated utilities also deviate from the learned utilities. We observed that the simulated utilities have a mean error of 10% from the actual utilities. Out of this, nearly 8% was due to the prediction error in job request arrival rates and the rest was due to the approximation error in modeling the job request arrival process as a Poisson process. In Fig. 4(b), we show the actual and simulated average latencies of each of the cloudlets. As a fair processor slicing method is employed, processors are shared fairly among all the job classes. This implies that if the job request arrival rate of one job class becomes very high, that class gets a higher number of processors to lower its processing latency, but the other classes are not deprived to the point that their processing latencies become very high in turn. Thus, if a cloudlet appears to be overloaded even after internal processor slicing, it should try to offload its extra job requests to its neighbors to meet the respective τ_k values. Again, we consider the same scenario of three federated cloudlets with c_1μ_1 = c_2μ_2 = c_3μ_3 = 1000 jobs/s and τ = 10 msec. Moreover, the job request arrival rates to each of these cloudlets are dynamically varying and their values are taken from the Google cluster traces. Therefore, the utility U_i of each i-th cloudlet also changes accordingly.
Each cloudlet predicts its incoming job request arrival rates one timeslot before the beginning of each time interval with an LSTM network, trained by a stochastic sub-gradient algorithm, to decide whether it is under-loaded or overloaded. If a cloudlet finds itself overloaded, it starts to randomly offload its extra load to under-loaded neighboring cloudlets according to its reinforcement learning automata. Fig. 5(a) uses a learning rate Θ = 0.9 and spreading rate σ = 0.03, where Algorithm 1 takes some time to learn the actual utility values due to more exploration. Fig. 5(b) uses Θ = 4.0 and σ = 0.01, where Algorithm 1 learns the actual utility values relatively faster, but performance may degrade during sudden changes due to less exploration. Note that if a cloudlet is under-loaded, then it decides its NE strategy without any exploration. Overall, we observe that by employing Algorithm 1, the cloudlets can achieve fairly accurate utility values.
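For illustration, the one-timeslot-ahead load forecast can be approximated by simple exponential smoothing; this is a lightweight stand-in for the LSTM predictor, not the method used in the paper:

```python
def ewma_forecast(history, alpha=0.5):
    """One-step-ahead arrival-rate forecast by exponential smoothing --
    an illustrative substitute for the LSTM predictor: recent
    observations dominate, so the forecast tracks non-stationary load."""
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level   # blend new sample into level
    return level
```

The forecast value is then compared against the cloudlet's sliced capacity to classify the next interval as under-loaded or overloaded.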
From the previous results, we found that it is essential to choose the learning rate and spreading rate parameters such that a proper balance between exploration and exploitation is maintained against the stationarity time of the job request arrival rates. Thus, in Fig. 6 we plot the average NE learning accuracy against the stationarity period of the job request arrival rate. We vary the stationarity time from 0.5 sec to 150 sec, and tune Θ from 0.5 to 3 with σ = 0.01 in Fig. 6(a), and tune σ from 0.009 to 0.030 with Θ = 0.9 in Fig. 6(b). We observe, in general, that the performance of our proposed Algorithm 1 improves as the stationarity of the job request arrival rate increases, because the algorithm is given more time-slots to explore and exploit the search space.

VI. CONCLUSIONS
In this paper, we have proposed a novel load balancing framework among federated neighboring cloudlets from the same as well as different SPs. In this framework, a distributed continuous-action reinforcement learning automata-based algorithm is used to help the neighboring cloudlets find their load balancing strategies. The primary advantage of this algorithm is that only a minimal exchange of control messages among neighboring cloudlets is required. We formulate the load balancing problem among multiple federated cloudlets as an economic and non-cooperative game in which we focus on a latency bound rather than latency minimization. Furthermore, this framework acts like a non-cooperative game among neighboring cloudlets from different SPs and like an optimization problem among cloudlets from the same SP. We use some typical characteristics of this underlying game to achieve faster convergence of the proposed reinforcement learning automata. In addition, this load balancing framework is capable of handling job requests from heterogeneous classes by employing processor slicing, which is unique among state-of-the-art frameworks. Through numerical evaluations, we have also shown that our proposed framework achieves better economic utilities under medium and high load conditions compared to some existing frameworks. We have used real job request arrival traces in an event-driven simulation for performance evaluation of the load balancing framework in a realistic scenario. We showed that the average accuracy of the utility values of the cloudlets with predicted job request arrival rates is nearly 90% of the utility values with actual job request arrival rates. A large percentage of the error arises from the job request arrival rate prediction error and only a very minimal percentage from our proposed model.
APPENDIX A
(a) Timeslotted protocol design: The duration of each fundamental timeslot is chosen to be equal to t and, without loss of generality, we assume that the τ_k values are integer multiples of t, i.e., τ_1 = a_1 × t, τ_2 = a_2 × t, τ_3 = a_3 × t, and so on (a_1, a_2, a_3, . . . are integer values). Depending on the stationarity of the incoming job request traffic, we choose a bigger time interval T which is an integer multiple of all of τ_1, τ_2, τ_3, . . .. Each cloudlet uses a workload-predictive algorithm a few timeslots before the beginning of each time interval to predict the job request arrival rates of all the job classes. Based on these predicted job request arrival rates, each cloudlet can solve problem P to decide the processor slicing. It can also use this information to calculate the pure-strategy Nash equilibrium (NE) load balancing strategies. The processor slicing and the NE load balancing strategies are applied when the time interval actually begins. During each interval, the processor slicing and the NE load balancing strategies remain unchanged over all the timeslots.
The cloudlets need to use the timeslotted model to ensure the target QoS latency for each job request class. The job requests arrive continuously and randomly within each timeslot and are marked with the integer a_k − 1 = (τ_k/t − 1). The duration of one timeslot is subtracted from τ_k to account for the time lost due to the transmission latencies. If some jobs cannot be processed within the timeslot of job request arrival, especially the jobs arriving near the end of the timeslot, they are rolled over to the next timeslot and the marking is decreased by 1. This process continues until those jobs are processed, and the jobs can be deleted after the marking becomes 0. This process is followed even if the ongoing time interval is over and the next time interval begins. Therefore, the job request arrival queues of the cloudlets can maintain a steady state unless they are extremely overloaded. As the job request arrival rates are predicted slightly before the actual job request arrival, all these network management operations can be done very efficiently.
For more clarity on this timeslotted protocol design, we provide the following numerical example. We consider the values t = 1 msec, τ_1 = 5 msec, τ_2 = 10 msec, and T = 1 sec. Thus, there are 1000 timeslots of t = 1 msec within each interval of T = 1 sec. Say, 5 timeslots before the beginning of each interval, the processor slicing and NE load balancing strategies are computed based on the one-timeslot-ahead predicted job request arrival rates. When job requests of class-1 arrive within a particular timeslot, they are marked as 4. If some of these jobs cannot be processed within that timeslot, they are rolled over to the next timeslot while updating the marking to 3, and so on down to 0. If these jobs are still unprocessed after this timeslot, then they are dropped. Similarly, when job requests of class-2 arrive within a particular timeslot, they are marked as 9 and the same process is followed. At the end of each interval, the processor slicing and NE load balancing strategies are re-calculated. Therefore, the job request arrival queues of the cloudlets can always remain in the steady state.
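The marking and rollover rules of this example can be sketched as follows; the helper functions are hypothetical and express markings in whole timeslots:

```python
def initial_marking(tau_ms, slot_ms):
    """Marking assigned on arrival: the number of extra timeslots a job
    may be rolled over, i.e. tau/t - 1 (e.g. 4 for tau = 5 ms, t = 1 ms)."""
    return tau_ms // slot_ms - 1

def roll_over(queue):
    """End-of-slot rollover: decrement the marking of every unprocessed
    job; jobs whose marking already reached 0 have missed their deadline
    and are dropped. Returns (surviving markings, number dropped)."""
    survivors = [m - 1 for m in queue if m > 0]
    dropped = sum(1 for m in queue if m == 0)
    return survivors, dropped
```

Running `roll_over` once per timeslot on the unprocessed backlog reproduces the deadline behavior of the example: a class-1 job survives at most 4 rollovers, a class-2 job at most 9.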
(b) M/M/c queuing model for cloudlets: We consider that each cloudlet has a finite number of processors, say c. In practice, some of the jobs can be completely parallelized and some cannot be parallelized at all. Hence, the actual performance of each cloudlet will lie within the performance bounds provided by M/M/c and M/M/1 queuing systems. Recall that the average processing latency of an M/M/c queuing system with job request arrival rate λ jobs/s, service rate μ jobs/s per processor, and ρ = λ/(cμ) is expressed as follows [1]: E(T) = E_C(c, λ/μ)/(cμ − λ) + 1/μ, where E_C(c, λ/μ) is the Erlang-C formula, given by: E_C(c, a) = (a^c/c!) / ((1 − ρ) Σ_{k=0}^{c−1} a^k/k! + a^c/c!), with a = λ/μ. Again, the average processing latency of an M/M/1 queuing system with the combined service rate of cμ jobs/s is expressed as follows: E(T̂) = 1/(cμ − λ). Therefore, under light load conditions, we have ρ ≪ 1 and E_C(c, λ/μ) ≈ 0, and hence derive the following: E(T) ≈ 1/μ ≥ 1/(cμ − λ) ≈ E(T̂). Fig. 1: M/M/2 queuing system with service rate μ = 1 jobs/sec for each processor and an M/M/1 queuing system with service rate cμ = 2 jobs/sec.
On the other hand, under heavy load conditions, we have ρ → 1 and E_C(c, λ/μ) → 1, and hence derive the following: E(T) ≈ 1/(cμ − λ) + 1/μ ≥ 1/(cμ − λ) = E(T̂). From the above analysis, we observe that E(T) ≥ E(T̂) holds true under low as well as high load conditions. Therefore, to guarantee the worst-case performance of the cloudlets in terms of the processing latency, we model the cloudlets as M/M/c queuing systems. To illustrate this, we compare the performance of an M/M/2 queuing system with service rate μ = 1 jobs/sec for each of the processors and an M/M/1 queuing system with service rate cμ = 2 jobs/sec in Fig. 1. Hence, we have modeled the cloudlets as M/M/c queuing systems in our manuscript, as this guarantees the worst-case average processing latency of the cloudlets.
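The latency comparison above can be checked numerically with the standard Erlang-C formula; the function names below are illustrative:

```python
from math import factorial

def erlang_c(c, a):
    """Erlang-C: probability that an arriving job must wait in an M/M/c
    queue with offered load a = lambda/mu (requires a < c for stability)."""
    s = sum(a**k / factorial(k) for k in range(c))
    top = a**c / (factorial(c) * (1 - a / c))
    return top / (s + top)

def mmc_latency(lam, mu, c):
    """Average sojourn time E(T) of an M/M/c queue:
    E_C/(c*mu - lam) + 1/mu."""
    a = lam / mu
    return erlang_c(c, a) / (c * mu - lam) + 1.0 / mu

def mm1_latency(lam, mu_total):
    """Average sojourn time E(T^) of an M/M/1 queue with the pooled
    service rate c*mu: 1/(c*mu - lam)."""
    return 1.0 / (mu_total - lam)
```

Evaluating both models at light and heavy loads confirms E(T) ≥ E(T̂), i.e., the M/M/c model is the conservative (worst-case) choice.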
(c) Poisson job request arrival to cloudlets: To validate our assumption that the incoming job requests to the cloudlets follow a Poisson process, we consider the cluster usage trace released by Google in 2011 [2]. Fig. 2a shows the inter-arrival times of the job requests from two different job classes, i.e., the differences between the timestamps of consecutive job requests. From this plot, we can observe two primary characteristics, viz., self-similarity and non-stationarity. These properties of the Google cluster traces were studied in more detail by the authors of [3]. Due to the self-similarity property, the series of job request inter-arrival times looks the same under any magnification of scale. The non-stationarity property indicates that the distribution and statistical moments of the job request inter-arrival times vary over time. As the job request arrival process is a non-stationary random process, we considered small windows of 1 sec and attempted to fit the distribution of the inter-arrival times with an exponential probability distribution in Fig. 2b. We checked that the distribution fitting satisfies a significance level of 0.05 (verified with the Chi-squared goodness-of-fit test). Similar observations were also reported by the authors of [4]. Note that we have scaled up the job request arrival rate (jobs per 300 sec are treated as jobs per 1 sec), since we found that the job request arrival rates to the Google clusters are much lower than the job request arrival rates we expect at the cloudlets.
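The windowed exponential fitting step can be sketched as a per-window maximum-likelihood fit; `windowed_exp_rates` and its parameters are illustrative, and the chi-squared goodness-of-fit check is omitted here:

```python
import numpy as np

def windowed_exp_rates(arrival_times, window=1.0, min_jobs=2):
    """For each time window, fit an exponential inter-arrival
    distribution by maximum likelihood (rate = 1/mean gap) -- an
    illustrative version of the piecewise-stationary fitting step."""
    arrival_times = np.asarray(arrival_times, dtype=float)
    rates = []
    for start in np.arange(0.0, arrival_times.max(), window):
        mask = (arrival_times >= start) & (arrival_times < start + window)
        in_win = arrival_times[mask]
        if len(in_win) >= min_jobs:                 # skip sparse windows
            rates.append(1.0 / np.diff(in_win).mean())
    return rates
```

On a non-stationary trace, the per-window rates form the piecewise-constant arrival-rate profile that drives the simulation.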
In Fig. 3a, we show the job request arrival rates for two different job classes from the Google cluster traces. As the job request arrival process is self-similar and non-stationary, each cloudlet uses a long short-term memory (LSTM) network and a stochastic gradient descent algorithm to predict the incoming job request arrival rates. In this figure, the job request arrival rates over the first 100 sec interval are used to train the LSTM network. Once the network is trained, it is used to forecast the job request arrival rates 1 sec before the actual job request arrival over the next 50 sec interval. We observe that a very high degree of prediction accuracy is achievable by the LSTM networks, because the prediction error is greatly reduced at each iteration by observing the actual job request arrival rates from the previous iteration. Fig. 3b shows the job request arrival rates used in the event-driven simulation in OMNeT++.

APPENDIX B OPTIMAL PROCESSOR SLICING IN CLOUDLETS
In this load balancing problem, each cloudlet uses a processor slicing technique to handle multiple classes of incoming job requests with heterogeneous QoS target latency requirements, as shown in Fig. 4. We assume that each cloudlet has multiple processors and denote the total number of processors in the i-th cloudlet by c_i. If the incoming job requests can be classified into K classes, then the i-th cloudlet needs to create K slices with c_ik processors in each slice, such that 1 ≤ c_ik ≤ c_i. Firstly, each i-th cloudlet observes its individual incoming job request arrival rate for each class and finds the optimal values of c_ik by solving the optimization problem P. This optimization problem ensures an optimal processor slicing among all the job classes and allows soft processor slicing, i.e., c_ik can take any real value ≥ 1. We formulate this problem such that a fair number of processors is allocated to all the job classes, avoiding situations where the highest number of processors is allocated to a job class whose latency gap from its target τ_k is already extremely small while the gap remains large for the others. To solve the problem efficiently, we can reformulate it as problem R and calculate the processing latency values accordingly. Fig. 2(a): Inter-arrival times of the job requests from three different classes to Google clusters.
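A simple integer-valued approximation of this slicing step can be sketched as a greedy allocation; this is an illustrative stand-in for solving problem P, and it approximates each slice as a pooled M/M/1 server with rate c_k·μ rather than the paper's M/M/c model:

```python
def slice_processors(c_total, lam, mu):
    """Greedy integer processor slicing (illustrative stand-in for
    problem P): start with one processor per class, then repeatedly
    give the next processor to the class with the worst latency, so no
    class is starved while others sit far below their targets.

    c_total: total processors at the cloudlet
    lam:     per-class arrival rates (jobs/s)
    mu:      service rate of one processor (jobs/s)
    """
    K = len(lam)
    alloc = [1] * K                       # soft-slicing lower bound c_k >= 1

    def latency(k):
        cap = alloc[k] * mu - lam[k]      # pooled M/M/1 approximation
        return float('inf') if cap <= 0 else 1.0 / cap

    for _ in range(c_total - K):          # hand out the remaining processors
        worst = max(range(K), key=latency)
        alloc[worst] += 1
    return alloc
```

The greedy rule equalizes the per-class latencies instead of minimizing a single class's latency, which mirrors the fairness objective described above.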
Fig. 2(b): Fitting job request inter-arrival times over 1 sec windows to an exponential distribution. Here, E_C(c, λ/μ) is the Erlang-C formula, which can be written in an alternative form with ρ = λ/(cμ), for some advantage in the analysis, by using the incomplete Gamma function in (7), which is often used for approximate and exact representations of many mathematical series [5]. The authors of [6] proved a conjecture that the average latency of M/M/c queues is a convex function of the number of processors. They considered an expression like (6) and showed that it is a ratio of two expressions, where the numerator is strictly decreasing and the denominator is strictly increasing with c. By further exploiting this property, they proved the convexity of this function. This implies that R is a continuous convex optimization problem and hence, it can be solved very efficiently with a gradient projection algorithm.
APPENDIX C
Proof: Recall that in the game Γ, the utility function of each i-th cloudlet is U_i(φ_i, φ_−i), ∀i ∈ C. From this definition, we can observe that when the i-th cloudlet is able to meet the QoS target latency (under-loaded), i.e., its total latency does not exceed τ, the utility function can be interpreted as an affine function of φ_i. Nonetheless, when the i-th cloudlet is unable to meet τ (overloaded), the utility function appears as the non-linear function in (10). If (10) represents a concave function, then the sufficient condition is that its Hessian matrix should be a negative semi-definite matrix [7]. However, we find that both the diagonal and off-diagonal elements of this matrix are equal, i.e., ∂²U_i/∂φ_ij² = ∂²U_i/∂φ_ij∂φ_ik, ∀j, k ∈ C. Clearly, this implies that the Hessian of (10) is not a negative semi-definite matrix, so we extend the evaluation to the bordered Hessian matrix [8]. Note that the m-th order bordered Hessian matrix of U_i(φ_i, φ_−i), ∀i ∈ C, where m = 1, 2, . . .
, is written as (11), whose bordered elements are the first-order derivatives of U_i(φ_i, φ_−i). Now, we observe that the determinant values of the bordered Hessian matrix (11) are negative. Therefore, we can conclude that (10) is a quasi-concave function of φ_i [8]. In general, we can conclude that the utility functions U_i(φ_i, φ_−i), ∀i ∈ C are affine functions of φ_i when the cloudlets are under-loaded and quasi-concave functions of φ_i when they are overloaded. Hence proved. It is interesting to note that in this load balancing game, each competing cloudlet is interested in maximizing its individual utility rather than strictly minimizing the average end-to-end latency, as most of the existing works do. Hence, the cloudlets are always interested in receiving some job requests from neighboring cloudlets as long as the QoS latency requirement is met and some extra incentive is gained. Fig. 5 shows that with a sufficient amount of job requests and a set of properly chosen parameters Ω_i,1, Ω_i,2, and Ω_i,3, the utility function of the j-th cloudlet monotonically increases as more job requests are offloaded by the neighboring i-th cloudlet, until the total end-to-end latency equals the target QoS latency value τ. The maximum utility is achieved at the point where the total end-to-end latency equals τ, and the utility starts to decrease beyond this point. Therefore, the overloaded cloudlets can offload to their under-loaded neighboring cloudlets only until the overall latency equals τ. Thus, the fraction of incoming job requests offloaded by an overloaded cloudlet is controlled by the overloaded cloudlet as well as by its under-loaded neighboring cloudlets.
Furthermore, note that in our game formulation there are no explicit latency constraints on the federated cloudlets; even highly over-loaded cloudlets can participate in the game and offload some of their job requests to the relatively under-loaded neighboring cloudlets. This makes their utility higher than the utility of not participating in the game. Nonetheless, under such conditions the game formulation in [9], which has an explicit delay bound on the participating cloudlets, becomes infeasible and hence a valid NE solution cannot be obtained. As we compute the NE solution of this load balancing game, we can observe a very important property of the solution. The authors of [10] proved that the average latency expression of M/M/c queuing systems is monotonically increasing and convex against the load ρ. Intuitively, this implies that as the class-k job request arrival rate λ_ik to the i-th cloudlet increases, it needs to offload a higher fraction of job requests to its neighboring cloudlets for maintaining a stable condition of operation. Mathematically, if the i-th cloudlet is overloaded such that λ̂_ik > λ_ik > 0, then φ̂_ij > φ_ij > 0.

APPENDIX D PROOF OF THEOREM 4.1
Proof: We considered that the strategy space Φ is a compact set in the load balancing game Γ and that the stochastic reward values are normalized, i.e., R_i ∈ [0, 1], ∀i ∈ C. The authors of [11] showed that under such conditions, a continuous-action reinforcement learning automata-based PDF update rule can be guaranteed to converge to a local optimal NE if the following necessary restrictions are imposed: (i) We choose a sufficiently small value of Θ such that each i-th cloudlet can match its expected strategy through iterations.
(ii) We choose a value of σ such that the peak of the equilibrium PDF f_i(φ_i) has an upper bound of 1/(σ√(2π)). Thus, if the above restrictions are satisfied, then the continuous-action reinforcement learning automata-based algorithm will converge to at least one of the existing NE of the underlying continuous-kernel game.
In Fig. 6, we show the convergence properties of the proposed reinforcement learning algorithm. For illustration purposes, we consider two neighboring cloudlets, Cloudlet-1 and Cloudlet-2, with an intermediate transmission latency of 1 msec, trying to meet a QoS requirement of τ = 10 msec. We also consider that c_1μ_1 = c_2μ_2 = 1000 jobs/s with λ_1 = 970 jobs/s and λ_2 = 800 jobs/s, such that Cloudlet-1 is overloaded but Cloudlet-2 is under-loaded. Therefore, to meet τ, Cloudlet-1 needs to offload 0.083 × λ_1 job requests to Cloudlet-2, whereas Cloudlet-2 does not need to offload anything. From the PDFs of both cloudlets, we also observe that the most preferable decision for Cloudlet-2 is not to offload, while Cloudlet-1 prefers to offload around 8-9% of its total incoming job requests. As we choose the learning and spreading parameters as Θ = 0.9 and σ = 0.01, respectively, we see that Algorithm 1 converges to the expected utility and reward values for both cloudlets within a few hundred iterations. Note that, instead of searching over the whole strategy space, we considered only the most likely strategies that the cloudlets should consider, by using our understanding of the underlying load balancing game. Moreover, by using these techniques, Algorithm 1 can perform 100% accurately without much exploration when all cloudlets are under-loaded.