Asynchronous Coordination of Distributed Mixed-Integer Linear Subsystems via Surrogate Lagrangian Relaxation Asynchronous Coordination of Distributed Mixed-Integer Linear Programming Systems via Surrogate Lagrangian Relaxation

With the emergence of Internet of Things that allows communications and local computations, and with the vision of Industry 4.0, a foreseeable transition is from centralized system planning and operation toward decentralization with interacting components and subsystems, e.g., self-optimizing factories. In this paper, a new “price-based” decomposition and coordination methodology is developed to efficiently coordinate subsystems such as machines and parts, which are described by Mixed-Integer Linear Programming (MILP) formulations, in a distributed and asynchronous way. To ensure low communication requirements, exchanges between the “coordinator” and subsystems are limited to “prices” (Lagrangian multipliers) broadcast by the coordinator, and to subsystem solutions sent to the coordinator. Asynchronous coordination, however, may lead to convergence difficulties since the order in which subsystem solutions arrive at the coordinator is not predefined as a result of uncertainties in communication and solving times. Under realistic assumptions of finite communication and solve times, convergence of our method is proved by innovatively extending Lyapunov Stability Theory. Numerical testing of generalized assignment problems through simulation demonstrates that the method converges fast and provides nearoptimal results, paving the way for self-optimizing factories in the future. Accompanying CPLEX codes and data are included. Note to practitioners—In view of a foreseeable transition toward self-optimizing factories whereby machines and parts have communication and computational capabilities, a novel distributed and asynchronous method to coordinate distributed subsystems is developed. Under realistic assumptions of finite communication and solve times, method convergence is proved. Numerical testing of generalized assignment problems through simulation demonstrates that the method converges fast and provides near-optimal results, paving the way for self-optimizing factories in the future. Accompanying CPLEX codes and data are included.


I. INTRODUCTION
ith the emergence of Internet of Things [1,2] empowered by smart sensors together with advanced computation and communication technologies, and with the vision of Industry 4.0 [3,4], a foreseeable transition is from centralized system planning and operation toward decentralization, e.g., self-optimizing factories with interacting components and subsystems. Within these futuristic factories, machines and parts are coordinated through 5G networks to meet certain objectives such as on-time delivery. In manufacturing, examples of operations optimization problems include planning, scheduling and dispatching problems [5,6]. Scheduling problems are solved before each shift and require short solving times such as a few minutes, and online dispatching of a part to a machine may require a few seconds. Because of the many possible interconnections among parts, operations and machines, efficient communication scheme is required to prevent bandwidth overloading. This motivates the need for efficient coordinated operations of subsystems while ensuring high computational and communication efficiency.
Within manufacturing, machine and part subsystems are frequently formulated as mixed-integer linear programming (MILP) subproblems. Traditionally, to coordinate MILP subproblems, Lagrangian relaxation (LR) [7][8][9][10][11] has been used by exploiting problem separability in manufacturing problems such as job-shop scheduling [10] and in power systems problems such as unit commitment [11]. Within standard LR, multipliers (or "shadow prices") are updated using subproblem solutions based on levels of violation of relaxed constraint using subgradient methods [12][13]. Because of exploitation of decomposability, the LR method is a good candidate for coordinating distributed subsystems whereby a coordinator updates multipliers and only needs to know solutions of subproblems associated with distributed subsystems. However, standard LR methods suffer from major convergence difficulties such as high computational effort, zigzagging of multipliers and the need to know the optimal dual values. Moreover, since standard LR requires solving all subproblems to update multipliers, the LR method is synchronous. When the number of subproblems is large, synchronous coordination may lead to inefficient time management since "fast" subproblem solvers will likely spend significant amount of time waiting for synchronization.
Some the above difficulties have been overcome within subgradient incremental methods [14,15], Alternate Direction Method of Multipliers (ADMM) [16][17][18][19][20][21], surrogate subgradient method [22], and surrogate Lagrangian relaxation (SLR) [23][24]49]. The distributed and asynchronous incremental subgradient method [15] for optimizing convex dual functions consisting of a large number of components, which arise within Lagrangian relaxation framework with a large number of subproblems, overcomes the synchronization difficulty. However, the method imposes the requirement that all Mikhail A. Bragin subproblem solutions arrive to the coordinator with the same "long-term" frequency on average. ADMM [16][17][18][19][20][21], a decomposable version of the Method of Multipliers (frequently referred to as "Augmented Lagrangian Relaxation" (ALR) [25,26]), aims at accelerating convergence of traditional LR by penalizing constraint violations by using quadratic penalty terms and by decomposing relaxed problems arising in ALR to reduce computational effort. However, when it comes to coordination of MILP subproblems, ADMM does not converge.
Our recent SLR method [23,24,49] has overcome major convergence difficulties of standard Lagrangian Relaxation such as high computational effort, zigzagging of multipliers, and the need to know the optimal dual value for convergence. In [49], it has been demonstrated that the method is capable of efficiently coordinating thousands of subsystems. These methods will be reviewed in more detail in Section II.
In this paper, a novel distributed and asynchronous "pricebased" decomposition and coordination method based upon the SLR method will be developed to efficiently coordinate distributed MILP subsystems within futuristic self-optimizing factories in Section III. Within the framework, multiple distributed subsystems and one coordinator have computation and communication capabilities.
Information exchanges between the coordinator and subsystems are limited to "prices" (Lagrangian multipliers) broadcast by the coordinator and to subsystem solutions sent to the coordinator to avoid excessive data transfer within the system.
While asynchronous coordination avoids the synchronization issue, it leads to major convergence difficulties: 1) because of uncertainties in solving, communication and multiplier-updating times, the order in which subsystem solutions arrive to the coordinator is uncertain, and 2) subsystem solutions are obtained based on multipliers of different vintages, and multipliers may not converge. To overcome these difficulties while ensuring fast speed, rather than requiring the "long-term" frequency requirement as in [15], convergence is proved under a "freshness" assumption, whereby a coordinator can update multipliers without waiting for "slow" subproblems as long as all subproblem solutions arrive to the coordinator at least once within a finite number of iterations. Our novel idea to establish convergence is through the novel use of the Lyapunov energy function defined as the square of the distance from the current prices to the optimum with the idea of forcing this function to approach zero thereby ensuring that prices approach their optimal values. Although not monotonically decreasing as required by traditional Lyapunov methods for convergence [27], an upper bound is innovatively established as an envelope of Lyapunov functions for all possible (uncertain) trajectories of multipliers ("prices") that result from uncertain sequences of subproblem solutions arriving at the coordinator. Based on the contraction mapping concept whereby distances between multipliers at consecutive iterations decrease, it is then proved that this upper bound approaches zero.
In section IV, by simulating asynchronous update of multipliers, two examples are presented. The first small example is to show that Lyapunov functions within of the new method while non-monotonic, approach zero fast. The second example is based on generalized assignment problems, which can be viewed as simplified problems that arise within factories.
These results demonstrate that the new method converges fast. With such effective distributed and asynchronous coordination, the method has valuable implications for future self-optimizing factories to coordinate machines or parts.

II. LITERATURE REVIEW
In this section, standard Lagrangian Relaxation (LR) will be reviewed in subsection II.A. In subsection II.B, the distributed asynchronous incremental subgradient method as well as asynchronous ADMM, both are version of LR tailored for asynchronous coordination, will be reviewed and their limitations will be presented. In subsection II.C, our recent Surrogate Lagrangian relaxation will be reviewed as a promising approach for the development of an efficient asynchronous coordination method. Since this paper deals with coordination of MILP subsystems, branch-and-cut, an MILP method, will be reviewed in subsection II.D. Methods that do not support distributed coordination, such as heuristics methods, or the distributed methods that require continuity of problems will not be reviewed.

A. Standard Lagrangian Relaxation.
Traditionally, to solve MILP problems, Lagrangian relaxation [7][8][9][10][11] has been used to exploit problem separability. Specifically, in manufacturing, to solve job-shop scheduling, machine capacity coupling constraints are relaxed to decompose the problem into part subproblems [10]. In power systems, to solve unit commitment problems, system demand coupling constraints are relaxed to decompose the problem into individual unit subproblems [11]. Within standard LR, multipliers (or "shadow prices") are updated after receiving subproblem solutions based on levels of violation of relaxed constraint using subgradient methods [12][13]. Because of exploitation of decomposability, the LR method is a good candidate for coordinating distributed subsystems whereby a coordinator updates multipliers and only needs to know solutions of subproblems associated with distributed subsystems. However, standard LR methods suffer from major convergence difficulties. Because of the presence of discrete variables, the dual function is non-smooth polyhedral concave [28,p. 670,Proposition 7.1.2]. Therefore, gradients may not exist and subgradients are used. As a result, multipliers may suffer from zigzagging across ridges of the dual function [23, p. 192, Fig. 1; 29, p. 594, Fig. 1]. Also, convergence proof as well as practical implementations require the knowledge of the optimal dual value, which is unknown and is typically adaptively adjusted in practice as in "subgradient-level" methods [30] or incremental subgradient methods [31]. However, these adjustments are inefficient and convergence is slow as demonstrated in [23,199,7].

B. Distributed and Asynchronous Coordination Methods. Distributed Asynchronous Incremental Subgradient
Method. To optimize non-smooth dual functions consisting of a large number of components, which arise within the LR framework, in a distributed and asynchronous manner, a distributed asynchronous incremental subgradient method was developed [15]. The method requires that all subproblem solutions arrive to the coordinator with the same "long-term" > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 3 frequency on average, and convergence was proved using the diminishing stepsizing rule. Moreover, convergence was proved under the assumption that the subgradient is split into individual components and each component is updated independently rather than updating the subgradient as a whole. Under this scheme, convergence may be slow in situations whereby there are "fast" and "slow" subsystems solvers because "fast" subsystems may spend significant amounts of time to satisfy the "long-term" frequency requirement.
Alternate Direction Method of Multipliers. ADMM, a decomposable version of the Method of Multipliers [25,26] (frequently referred to as "Augmented Lagrangian Relaxation" (ALR)), aims at accelerating convergence of traditional LR by penalizing constraint violations by using quadratic penalty terms and by decomposing relaxed problems arising in ALR to reduce computational effort. Within the method, to alleviate the issues associated with synchronization, two conditions are used: 1) "partial barrier," which allows the coordinator to update multipliers after receiving solutions from one or few subsystems and 2) "bounded delay," which requires solutions from every subsystem to arrive at the coordinator at least once within a finite number of coordinator iterations [21,32]. The main difficulty of ADMM is that it can guarantee convergence for convex problems only [21, p. 419]. When solving nonconvex problems, ADMM does not convergence [33, p. 73]. When coordinating MILP subproblems, which are non-convex, ADMM does not converge because stepsizes within the method do not approach zero. However, stepsizes are required to approach zero to guarantee convergence when optimizing nonsmooth dual functions [13,23]. Moreover, quadratic penalties make the resulting relaxed problem nonlinear, which cannot be solved by MILP solvers. While penalty terms can be linearized [34], the minimum of penalties is typically not preserved through such linearization and performance of the method is degraded. Furthermore, penalties terms are a part of each subproblem formulation, but these terms involve decision variables from multiple subproblems. Therefore, additional communication requirements are entailed. For example, in power systems, communication requirements among subsystems [21,35] are needed.

C. Surrogate Lagrangian Relaxation Method
All major difficulties of standard LR such as high computational effort required to solve all subproblems, zigzagging of multipliers and the requirement of the knowledge of the optimal dual value, have been overcome within our recent surrogate Lagrangian relaxation (SLR) [23][24]49]. Within the method, it is not necessary to spend the effort to optimally subproblems. Rather, it is sufficient to optimize subproblems subject to the simple "surrogate optimality condition" [23, p. 178, eq. 12], guaranteeing that "surrogate dual" values approach dual values [23, p. 181]. Convergence is proved without requiring the knowledge of the optimal dual value. This was achieved with a constructive process based on the contraction mapping concept whereby distances between Lagrange multipliers decrease at consecutive iterations, and as a result, multipliers converge to a unique limit. At the same time, stepsizes are kept sufficiently large to avoid premature algorithm termination. Additionally, a constructive stepsizing formula satisfying these criteria has been developed. When solving large-scale problems, such as unit commitment problem arising in power systems [49], the method demonstrated high efficiency in the coordination of thousands of power generating units. SLR thus satisfies high computational efficiency requirement because of much improved convergence over standard LR, and low communication requirements because subsystems are not required to communicate with each other. The method has been shown to outperform other previous methods including coordination methods such as ADMM [24].

D. MILP Method: Branch-and-cut
The main premise behind branch-and-cut [36] is that if the convex hull 1 of an MILP is obtained, the problem reduces to solving a linear programming problem. Owing to linearity of the problem, the surface of the convex hull is polyhedral [41], and vertices of the convex hull are feasible solutions to the original MILP problem. Because of finite numbers of variables and constraints, the number of vertices is finite and linear programming methods such as simplex methods converge to the optimal feasible solution within a finite number of iterations [37, p. 6]. However, the convex hull generally cannot be obtained. After relaxing integrality requirements, branch-andcut solves the LP-relaxed problem [37]. Attempting to obtain feasible solutions, branch-and-cut uses "cuts" to cut off LP regions that contain fractional vertices without cutting off feasible solutions. While cuts generally require an infinite number of iterations to define facets of the convex hull, branchand-cut resorts to branch-and-bound [38,39] after a finite number of iterations when "tailing off" of cuts occurs [40, p. 349]. Since the number of fractional vertices that correspond to integer variables is finite, the number of branching operations required to obtain optimal feasible solutions is also finite.

III. CONVERGENCE OF DISTRIBUTED AND ASYNCHRONOUS SURROGATE LAGRANGIAN RELAXATION
In subsection III.A, an MILP problem formulation of a system consisting of several distributed subsystems is presented. In subsection III.B, a Distributed and Asynchronous Surrogate Lagrangian Relaxation (DA-SLR) is developed. In subsection III.С, convergence of DA-SLR is proved.

A. Distributed MILP Subsystems
Consider a system consisting of one coordinator and I distributed subsystems. Each subsystem is subject to its local linear constraints, which will not be considered for simplicity and ease of presentation. The entire system is subject to systemwide coupling constraints, which couple individual subsystems and the MILP formulation of a system can be written as follows: subject to > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 4 where xi = (yi, zi)  Xi  ℝ ℤ , Xi are closed and bounded sets, x = (x1,…,xI) = (y, z)  X  ℝ ℤ , y = (y1,…,yI)  ℝ , z = (z1,…,zI)  ℤ , with ℝ denoting the set of real numbers, ℤ denoting the set of integers. Functions fi: Xi  ℝ and gi: Xi  ℝ are linear. It is assumed that a set of feasible solutions that satisfy (1)-(2) is non-empty. To rule out possible irregularities such as linear dependence of gradients of active constraints in the continuous subspace ℝ , it is assumed that gradient vectors of active inequality constraints with respect to continuous variables y only are linearly independent at a local minimum x * = (y * , z * ) of (1) [42].

B. Distributed and Asynchronous Surrogate Lagrangian Relaxation
As discussed in Sections II, separability of the problem is exploited by relaxing coupling constraints (2) by introducing Lagrangian multipliers  T = (1, …, m) ℝ and decomposing the resulting relaxed problem into individual subproblems: It is assumed that subsystems have computational and communication capabilities. Namely, subsystems are capable of solving their own subproblem (3) and to send the resulting solution to the coordinator.
Within the SLR framework, as discussed in Section II, it is not necessary to spend the effort to fully optimize subproblems. Rather, it is sufficient to stop optimization after the "surrogate" optimality condition for subproblems [23, eq. 57] is satisfied at iteration k+1: This condition is not the necessary requirement in a sense that if a subproblem is solved to optimality and the best solution found is the same as the most recent subproblem solution, i.e., 1 kk jj xx   , then, although this solution does not satisfy (4), the algorithm can proceed. To coordinate subsystems, it is also assumed that the coordinator has capability to receive subproblem solutions, update multipliers and broadcast them to all subproblems. Here 1 1: are "surrogate" subgradient directions that are obtained instead of subgradient directions after receiving solutions from one or few subproblems. If inequality constraints are present in the formulation, multipliers are updated according to (5) with subsequent projection onto the positive orthant. Within the asynchronous framework, subproblems are assumed to be solved without waiting for other subproblems to finish, and coordinator updates multipliers asynchronously without waiting for all subproblem solutions to arrive. Multipliers (5) will be updated using stepsizes s k that satisfy [23, p. 180, eqs. 21a and 21b], which are derived based on the contraction mapping concept and are set as: For notational convenience, superscripts k of multipliers and subproblems will denote coordinator-updating iterations and superscripts of subproblems will denote the most recent subproblem solution available to the coordinator at iteration k.
To ensure that stepsizes (7) are well-defined, the following Assumption is required.

Assumption 1. Boundedness of surrogate subgradients.
Surrogate subgradients satisfy the following condition: (9) This assumption is realistic for MILP problems defined over a closed and bounded set. Indeed, surrogate subgradients are essentially levels of constraints violations. Since constraints are linear and each variable is defined over a closed and bounded set, constraint violations are finite.
Unlike the subgradient method, whereby zero subgradients imply that the optimum is obtained and the algorithm terminates with the optimal primal solution, within SLR, zero surrogate subgradient implies that there are no constraint violations and that a feasible solution is obtained, but it does not imply zero subgradient. Therefore, this solution is not guaranteed to be optimal. In this case, an iteration is skipped without updating multipliers (5) and stepsizes (7)- (8). Figure 1 illustrates the asynchronous update by using one coordinator and three subproblems, and the difficulties caused by asynchronous updating of multipliers. After obtaining a solution to the first subproblem, coordinator updates the multipliers without waiting for other solutions to arrive and broadcasts the updated multipliers to all subproblems. Subproblem 1 can then start solving the problem once receiving updated multipliers. Then, after third subproblem is solved, and its solution arrives at the coordinator, the coordinator once again updated multipliers and broadcasts their values to all subproblems, and so on. While asynchronous coordination avoids the synchronization issue, it leads to major convergence difficulties: 1) because of uncertainties in solving, communication and multiplier-updating times, the order in which subsystem solutions arrive to the coordinator is uncertain, and 2) subsystem solutions are obtained based on multipliers of different vintages, and multipliers may not converge. For example, as demonstrated in Fig. 1, at coordinator iteration 4, 1 3 is obtained using 2 , 2 4 is obtained using 0 and 3 2 is obtained using 1 . As a result, there may be convergence difficulties. In the following subsection, under realistic "freshness" assumption, convergence of the DA-SLR method will be proved.

C. Convergence of Distributed and Asynchronous Surrogate Lagrangian Relaxation
Convergence of Distributed and Asynchronous Surrogate Lagrangian Relaxation (DA-SLR) is proved in three stages. In Stage 1, it is proved that "surrogate" dual values approach dual values and multipliers approach the optimum "infinitely often" (Propositions 1-4). In Stage 2, the Lyapunov function is introduced as the square of distances from multiplies to the optimum, and the upper bound on Lyapunov functions is developed (Propositions 5-6). In Stage 3 it is proved that the upper bound on Lyapunov functions approaches zero thereby leading to convergence of multipliers (Proposition 7).

Stage 1. Convergence of "surrogate" dual values to dual values
It is assumed that subproblem solving times as well as communication times are finite, which is equivalent to the following "freshness" Assumption: Since subproblems are solved subject to the "surrogate" optimality condition (4), rather than obtaining dual values as within standard LR, "surrogate" dual values are obtained, which are generally above the dual surface. To prove this, Propositions 1-2 will first prove that subproblem solutions satisfying (4) converge their optimal values. Proposition 1. Convergence to optimal subproblem solutions for fixed . Assuming that subproblem solutions satisfy the surrogate optimality condition (4), for each subproblem j and there exist finite K'j such that the subproblem solution is optimal for multiplier values : Proof: As explained in subsection II.D, an optimal subproblem solution is obtained by branch-and-cut within a finite number of steps. A subproblem-feasible solution satisfying (4) is also obtained within a finite number of steps. Since multipliers are assumed to be constant here, (4) implies the decrease of subproblem objective function. Essentially, branch-and-cut will continue to search along the unexplored nodes of the branch tree trying to find a lower objective function value until the subproblem-optimal solution is obtained. □ The limitation of Proposition 1 is that it is proved for a fixed set of multipliers. Within DA-SLR, multipliers are updated, and, therefore, the objective functions of subproblems > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 6 (3) will change. In turn, this will affect the optimal solution of a subproblem. Proposition 2 removes this limitation. Proposition 2. Convergence to optimal subproblem solutions. Assuming that subproblem solutions satisfy (4), then for each subproblem j there exist finite Kj (> K'j) such that solution to subproblem j is optimal for multiplier values j K  : To prove (12), introduce the following operator: Because subproblems are defined over bounded sets Xj, solutions are finite and the following inequality holds: The operator A is thus bounded [43]. Therefore, there exists a finite constant C'A > 0 such that the following inequality holds: To establish (12), consider the following norm: Using (13), equation (16) can be rewritten as: Because Xj is bounded, subproblem objective functions (3) take on finite values, therefore, the following inequality also holds: Here, CA is a finite positive constant. Using the Cauchy-Schwarz inequality, equation (18) becomes: Since gj(xj) is a component of constraint violations, Assumption 1 is applicable, therefore: Since stepsizes (7)-(8) approach zero [23], there exist iteration K ' j and Kj such that for any  > 0 the following inequality holds: Therefore, From (20) and (22) is follows that As reviewed in Section II, subproblem convex hulls contain a finite number of vertices, each corresponding to a feasible solution. Moreover, it can be assumed that distances between any two adjacent vertices are greater than . Therefore, optimal solutions at iterations K ' j and Kj are the same and (12) holds.
Since it takes a finite number of iterations to obtain   ' * j K j x  without updating multipliers, it will also take a finite number of iterations to obtain   (not necessarily  * ), and surrogate dual values approach dual values: is a "surrogate" dual value obtained after solving one or few subproblems subject to the surrogate optimality condition (4).
Proof: As proved in [23], stepsizes (7)-(8) approach zero. To prove that surrogate dual values approach dual values, consider first the surrogate optimality condition for one subproblem j: By using (5), inequality (28) can be rewritten as: The inequality (29) can then be equivalently rewritten as: As stepsizes approach zero, there exists so that for all k > and all > 0, the following inequality holds: Therefore, forms a convergent sequence: Indeed, as proved in Propositions 1-2, subproblem solutions approach a limit, which here is denoted as j x . Moreover, the situation whereby is impossible. Multipliers cannot grow without bound because that would imply that there is always positive or always negative constraint violation, 2 implying infeasibility of (1)- (2) (7)- (8), there exists  > 0 and the following condition is satisfied "infinitely often" Here  is an infinite subset of natural numbers. (35) is not satisfied infinitely often for  > 0, then starting from iteration , the following inequality holds:

Proof: If condition
There are three cases: , and the following condition holds: There is a contradiction with (36) because in this case it is possible to find ' 0   that satisfies (35).

Stage 2. Development of an upper bound for Lyapunov functions
In this Stage, the Lyapunov function is defined as 2 If there are inequality constraints, then the violation would be positive.
which is the square of the distance from current to optimal multipliers. Because subproblem solving times and subproblem-coordinator communication times are uncertain, different sequences of subproblem solutions arriving at the coordinator lead to different trajectories of multipliers. As a result, the exact representation of the Lyapunov function is unknown. To resolve this issue, an upper bound of the Lyapunov function is derived in Propositions 5-6 as an envelope of all possible Lyapunov functions for any sequence of subproblems arriving to the coordinator. Two inequalities are derived based on whether condition (38) holds or not in Proposition 5. In Proposition 6, these inequalities are combined to derive an upper bound on all possible Lyapunov functions.

Proposition 5.
As proved in [23, p. 187], under condition (35) and assuming that stepsizes are "sufficiently small" 1/ (2 ) k s   the following inequality holds: If condition (35) is not satisfied or stepsizes are not "sufficiently small" , then the following inequality holds: Therefore, In Proposition 6 below, an upper bound on Lyapunov functions at iteration k+1 in terms of Lyapunov functions at iteration 0 is derived by induction taking into account all possible realizations of Lyapunov functions. otherwise.
Proof: Proof will follow by induction by first proving that the equation is true when k = 1, then assuming it is true for k, showing it is true for k+1.
Before starting the induction, consider the situation whereby k = 0. If condition (35) is satisfied, then inequality (39) holds for k = 0: If (2356) is not satisfied, then inequality (40) holds for k = 0: Since the term       which appears in (47) is greater than     Following the same logic, the following holds for k = 1:  if condition (35) does not holds at k = 1.
What remains to prove is that assuming that (45) holds at iteration k, it also holds for k+1: The validity of (50), will be derived using the same logic as that used in deriving (48). Consider the following inequality: The inequality (52) simplifies to the following: .

Stage 3. Convergence of the upper bound to zero.
In this Stage, it is proved that the upper bound on the Lyapunov function defined in (45) asymptotically approaches zero, thereby leading to convergence of multipliers to * . Proposition 7. Proof of Convergence. If stepsizes are updated per (7)-(8), then → * as → ∞.
Proof: In order to prove that → * , it is necessary to prove that the upper bound on Lyapunov function (right-hand side of (45)) approaches zero. This leads the Lyapunov function to converge to zero and to the convergence of multipliers.
Since * that maximizes the dual function (26), is assumed to exist, the term 2 *0   is finite. Therefore, it is sufficient to prove that the following expression approaches zero: where ℵ is the set is iteration numbers whereby inequality (39) holds, and is the set of natural numbers. To prove that (55) approaches zero, the stepsizing formula (7)-(8) is plugged in first, then the resulting function is upperbounded by using standard functions and their asymptotical representation, then, though algebraic manipulations, the condition for i  is derived to ensure that (55) approaches zero. By exploiting the fact that set ℵ is a proper subset of natural numbers  and that each term   If such N does not exist and condition (35) does not hold infinitely often, then there is a contradiction with Proposition 3.
To prove that the right-hand side of (57) approaches zero, consider from (8) which asymptotically behaves as 1 − 1 as → ∞ [23], therefore, asymptotically, the right hand-side of (57) becomes After expanding the inner product, and ignoring involving ( ) −2/ and higher order terms, (60) becomes To ensure that products involve terms less than 1 each, consider The second term of the right-hand side of (45) also approaches zero, because it involves similar products as in (55), and the proof follows exactly the same logic. The last term in the righthand side of (45) approaches zero because stepsizes approach zero. □

IV. NUMERICAL TESTING
The purpose of this section is to demonstrate performance of the Distributed and Asynchronous Surrogate Lagrangian Relaxation (DA-SLR) method. In Example 1, a small integer linear problem is considered to demonstrate that the Lyapunov function approaches zero fast. In Example 2, a generalized assignment problem with 20 machines and 1600 jobs is considered to demonstrate capability of DA-SLR to solve largescale optimization problems fast with near-optimal performance. Because of difficulties associated with other methods as reviewed in Literature Review, subsection II.B and space limitations, comparison of DA-SLR is performed against its sequential version -SLR [23] only, which, in turn, has been shown to outperform other previous coordination methods in [24]. The DA-SLR method is implemented using IBM ILOG CPLEX Optimization Studio Version: 12.7.1.0 on a PC with 3.10GHz Intel(R) Xeon(R) CPU and 32G RAM.

Example 1. A Small Integer Programming Problem.
Consider the following integer optimization problem     After constraints (65) are relaxed by using multipliers 1 and 2, the Lagrangian function becomes .
The relaxed problem is then separated into six individual subproblems, one for each variable: Derivation of dual function and optimal multipliers. Since the purpose of this example is to demonstrate convergence of multipliers to their optimal values, the knowledge of the dual function and optimal multipliers is needed. The dual function is obtained by minimizing the Lagrangian function (66)  Simulation. Because of the lack of distributed computing and communicating facilities, asynchronous coordination is simulated by simulating subproblem-solving, multiplierupdating, and communication times. Simulated solving an updating times are based on real times obtained by SLR first. According to the SLR results, subproblem solving times range from 2 millisecond (ms) to 115 ms with an average value of 5.36 ms. The multiplier-updating time is either 0 or 1 ms with an average value of 0.036 ms (the updating time is very short and the time resolutions within OPL CPLEX is 1 ms). Subproblem-solving and multiplier-updating times, thus, follow empirical distributions, which for simulation purposes are used to generate solving and updating times using discrete random number generators in MS Excel [48]. Communication time between the coordinator and subproblem solvers is randomly generated following a uniform distribution U[0.95, 1.05] as the average wireless 5G speed is 1 ms. 3 Based on the above data, absolute arrival times (time when one subproblem solver finishes solving one subproblem + communication time) of subproblem solutions are computed. Based on these absolute time stamps, a sequence of subproblem solution arrivals to the coordinator is obtained. Given solution arrival times, the sequence, and the multiplier-updating time, the set of latest subproblem solutions used to update multipliers at each coordinator iteration is determined. Then the time of multiplier arrivals to each subproblem solver is obtained. Given the time when one subproblem solver starts solving, appropriate multipliers to be used are also determined based on multiplier arrival times. In simulations, subproblems are solved and multipliers are updated based on simulated sequences, which are, in turn, based on empirical distributions as described above. To test robustness of the method DA-SLR, 10 testing cases are generated following the above procedure. To demonstrate convergence of DA-SLR when there is a "slow" subsystem, another testing case with one "slow" subproblem solver is also considered. The solving time of the "slow" subproblem solver is assumed to range from 20 ms to 450 ms. Other five subproblem solver remain the same. For comparison purposes, one more testing case with a "slow" subsystem is also generated for sequential SLR.
Results. Distances from multipliers to the optimum, which are square a square roof of Lyapunov functions, for DA-SLR (average, minimum and maximum over 10 cases) and SLR are shown in Fig. 2. The results for the case with a "slow" subsystem are shown in Fig. 3. As demonstrated in Fig. 2, average as well as minimum and maximum values of Lyapunov functions within DA-SLR while non-monotonic, approach zero fast. Moreover, distances to the optimum within DA-SLR approach zero faster, as compared to those within SLR. As demonstrated in Fig. 3, when there is a "slow" subsystem, distances to the optimum within DA-SLR also approach zero. While in this case, the Lyapunov function approaches zero slower than within the system without "slow" subsystems, and still faster than within SLR.
Initialization. The stepsize is initialized by using [23, eq. (76), p 190], whereby an estimate of the optimal dual value q * is used. This estimate is obtained by solving (70)-(72) after relaxing integrality requirements. Initial values of multipliers are obtained based on heuristic initialization rules following [47], whereby the second highest cost of assigning a job is used. DA-SLR (Average (Fig. 2)) Results. Because this example is complicated, optimal multipliers are difficult to obtain. Therefore, Lyapunov functions are not plotted. Rather, dual values and feasible costs obtained by using DA-SLR and SLR and are plotted in Fig. 4 Fig . 4. Performance of DA-SLR and comparison against SLR using parameters M = 75 and r = 0.05 for solving the GAP d201600 instance Fig. 4 demonstrates performance of DA-SLR for the GAP d201600 instance with 20 machines and 1600 jobs. The dual value is obtained every 500 iterations by solving all subproblems to optimality. 4 As shown in Fig. 4, with asynchronous coordination, a feasible cost 97,852 is obtained with a duality gap of 0.0316% after 78 seconds. This demonstrates that DA-SLR converges and finds high-quality solutions significantly fast. As shown in Fig. 4, within SLR, the best feasible cost 97,855 is obtained with a duality gap of 0.0332% after 950 seconds. As demonstrated in Fig. 5, within DA-SLR surrogate subgradient norms reduce fast, and faster than within its sequential SLR version. 4 It is expected that surrogate dual value approach dual values at convergence, but for demonstration purposes, dual values are obtained every 500 iterations.

V. CONCLUSION
In anticipation of trends toward self-optimizing factories, there is a need for efficient asynchronous price-based coordination of distributed subproblems. The novel distributed and asynchronous Surrogate Lagrangian Relaxation is developed and convergence is proved based on the novel use of Lyapunov energy function without requiring its strict monotonic decrease for convergence. Numerical results demonstrate that the novel approach converges fast. With this effective distributed and asynchronous coordination, the method has a strong potential to be used in future selfoptimizing factories to coordinate machines and in future power systems to efficiently coordinate distributed energy resources.