Online Optimization for Networked Distributed Energy Resources With Time-Coupling Constraints

This paper proposes a Lyapunov optimization-based online distributed (LOOD) algorithmic framework for active distribution networks (ADNs) with numerous photovoltaic inverters and inverter air conditionings (IACs). In the proposed scheme, ADNs can track an active power setpoint reference at the substation in response to transmission-level requests while concurrently minimizing the social utility loss and ensuring the security of voltages. Conventional distributed optimization methods are rarely feasible to track the optimal solutions in fast variable environments using a fine-grained sampling interval where the underlying optimization problem evolves with the iterations of the algorithms. In contrast, based on the framework of online convex optimization (OCO), the developed approach uses a distributed algebraic update to compute the next round decisions relying on the current feedback of measurements. Notably, the time-coupling constraints of IACs are decoupled for online implementation with Lyapunov optimization technique. An incentive scheme is tailored to coordinate the customer-owned assets in lieu of the direct control from network operators. Optimality and convergency are characterized analytically. Finally, we corroborate the proposed method on a modified version of 33-node test feeder. Benchmark tests show that the proposed method is computationally and economically efficient, and outperforming existing algorithms.


I. INTRODUCTION
A CTIVE distribution networks (ADNs) integrated with high penetrations of distributed energy resources (DERs) provide increasing flexibility for power systems and accommodate advanced ancillary services such as automatic generation control, fast ramping, and power reserves [1], [2]. However, coordinating numerous DERs to achieve some objectives while considering their distinct dynamics and constraints in a timevarying environment is extremely challenging. Moreover, since the customer-owned DERs are not directly dispatched by the utilities, an incentive-based scheme instead of the direct control from network operators is required.
There has been extended studies on optimal coordination of DERs with the ADN in the literatures. Some works such as [3] design a centralized solver for the formulated optimization problems, which is valid for the small-scale application and utility-owned assets. References [4]- [6] present distributed optimization frameworks, where multiple subproblems need to be solved iteratively until the convergence for each time slot. We term such scheme as solving the problem in a batch fashion [7]. In this case, the system profiles are presumed to be stationary and unchanged during the whole iterative procedure. However, if we use a small time-slot duration to track the optimal setpoints for DERs in the fast variation environment, the batch fashion will be communicationally costly and rarely feasible to yield the convergence before the system profiles change [7]. Additionally, if the batch mode is applied to design an incentive scheme, various rounds of bargains between ADN operators and customers are required before revealing the optimal price [6], which may be user-unfriendly.
Alternatively, the online convex optimization (OCO) has emerged as a promising paradigm. Unlike conventional batch fashion, a limited number of iterations are performed at each time slot in OCO. The generated coordination signals or setpoints are applied directly without waiting for convergency. Based on this computationally affordable online method, DERs can continuously pursue the trajectory of the time-varying optimizers using a fine-grained sampling time in the fast variable environment. For instance, Enyioha et al. [8] propose an online decentralized algorithm for the transmission-level economic dispatch. However, it only considers the active power balancing of large generation units while DERs are not involved. References [9] and [10] coordinate the networked microgrids and DERs to minimize the system cost and loss, respectively. A unified online feedback-based controller for DERs is presented in [7] to pursue a given objective. To provide ancillary services, a primal-dual-based algorithm is proposed in [11] to realize a virtual power plant. Zhou et al. [12] present an incentive-enabled online optimization framework.
For online ADN optimization, the aforementioned algorithms cannot well integrate the energy storage devices with time-coupling dynamics. For instance, the inverter air conditioning (IAC) is thermal storage devices whose power setpoints can be adjusted continuously to provide control flexibility to the ADN [13]. The main barrier for integrating IACs is that they feature constraints of the states of temperatures, coupling their power setpoints within the entire operating period. OCO [11] and [12] neglect DERs with the time-coupling constraints to advocate a fast online controller. Li et al. [14] propose an online algorithm for the optimization problems considering switching costs but only focuses on the temporal coupling between two successive time slots without considering the whole time span. Lookahead and model predictive control (MPC) schemes are leveraged to tackle the time-coupling issues in some recent works [15], [16]. The predictive-based approach is also tailored in an OCO framework in [17]. However, MPC-based framework can only employ a limited number of time windows ahead to avoid prohibitively high computational complexity with larger predictive window sizes. Some researchers leverage stochastic gradient-based methods to transfer these time-coupling constraints [3], [18]. However, [3] is designed in a centralized manner while [18] only considers coordinating batteries at the transmission level. Furthermore, they are not formulated in an OCO framework based on incentives.
This paper investigates a Lyapunov optimization-based online distributed (LOOD) algorithmic scheme to achieve an incentive-based DER coordination. In the proposed algorithm, the networked customer-owned DERs are coordinated to provide the active power tracking service at the substation, while simultaneously minimizing the utility loss and maintain node voltages within an acceptable range. Compared with most existing ADN online optimization works, the main innovations of our developed method are summarized as follows: 1) The proposed method can integrate numerous DERs with time-coupling constraints and tailor their distinct attributes to the OCO framework. Unlike most of the existing techniques for tackling the temporal correlations such as the greedy decoupling method and MPC-based controller, the developed LOOD algorithm can decouple the time-coupling constraints from a long-term horizon to each time slot. This makes the proposed method provide the most cost-efficient result, while also advocating for the real-time deployment due to the presence of the closed-form solution.
2) In the proposed method, DERs can individually make decisions in response to the incentive signals that are generated by the system-wide information, which well protects customers' privacies compared with the direct control schemes. Also, a first-order filter is applied in the incentive generator to alleviate potential fluctuations of the incentive signals and corresponding responses to smoothen the control and convergence process.
3) A rigorous mathematical analysis is conducted to demonstrate the impact on the optimality and convergence of our algorithm caused by the step size, weight coefficients, and initialization of virtual queues. In particular, the relaxation of the time-coupling constraints is proved to cause no violation in our setting. 4) We conduct large-scale numerical tests to compare the proposed algorithm against the state-of-the-art online algorithms and show that the proposed LOOD algorithm outperforms most of them, while reducing significant computational complexity and avoiding the need for prediction of future information that is essential for controllers like MPC.
The remainder of this paper is organized as follows. Section II formulates a mathematical model to coordinate networked DERs. In Section III, LOOD is proposed. The performance analysis is analytically characterized in Section IV. Case studies and benchmarks are described in Section V. Concluding remarks are summarized in Section VI.
Notations and Definitions: We also have some definitions below: 1) We define the time-average value x of a variable x t as its mean over the whole time span, such that 3) A function f (·): R n → R m is σ -strongly convex if for any x 1 and x 2 in the feasible set, there exists some constant σ > 0 such that

II. SYSTEM MODELS
Consider a distribution network with high penetration of DERs. All the control actions are performed in a discrete-time manner with a time interval t. The time slots are indexed by t in T := {1, 2, . . . , T}. Let N := {1, 2, . . . , N} collect all the nodes in the network excluding the substation which is denoted by node 0. The sets of nodes connected with PV inverters and IACs are denoted by K ⊆ N and A ⊆ N , respectively.
The goals of the ADN operator include assuring voltage security at each node, minimizing the social utility loss, and tracking the reference power setpoints at the substation specified by the transmission-level operator when the ancillary service is requested [19]. The schematic overview of the proposed method is outlined in Fig. 1. To coordinate DERs, the ADN operator calculates incentive signals for the next time slot t + 1 based on the current information, i.e., the received measurements and the reference power setpoint at time slot t. In this paper, each node i ∈ N is regarded as a customer, who is presumed to make optimal decisions rationally in response to incentive signals. In addition, DERs are endowed with a home energy management system (HEMS) [20] to assist customers with rational decisions making. After receiving the incentive signals, each HEMS will update the setpoints of governed DERs locally and individually by some simple algebraic calculation.

A. Node Models
We consider two categories of devices that can represent most of the controllable DERs: devices without time-coupling constraints, e.g., PV inverters, small-scale diesel generators, etc., and devices with time-coupling constraints, e.g., IACs, battery storage systems, water heaters, etc. In this paper, we particularly focus on PV inverters and IACs as examples of the two categories, respectively, because as we will see next, the modeling of the two devices are complicated enough to be easily extended to the others of their corresponding categories.
1) PV Inverter Model: PV is connected to the network through an inverter. Let P av,t i be the maximal available active power output of PV i at time slot t and S PV i be the rated apparent capacity of inverter i. The active power output p pv,t i and reactive power output q pv,t i belong to a feasible set Z pv,t i given by: If ∀i / ∈ K, we have P av,t i = 0 and S PV i = 0 for all t. A quadratic function with coefficient c p i is designed to penalize the active power curtailment of PV. As injecting reactive power is not economic for customers, we also penalize the reactive power generation/absorbing as a quadratic function with a coefficient c q i as follows: Devices like small-scale diesel generators, fuel cells, and other controllable devices without time-coupling constraints can be modeled based on their physical limits and mathematically handled similarly as Eq. (3) and Eq. (4); see, e.g., [11]. For simplicity, we do not involve their specific models in this paper.
2) IAC Model: We consider various IACs may connect to one node, and each IAC is installed in an independent room. So, we index the IACs connected to node i ∈ A as a ∈ A i = {1, 2, . . . , A i }, where the cardinality A i represents the number of connected IACs at i. As the operating power is a linear function of its frequency [13], the operating power s t i,a of IAC a ∈ A i at time slot t is regarded as the optimization decision variable. The operating power s t i,a is confined in a box set s min i,a ≤ s t i,a ≤ s max i,a . We further define The reactive power consumed by the IAC can be specified by the active power according to its power factor, denoted by cosθ i,a . Many recent works such as [21] regard the power factor as a given time-invariant constant for each IAC since the operating condition is relatively stable. Therefore, we denote a coefficient ρ i,a = tanθ i,a and the reactive power of IAC a at node i and time slot t equals ρ i,a s t i,a . Then, based on the lower and upper bound vectors s min i and s max i , the feasible set for IACs is formulated as the following compact form: Note that the reactive power is not treated as a decision variable since it directly depends on the active power through the constant ρ i,a . The IAC features the indoor temperature T t i,a following a given dynamic, which can be depicted by the simplified equivalent thermal parameters (ETP) model [13]. Let C i,a denote the equivalent thermal capacity (J/ • C), W i,a be the equivalent thermal resistance ( • C/W), and Q t i,a be the cooling rate (W) of the IAC. We consider IAC working in the cool mode. Heat mode can be modeled similarly and is omitted here. Assume that only one IAC is installed in an independent room, the indoor temperature evolves as follows: where η i,a = e − t/(W i,a C i,a ) , and T t amb is the ambient temperature at time slot t.
Remark 1: Many recent works such as [13] and [22] see the thermal parameters C i,a and W i,a as time-invariant constants, which can be obtained beforehand by the curve fitting. In practice, stochastic fluctuations of the thermal parameters will impact the accuracy of the model. To cope with the bias of ETP model with fixed thermal parameters, real-time measurements of indoor temperature will be leveraged as feedbacks to compute the next round setpoints of operating power, which will be shown in Section III-C. Numerical test in Section V will verify the performance of the proposed algorithm considering the unpredicted stochastic fluctuations of thermal parameters.
Then, the cooling rate Q t i,a can be modeled as the following linear function of the operating power s t i,a [13]: where k i,a and f i,a are constant coefficients for a given IAC a.
For further discussion, the ETP model can be equivalently formulated as: where ξ +,t i,a and ξ −,t i,a denote the temperature increase part and decrease part from time slot t to t+1, respectively. The increase part is caused by the ambient heat radiation. The decrease part is a function of the operating power s t i,a to be optimized. To satisfy the temperature requirements, T t i,a is restricted by: where T L i,a and T H i,a are the lower and upper temperature limits. Note that constraints (9) are temporarily coupled. Therefore, making decisions at time slot t will impact the future temperatures and decisions. To cope with the timecoupling constraints, we will tailor the Lyapunov optimization approach to an online adaptation in Section III.
To better control temperature, we define a utility loss function of all IAC of node i at time t as a quadratic function penalizing the deviation of the actual temperature from the setpoint: where c AC i,a is a positive cost coefficient and T set i,a is the temperature setpoint defined as the median value of T L i,a and T H i,a . Remark 2: We have so far presented a complicated and practical IAC model. Other DERs with time-coupling constraints can be modeled in a similar or simpler way. For example, see battery storage model in [17], [18] and water heater model in [23]. Moreover, plug-and-play devices like electric vehicle charging and mobile storage systems can also be modeled by slightly adapting the proposed model as follows: when some mobile storage system i is plugged in at time t, its feasible set Z t i can be modeled similarly as a regular battery system; when it is plugged out at time t, its feasible set can then be modeled as a singleton Z t i = {p i = q i = 0} until it is back to action. Such adaptations all fall within our general assumptions for analytical and numerical characterization and thus do not affect our main results. Therefore, without losing generality, we focus on IACs only in this paper for presentational simplicity.
3) Aggregate Model for Nodes: T collect all the decision variables at node i. We can present the feasible set of z t i as: Note that Z t i only comprises some simple constraints excluding the time-coupling temperature constraints.
Then, we define the total utility loss of the customer i as the accumulation of that of each DER:

B. Network Model
For each node in the ADN, the aggregate active power consumption p t i and reactive power injection q t i can be calculated by: with p In,t i and q In,t i being the active and reactive power consumption of the inelastic loads of node i, respectively. For notational simplicity, we collect all the node active and reactive power by vector p t and q t , respectively. Let v t i denote the voltage magnitude of node i at time t, and collect the voltage magnitudes of all nodes at time t by a vector The active power at the substation at time slot t is denoted by p t 0 . To develop a computationally affordable controller to advocate the online implementation, power flow linearization is leveraged to model the AC power flow equations, given by: , and o ∈ R 1 can be obtained by numerous linearization methods with high accuracy, such as approaches in [9]- [11]. To further reduce the linearization inaccuracy, real-time measurements of the voltage magnitudes and active power at the substation, denoted by v m,t and p m,t 0 , respectively, are leveraged as feedbacks to the proposed LOOD algorithm.

C. Problem Formulation
The optimization problem can be formulated as a timeaverage utility loss minimizing problem P 1 as follows: where the objective function is the time-average value of t with t = E( N i=1Û t i ) being the expectation of the summarized utility losses of all customers at time slot t. We define random variables in (15). In practice, even though t may be estimated or obtained in real time, its realization is unknown in the P 1 since P 1 is formulated and solved from a long-term view. Thus, the expectation E is taken over the vector t , ∀t. Constraint (15b) confines the voltage to an acceptable range. Constraint (15c) tracks the power setpoint reference at the substation p t 0,set given by the transmission-level operator in real time with a permitted tracking error ε. E t ∈ {0, 1} is a binary indicator to switch on the tracking service when it is required. The ETP model and power flow equations are also involved in P 1 , i.e., constraint (15f). However, due to the stochastic fluctuations of thermal parameters and inaccuracy of linearized power flow equations, the solutions to P 1 may cause violations of constraints in practice. In the following, real-time measurements feedback at each time slot will be leveraged in the online solver to reduce the modeling discrepancy and ensure constraints are strictly satisfied.

A. Virtual Queue-Based Reformulation
The online implementation requires solving P 1 at each time slot. To decouple the time-coupling constraints from a long-term time horizon, the technique of virtual queues (see, e.g., [3], [24]) is leveraged to reformulate P 1 .
1) Virtual Queue Definition: By summarizing (8) over time from 1 to T and taking the expectation of each term, we get: Divide both sides of (16) by T and take T → ∞ to get:  (17) is a relaxed version of (15e). If we replace Eq. (15e) in P 1 by (17) and denote the pertinent optimizer as * r , we must have * r ≤ * . To address the relaxed temperature constraints in (17), we can define a virtual queue H t i,a for all a ∈ A i and i ∈ N as: The arrival rate of the queue is the injected temperature while the departure rate is the cooling temperature at time slot t. Following the rate stability theorem [24], we place (17) with: In most existing researches such as [3], the initial value of the virtual queue H t i,a is set to zero because it handles the time-average constraints without any relaxation. Differently, the original constraints must be satisfied at each time slot in P 1 . To avoid the violations of constraints due to the relaxation, a hot start approach of the virtual queues will be illustrated in Section III-C.
2) Lyapunov Optimization: The constraint (19) still hinders the online deployment as it is coupled over a long-term horizon. Consequently, the Lyapunov optimization presented in [24] is leveraged to transfer them to a penalty term attached to the objective function at each time slot based on the observation of the current states.
Let H t i ∈ R A i ×1 collect all the virtual queues defined for a ∈ A i of node i. Then, we define a Lyapunov function 1 2 H t i 2 to measure the size of the queues. Then, the conditional one-slot Lyapunov drift can be defined as follows to measure the expected queue size growth under observation of current state H t i : To satisfy constraints (19), we minimize the Lyapunov drift to push the queues toward a less congested sate. Following the minimizing drift-plus-penalty method in [24], we minimize the weighted sum of the drift and cost at each time slot to obtain * r as follows: where * r denotes time-average value of optimized t r and V is a positive coefficient to achieve a tradeoff between the stability of queues and utility loss.
Lemma 1: The drift-plus-penalty function is upper bounded at each time slot t by: where with ξ +,min Based on Lemma 1, instead of optimizing the drift-pluspenalty function, we will minimize its upper bound alternatively. Following the theorem on opportunistically minimizing an expectation in [24] (see 1.8 in [24]), the policy for the optimization is to observe the current state H t i and then select the minimizer of can be interpreted as an additional utility loss function of the aggregated IACs that carries more temporal knowledge than (10). Hence, we reformulate the utility loss function of each node as: Assumption 1: The utility loss function U t i (z t i ), ∀i, ∀t is σ -strongly convex and L-Lipschitz continuous.
Hereafter, the long-term optimization problem P 1 comprising the time-coupling constraints can be reformulated as a simple real-time problem to be executed at each time slot without reliance on high-complex solvers. The new problem P t 2 , ∀t is given by: where the update of virtual queues H t i,a follows Eq. (18). Let u H,t and u L,t collect dual variables associated with constraints (25b) and (25c), respectively, while λ H,t and λ L,t be dual variables for constraint (25d) and (25e), respectively. Note that all the dual variables are non-negative.
For notational simplicity, we denote the objective function being all the decision variables. Due to the strong convexity of U t i , the next result follows naturally.
Further, because g t (z t ) is a set of linear constraints, the Jacobian of g t (z t ) is bounded by a positive constant σ g over the feasible set of z t , such that ||∇g t (z t )|| F ≤ σ g . Notice that σ g can be characterized according the parameter matrices R, X, M, and N.
Theorem 1: The difference between time-average value of t, * l denoted as * l , and the optimizer of P 1 , i.e., * is bounded, such that * l ≤ * + B V . Proof: See Appendix B. Remark 4: P t 2 provides a time decouple reformulation within O( 1 V ) of the optimal results of original P 1 together with O(V) tradeoff in the time-coupling constraints. A large V can decrease the optimality gap but also bring about constraint's violation. In Section III-C, the upper limit for V that ensures the constraints is demonstrated.

B. Online Distributed Dual Ascent Algorithm
To design an online distributed solver for P t 2 , we consider its regularized Lagrangian function defined as follows: where T collects all the dual variables, and ϕ 2 d t 2 is a regularization term to ensure the strong-concavity of the dual function with a predefined parameter ϕ > 0. Such regulation is widely used in OCO, such as [10]- [12]. The bounded gap between the saddle point of the regularized Lagrangian function and the original one can be found in [25].
We next propose a dual ascent algorithm to find the saddle point of (26). To that end, consider the following dual problem: where D t (d t ) is the dual function calculated from: Assumption 2: P t 2 is strictly feasible for ∀t, i.e., it satisfies the Slater's condition.
The strong duality of P t 2 holds based on Assumption 2 [26]. Thus, if d t, * is the solver to (27), z t, * = argmin D t (d t, * ) is the optimal solution to P t 2 . We continue to investigate how to solve the problem in a distributed manner based on incentives. Given the optimal dual variables, the primal problem can be divided and equivalently solved through N subproblems. Particularly, each subproblem only requires the local information and a couple of coordination signals composed of the dual variables. The local subproblem denoted as P t L,i , ∀i, ∀t is given by: In P t L,i , α t, * i and β t, * i are the coordination signals with the vector forms being α t, * and β t, * , respectively. By [12], we design the signals as follows: Note that according to [12,Th. 2], the design of the coordination signals ensures an exact distributed reformulation, i.e., the optimal solutions of N subproblems P t L,i coincide with the centralized optimization results of P t 2 . In practice, the coordination signals α t, * i and β t, * i work as monetary incentives related to the active and reactive power injections for node i. The incentive signals comprise two components, the first being the price for voltage regulation, and the second used to induce customers to regulate the DERs for a power tracking service. Both two terms are characterized by the structure of distribution network through matrices R, X, M, and N, together with dual variables associated with voltage and power tracking constraints. Given Eq. (30), Eq. (29) presents a well-define local welfare maximization problem which minimizes the utility loss while concurrently maximizing the rewards.
We [P4] ADN operator measures the node voltage and active power at the substation, i.e., v m,t and p m,t 0 , respectively. [P5] ADN operator updates dual variables according to (31a)-(31d).
According to our setting, P t 2 is required to be online implemented. Concurrently, to reduce the fluctuations of incentives, a first-order filter is also applied in this algorithm. The update rules of dual variables and inventive signals are given by: To recap, the proposed LOOD algorithm is illustrated as In our algorithm, [P1] and [P4] to [P6] are processed by the ADN operator based on measurements and the received setpoint reference from transmission-level operators.
[P2] and [P3] are solved by customers locally depending on the private information and incentive signals. It is hard to get the optimal incentive signals in time varying conditions since the optimal dual variables can be revealed only after various iterations between the ADN operator and customers. However, the environments and customers' responses have changed during the multiple rounds' bargains. Alternatively, based on the OCO framework, we update the dual variables at time slot t+1 relying on the last round dual variables and current measurements. Convergence and optimality gaps of this online algorithm will be characterized analytically in Section IV.
Remark 5: In this algorithm, the actual incentive signal is tuned by a first-order filter, as shown in (31g) and (31h). The motivation of this filter is to smooth the incentive signal. In practice, fast fluctuations of the monetary incentive are not user-friendly. More importantly, the fluctuations of incentive signals will be reflected in the node power and voltage finally.
As the violations are essential to power systems' stability [27], we smoothen the incentive signals to reduce the violation of node power and voltage accordingly. We also characterize the discrepancy on the solver of the optimization problems after filtering the incentive signals.
Note that we utilize measurements feedback which follows the accurate AC power flow model in the update of dual variables. Thus, we have the following assumption to bound the discrepancy between the linearization power flow model and the actual measurements.
Assumption3: There exists a positive constant e such that: where g m,t (z t ) denotes the states of functional constraints calculated by AC power flow model. As the linearization approximation is accurate in normal conditions, the bias characterized by e is usually small.

C. Solving Local Problems
In the LOOD algorithm, customers need to solve P t L,i locally. Although the local problem may aggregate several devices and their pertinent variables, they can be decoupled, i.e., P t L,i can be rewritten as the summation of several subproblems that comprise disjoint components of the variable z t . Therefore, devices can compute their own decisions independent of each other.
For the PV inverters, the setpoints of active and reactive power generation at time slot t depend on some predefined parameters, received incentives, and the current available power output that can be observed. A closed-form solution of PV inverter to P t L,i (see Eq. (29) with Z PV,t i defined by Eq. (3)) can be given by: In many recent works, such as [7], [10], [11], and [12], P av,t i is revealed at the beginning of time slot t, and assumed to hold the value until t + 1 given that t is short enough. In fact, the estimated P av,t i is generally accurate due to the fine-grained duration of control.
Next, we give a closed-form solution for each IAC to P t L,i (see Eq. (29) with Z AC,t i defined by Eq. (5)) at time slot t as follows: where i,a = k i,a W i,a (1 − η i,a ). To reduce the discrepancy from the stochastic fluctuations of the thermal parameters, the measurement of the indoor temperature at the previous round, i.e., T m,t−1 i,a is used to replace the ETP model when we compute the current setpoint.
Remark 6: Neither the incentive signals update (Eq. (31)) or local decision-making (Eq. (33) and Eq. (34)) relies on the predicted information, which makes the proposed method indifferent to forecast errors that is essential for most real-time controller like MPC, and considerably reduces the computation burden for both ADN operator and local HEMS. The boundaries of h i,a and V are given by: , and β min i = min(β t i ) for all t ∈ T . We assume these boundaries of incentives can be estimated based on the history data.
The proof of Lemma 3 is that if T t−1 i,a ≥ T H i,a , a solver s t i,a that makes sure ξ +,t i,a − ξ −,t i,a ≤ 0 will be obtained based on (34). Similarly, if T t−1 i,a ≤ T L i,a , we will have ξ +,t i,a − ξ −,t i,a ≥ 0. To recap, the generated solver for the IAC will cool the room if the indoor temperature is going to exceed the upper limit, while stop cooling the room if the indoor temperature outrides the lower limit.
Remark 7: Based on Lemma 3, we only need to estimate the upper and lower limits of incentives. Thus, the accuracy requirements are relatively low. In practice, the ADN operator can also confine the incentives in a predefined range beforehand like many current demand response programs, which better advocates the estimation.

D. Summary of LOOD
To summarize the developed algorithmic framework, a flowchart of LOOD is outlined in Fig. 2.
The dashed box in Fig. 2 represents the process of measurement feedback. In practice, once the setpoints of DERs are implemented, indoor temperatures, node voltages, and power at the substation will change following the dynamics of the underlying physical system. Although we use simplified models to formulate the problem, the incentive signals and setpoints of devices are updated based on the filed measurements, which makes the proposed method robust to modeling errors. As illustrated in Fig. 2, we can use the nonlinear AC power flow equations and ETP model with time-varying thermal parameters to mimic measurements in the case study of Section V.

Remark 8 (Communication Failure):
The proposed method is robust to communication delays or failures to some extends. As discussed in [7] and [28], if the coordination signals are lost temporarily, customers can compute the setpoints of DERs based on the previous signals. The system can still track the optimizers in the long term. However, if the communication system is permanently down, the proposed algorithm that relies on central coordination may suffer from sub-optimal solutions. Such topics are beyond the scope of the work and our ongoing efforts are developing local controllers for better control under such scenarios.

IV. PERFORMANCE ANALYSIS
In this section we analytically characterize the performance of the LOOD method. We first introduce the following useful lemmas.
Lemma 4: Under Assumption 1, the inverse function of ∇U t i denoted by (∇U t i ) −1 exists and is 1 σ -Lipschitz continuous. Proof: See Appendix C. Lemma 5: The dual function, i.e., D t (·), has an L D -Lipschitz continuous gradient, where L D = σ 2 g /σ u + ϕ, based on Lemma 2 and the bounded Jacobian of g t (z t ). Furthermore, The proof can be found in Lemma 2 of [17]. Following Assumption 2, the dual problem is feasible due to the strong duality. Therefore, the difference between two consecutive dual optimums will always be bounded by a positive constant d , such that d t, * − d t+1, * ≤ d . Then, we can characterize the convergence of the dual variables.
Theorem 2 (Convergence of Dual Variables): If the step size in LOOD is chosen according to 0 < δ ≤ 1 σ D +L D , the discrepancy between optimal dual variables of P t 2 and the dual variables generated by LOOD are bounded by: where = d +δe 1−κ and κ = 1 − 2δσ D L D σ D +L D . Proof: See Appendix D. Theorem 2 characterizes a bounded gap between the optimal dual variables of P t 2 and the dual variables solved by LOOD. It also indicates ways to reduce the gap to improve the performance: 1) The deviation of optimal dual variables between consecutive time slots, i.e., d , which captures the underlying dynamics of the ADN due to the time-varying system profiles, such as the inflexible load, reference setpoint at the substation, and variable parameters in utility loss functions. Therefore, choosing small t, i.e., increasing control frequency, improves the performance; 2) The error e caused by power flow linearization motivates us to use accurate linearization method, e.g., real-time linearization method based on current operational points, i.e., the measurement feedback.
Moreover, when P t 2 degenerates to a time-invariant problem such that d = 0 and e = 0, the online algorithm can track the optimal solutions accurately as = 0 in this case.
It is worth noticing that the upper bounds presented in this paper represent the worst-case since they are obtained conservatively.

Corollary 1 (Convergence of the Primal Variables):
The discrepancy between optimal solutions of P t 2 and the solvers generated by LOOD are bounded by: Corollary 1 characterizes the convergence of the primal variables based on Theorem 2. The upper bound Y i has two components: The first term comes from the fixed gap of convergence of dual variables, i.e., . The second term is due to the presence of the first-order filter in the incentive function. A large filter coefficient φ will flatten the fluctuations of incentive signals but augment the fixed gap. These two terms are both characterized by L c (1+A i ) σ that captures the structure of incentive and utility functions. For example, when the utility loss functions of customers are more convex, i.e., the σ is larger, the upper bound Y i will be tighter.

Theorem 3 (Main Results):
If LOOD is used to solve the original problem P 1 , the difference between the LOOD-based time-average optimizer, denoted as LOOD , and the original optimum * is upper bounded as : The upper bound in Theorem 3 comprises two terms. The first one is caused by the time-coupling relaxation, i.e., the reformulation from P 1 to P t 2 . As characterized in Lemma 1, B depends on the temperature dynamics of IACs. Even though this gap is unavoidable, we can choose a proper coefficient V to reduce the deviation. The second term comes from using the distributed online dual ascent algorithm when we solve the time-varying P t 2 . As discussed in the convergence of dual variables and primal variables, i.e., Theorem 2 and Corollary 1, the magnitude of NLY relies on the underlying dynamics of the ADN, settings of utility loss functions, and the number of nodes. Since the second term is linear on N, the gap on average of numbers of nodes will not increase with the augments of nodes. We reiterate that the optimality gap B V +NLY represents a worst-case, while the magnitude of the optimality gap in real-world must be smaller. Section V will corroborate the upper bound numerically to verify the guarantees of practical performance.
V. CASE STUDY 1) Simulation Setup: We run our method from 8:00 to 19:00, while it is divided into 660 time slots, i.e., t = 1min. The IEEE 33-node test feeder [29] is unchanged topologically but modified through adding PV inverters and IACs. It is assumed that PV systems with a 500kVA rating inverter are located at node 2, 3, 10, 12, 16, and 18. The PV systems with a 750kVA rating inverter are connected to node 5, 6,7,8,9,20,24,26,29,30, and 32. The available active power generation profiles of these PV are obtained from Pecanstreet [30], [31]. We set c p i = 3 and c q i = 2 in the utility loss functions for node i ∈ K. The inelastic load profile comes from Open Energy Information [32]. All the data are pretreated to have a guaranty of 1 min sampling rate. As for the IACs, we assume that nodes 2,9,10,12,14,15, and 30 are connected by 300 IACs, while nodes 3, 6, 7, 17, 21, 25, 28, 31 and 32 are installed with 500 IACs. The maximal operating power of IAC is selected in the range of [500, 800]W with the power factor cosθ i,a = 0.95. The minimal power is set as 10% of the maximal power. The equivalent thermal capacity of the environment is selected in the range of [2000, 3000]J/ • C, The equivalent heat rate is selected in a range of [0.05, 0.08] • C/W. The ambient temperature is simulated by a function [33], given by: where T amb = 4 • C and T min amb = 35 • C. The temperature setpoint is arbitrarily selected from {23, 24, 25} • C with a bandwidth T = 2 • C. Without loss of generality, we set k i,a = 1.2, f i,a = 0 and c AC i,a = 10 −5 . As for our algorithm, we set the step size to 0.1. Some predefine parameters are set as V = 0.9V max , ϕ = 10 −4 , and h i,a = (h max i,a + h min i,a )/2. To mimic measurements, nonlinear AC power flow equations are solved by MATPOWER 7.0 [34], from which the node voltages and power at the substation are measured. In addition, thermal parameters C i,a and W i,a will fluctuate stochastically at each time slot in a range of [0.98, 1.02] relative to the predefined values. Indoor temperatures generated by the ETP model with time-varying parameters are used as measurements. Thus, voltages, substation power, and indoor temperatures demonstrated below are all measured values from non-linear systems.

A. Benchmarks
We use six different strategies to compare with the proposed method.
1) Strategy 1 (S1) operates the ADN without any control. PV systems maintain the maximal active power output and IACs operate based on the gap between the current indoor temperature and the temperature setpoint to maintain a comfortable indoor temperature, given by: where k d i,a is the droop coefficient and s b i,a is the base operating power of a given IAC.
2) Strategy 2 (S2) operates the ADN based on a modified droop control scheme. In S2, PV inverters use a linear Q-V droop control scheme to decide their VAR outputs with a slope coefficient. Note that the value of slope may significantly impact the voltage regulation performance. To strictly testify our proposed method, the slope coefficient is manually adjusted to well perform in this case study. When there is no tracking requirement at the substation, i.e., E t = 0, PV inverters keep the maximal power outputs in their feasible regions and IACs operate according to (40). In the presence of power tracking requirements, i.e., E t = 1, the active power at each node connected with PV inverters or IACs is regulated with a predefined droop coefficient γ : In practice, a proper γ is crucial to the performance of power tracking. However, it is rarely practical to get an optimal γ . Multiple values of γ will be deployed in the following case studies for a clear comparison.
3) Strategy 3 (S3) is based on the greedy optimization, as shown in (42). The main discrepancy between S3 and our proposed method is that we consider the temperature constraints for IACs in a long-term form. However, S3 directly decouples P 1 to T subproblems. The greedy algorithm is shortsighted as it optimizes the cost at each time slot without considering the future. To compare with our proposed method, the greedy optimization model is run in an online distributed form.
4) Strategy 4 (S4) uses the online distributed MPC-based algorithm developed in [17] to optimize the ADN. Similar to LOOD, S4 is a look-ahead method while the performance of S4 relies on the scale of predicted window size. Different settings of the window size will be tested. 5) Strategy 5 (S5) runs the online distributed optimization without considering IACs. Alternatively, IACs operate according their own policy, i.e., (40). This benchmark is used to verify the performance of integrating IACs into the ADN coordination. 6) Except for the above five strategies, we also consider solving the optimization problems in a centralized manner to provide benchmarks for the validation of the performance analysis in Section IV. Even though the solution to P 1 represents the theoretic optimal result, it is hard to obtain it directly due to the curse of dimensionality. Alternatively, we can use the MPC-based method with a full window size (T) to derive the solution that is very close to the optimum * of P 1 since the window size is large enough to sight the entire time horizon. Also, P t 2 is solved for all t to get the optimizer * l .

Remark 9 (Future Prediction):
To strictly testify the proposed method, we assume that the future prediction including the reference setpoint at the substation and inelastic loads used in MPC approach (S4) is accurate. On the other hand, it is worth emphasizing that our proposed algorithm only relies on information that is available at present and does not need any future prediction.

1) Voltages and Power at the Substation:
First of all, we test the node voltage under the control of S1, S2, and the proposed LOOD method. We illustrate the voltage magnitude at node 10, 12, and 15 in Fig. 3. Following S1, voltage magnitudes between 10:00 -16:00 exceed the upper limit due to lacking any control. S2 is effective for maintaining voltage security to some extent, but one oscillation of voltage occurs at 12:00. In practice, the selection of γ impacts the performance of S2 obviously. We set γ = 0.03 in this test since it performs relatively better than any other settings. Too large coefficients could bring about oscillations and even overshoots, while too small ones cannot effectively and quickly regulate both the voltage and power. To get an optimal coefficient usually calls for the global and detailed information of these customer-owned devices. In contrast, the proposed LOOD method outperforms both S1 and S2 as the voltage magnitudes maintain in an acceptable and relatively smooth region. Notably, the flat voltage profiles near to the upper limit are obtained before the tracking requirements come (before 12:00) since it is economically efficient. In the presence of tracking requirements (after 12:00), the curtailment of PV occurs to support the tracking performance. It can be seen that the voltages decrease accordingly. Fig. 4 shows the active power at the substation. To test the power tracking ability of these strategies, a reference power setpoint from 12:00 to 19:00 is applied, including a sudden increasing (12:00 -13:00), fast ramping up or down (13:00 -18:00) and keeping flat output (12:00 -13:00, 18:00 -19:00). From 12:00 to 19:00, LOOD can guarantee an effective power  tracking except for a disturbance that is caused by the sudden change of available PV outputs. In practice, it is hard to reveal this disturbance beforehand [35], and the sudden change will not cause a huge impact to the whole system. Conversely, the tracking ability of S2 depends on a proper γ . As illustrated in Fig. 4, when γ = 0.03, the tracking performs well. However, undershoot (γ = 0.003), oscillation (γ = 0.06), and overshoot (γ = 0.09) occur if γ is not well set. Because S3 -S6 are variants of LOOD, they perform well on the voltage security and power tracking.
2) Social Utility Loss: As for minimizing the utility loss, we compare the performances of LOOD and several benchmarks, as shown in Fig. 5. Intuitively, S2 brings about the largest utility loss due to the absence of consideration of optimality. S3 is a variant of our method, thus it also well assures the voltage constraints and power tracking requirement. However, the utility loss caused by S3 is obviously larger than the time-average utility loss caused by LOOD. Although S3 takes into account the flexibility of IAC, the greedy optimization is shortsighted and potential to employ the flexibility of IACs excessively. When the tracking signal keeps for a long time, IACs cannot respond to it sustainably. Compared with S3, the online MPC-based method (S4) has a better performance on minimizing the utility loss. Note that the performance of S4 relies on the selection of predicted window size. For example, when the window size is set as 5min, the utility loss of S4 is superior to that of S3 but still larger than LOOD. If the window size augments to  60min, the curve of utility loss caused by S4 almost coincides with that of LOOD, see the green and red curves in Fig. 5. In practice, a larger window size may provide a better control performance by considering the future indoor temperature of a longer time horizon. However, the increasing computational complexity and forecast requirements make the local problem extremely complicated for the HEMS. Unlike the MPC-based method, the proposed algorithm only needs algebraic operations based on current measurements without forecast, which can ensure the practical employment on the HEMS with limited computational ability.
Then, we demonstrate the magnitudes of the optimality gaps between the optimizer of LOOD and the original problem P 1 along with its time-coupling relaxed variant P t 2 . We first mark the optimal social utility loss of P 1 and P t 2 in Fig. 6, i.e., the point O 1 and O 2 . The social utility loss caused by LOOD is marked as O 3 . It is observed that * < * l < LOOD , which corresponds to the optimality gaps caused by the time decoupled reformulation (P 1 to P t 2 , ∀t) and online distributed dual ascent recursion (P t 2 to LOOD), respectively. Notably, the actual optimality gap, i.e., length of |O 1 O 3 |, accounts for 10.9% of the magnitude of the ideal optimum * , which means the proposed method achieves about 90% near-optimality of the original problem P 1 . Then, according to the upper bound of LOOD characterized in Theorem 3, we can estimate the magnitude of B V + NYL and locate the point O 4 . The analyzed upper bound represents the worstcase of the social utility loss caused by the proposed method. Since the result in Theorem 3 is conservative, the distance between the actual LOOD and its upper bound, i.e., the length of |O 3 O 4 |, is large enough, which guarantees the practical performance on minimizing the social utility loss of the proposed algorithm. S6 is another variant of LOOD, where IACs are not coordinated but operate according their own local controllers. As illustrated in Fig. 7, LOOD can reduce the curtailment and reactive power absorbing of PV inverters apparently than S6. The reason is that the voltage regulations and power tracking only rely on PV inverters in S6, while the flexibilities of numerous IACs are unlocked in LOOD. If we increase the comfortable temperature bandwidth ( T) and IACs can get more flexibility accordingly, the performance of LOOD can be enhanced. In practice, residents cannot accept too large temperature bandwidths, so we need to control the tradeoff in the practical deployment. 3

) Dynamics of Individual DERs and Incentive Signals:
We zoom in to examine the dynamics of individual DERs. The power curtailment and reactive power output with respect to the pertinent incentive signals are demonstrated in Fig. 8 and Fig. 9, respectively. In response to the negative incentive signals, positive curtailment and negative reactive power output can be observed. Even though the locations of nodes impact the incentives according to their update rules, the curves of incentive signals α t for node 5 almost coincides with that of node 32 during 14:00 − 19:00. Actually, incentives are composed by regulation requirements of both voltage and power. When the voltage regulations are dominant in the formulation of incentives (before 14:00), the divergency of incentives between nodes is obvious. Differently, if the incentives mainly come from the substation power regulation (14:00 − 19:00), the incentives for all nodes tend to be consistent since the matrices M and N both have similar entries, respectively. Fig. 10 shows the dynamics of indoor temperature at node 2 (300 IACs), node 7 (500 IACs), and node 32 (500 IACs). Although the temperature constraints are relaxed to a timeaverage form, it can obey the original strict temperature box constraints as we set the coefficients V and h i,a according to the conducted ranges. Intuitively, the temperature dynamics match the regulation requirements. For instance, IACs will raise the operating power to cool the rooms from 12:00∼14:00, where the setpoint reference at substation requires the ADN to increase the actual power consumption. Also, heterogeneous temperature setpoints will not affect the performance of the proposed method.
To further examine the impact of the selection of V on the indoor temperature, we define a metric to assess the violations   of the temperature constraints, given by: The metric Vio i,a characterizes the violations of the upper and lower temperature limits for an individual IAC. To evaluate the overall performance, the average violations over all the IACs are denoted by Vio, given by: Then, we select distinct value of V to examine the violations, demonstrated in Table II. It can be observed that the violations can be well avoided if we select a proper V according to the characterized bounds in Eq. (35c). However, if the selection of V exceeds the given V max , the actual indoor temperature obviously violates the satisfied limits no matter how the acceptable bandwidth of the temperature is set. Therefore, the performance of the proposed method can be ensured based on the analytical characterization of the pertinent coefficients.

4) Performance of the Filter of Incentive Signals:
We further discuss the power fluctuation at the substation under different φ. We define: to quantify the so-termed power fluctuation from time slot t 1 to t 2 . We divide the day into 7 fragments according to the power tracking signals. As shown in Table III, the value of ϑ is obviously smaller when φ = 0.1 than φ = 0 because incentive signals are smoothed by a filter and the DERs will not respond to the signals too drastically. When we increase φ to 0.2 and 0.4, ϑ continues to diminish since the enhanced filtering performances. Nevertheless, Corollary 1 has illustrated that φ will bring a fix discrepancy between actual solvers and optimal solvers. Thus, a relatively small φ can not only help smooth the power at the substation and also ensure the economic efficiency of the algorithm. 5) Computational Time: Finally, the advantage of computation is corroborated. We run the algorithms with MATLAB on one PC with i5 CPU (2.3GHz) and 8GB RAM. Total simulation time is counted to assess the overall consumptions. To mimic the performance of parallel computation, the total simulation time is divided by the nodes and time horizon, i.e, N × T, and the average time is obtained. Note that the average time can represent the actual time request for each HEMS for computing setpoints of governed DERs.
Firstly, we increase the scale of the test systems with more nodes and DERs. As shown in Fig. 11, the total simulation time raises up, but the average time will not be influenced by the increasing of nodes. For example, nearly 30,000 devices are coordinated in the 123-node test feeder, while only 92ms is required for the node at each time slot. Owing to the presence of distributed algebraic solvers, the proposed LOOD is pragmatic to large-scale ADN with numerous DERs.
However, the MPC-based method is more vulnerable to the computation time. Considering the 33-node test feeder, the computation time under different settings of window size is illustrated in Fig. 12. It is observed that both simulation and average time augments apparently if a larger window size is used. Based on the results in Fig. 5, to compete with LOOD on the performance of utility loss, each node needs to spend 10 times longer if S4 is applied as the window size is supposed to be 60min. In addition, more efforts have to be paid on the forecast if a larger window size is used.

VI. CONCLUSION AND FUTURE WORKS
This paper proposes an online distributed optimization algorithmic framework for ADNs to track a setpoint reference at the substation while concurrently minimizing the utility loss and assuring the security of voltages. Unlike most existing optimization methods for ADNs, the proposed LOOD algorithm can generate the setpoints for PV inverters and IACs immediately only relying on current measurements and environment conditions. In particular, the time-coupling constraints for IACs are decoupled in an online optimization framework by the Lyapunov optimization technique. Outperform most of the state-of-the-art methods for coordinating the networked DERs, the proposed algorithm has several advantageous features in the practical deployment, including: 1) Computationally affordable. The developed approach uses a distributed algebraic update to compute the next round decisions for both the ADN operator and customers, see Eq. (31), (33), and (34). This thus makes the approach scalable to large networks with numerous DERs.
2) Economically efficient. Due to the well relaxation of the time-coupling constraints, the proposed method achieves a near optimality of the ideal global optimizer, while getting a more economic result than other online solvers such as the greedy optimization and MPC-based algorithm.
3) Prediction-free. The proposed method does not rely on the future prediction that is essential for most existing methodologies including MPC, while the proposed algorithm still achieves results that are very close to the optimum. 4) Customer-oriented. To coordinate customer-owned assets, a proper incentive scheme is developed in lieu of the direct control. The local problem solved by each customer is a standard welfare maximization problem. Also, the incentive generator considers a first-order filter to alleviate high fluctuations of incentive signals and corresponding responses of customers.
The numerical test uses six different online strategies to be the benchmarks, while the results corroborate that the proposed method can save significant computational complexity and avoid the prediction of future information that is essential for other online controllers like MPC. Also, the proposed LOOD is analytically characterized and numerically tested to achieve a near optimality of the theoretical solution, for example 90% in the case study.
In this work, the ADNs are assumed to be three-phase balances. Thus, future work will extend our algorithm to a three-phase unbalanced case. Besides, the parameters of IACs model are presumed to be estimated beforehand and time-invariant in the operation. An online estimator-embedded optimization framework for networked DERs will be further studied. In addition, ongoing efforts are developing controllers for networked DERs when the communication system is partially or totally down.

APPENDIX A PROOF OF LEMMA 1
According to the definition of the virtual queue and Lyapunov drift function, we have: First of all, we recall the definition of the utility loss function U t i : R A i +2 → R 1 . The gradient function of U t i is denoted by ∇U t i : R A i +2 → R A i +2 . Then, we denote the inverse function of ∇U t i by (∇U t i ) −1 : R A i +2 → R A i +2 . Assume x 1 and x 2 are defined in the feasible set of the function ∇U t i . Let y 1 = ∇U t i (x 1 ) and y 2 = ∇U t i (x 2 ), we have x 1 = (∇U t i ) −1 (y 1 ) and x 2 = (∇U t i ) −1 (y 2 ). Since U t i is σ -strongly convex (see Assumption 1), we have: By substituting y 1 and y 2 into (47), we will get: Hence, So, (∇U t i ) −1 is 1 σ -Lipschitz continuous.

APPENDIX D PROOF OF THEOREM 2
Before proving the Theorem 2, we have the following reasoning: where ∇D m,t (d t ) = g m,t (z t ) − ϕd t is the measurementenabled gradient that is used in LOOD. The equality (a) comes from the dual variables update policy; (b) is due to the non-expansiveness property of the projection operator; (c) considers Assumption 3. For inequality (d), we consider the fact that: Eq. (51) holds because D t (·) is σ D -strongly concave and its gradient function is L D -Lipschitz continuous (see Lemma 5).
The equality (a) is according to the optimal condition of the local problem P t L,i ; (b) comes from the non-expansiveness property of the projection operator and (c) considers Lemma 4; (e) uses the results of Eq. (54). The resultant two terms also comprise a transient term that will vanish when t → ∞. Therefore, with the fixed term denoted by Y i , we can derive Corollary 1.

APPENDIX F PROOF OF THEOREM 3
First of all, we can characterize the difference between the two optimizers generated by LOOD and P t 2 at each time slot: Thus, where the equality (a) is based on the definition of timeaverage of a variable; inequality (b) and (c) come from Eq.