Data-driven risk-based scheduling of energy communities participating in day-ahead and real-time electricity markets

Abstract—This paper presents new risk-based constraints for the participation of an energy community in day-ahead and real-time energy markets. Forming communities indeed offers an effective way to manage the risk of the overall portfolio by pooling individual resources and their associated uncertainties. However, the diversity of flexible resources and the related user-specific comfort constraints make it difficult to properly represent flexibility requirements and to monetize constraint violations. To address these issues, we propose a new risk-aware probabilistic enforcement of flexibility constraints using the conditional-value-at-risk (CVaR). Next, an extended version of the model is introduced to mitigate the distributional ambiguity faced by the community manager when new sites with limited information are embedded in the portfolio. This is achieved by defining a worst-case CVaR-based constraint (WCVaR-BC) that differentiates the CVaR value among different sub-clusters of clients. Both reformulations are linear, thus allowing large-scale stochastic problems to be tackled. The proposed risk-based constraints are then trained and evaluated on real data collected from several industrial sites. Our findings indicate that using the WCVaR-BC leads to systematically higher out-of-sample reliability, while decreasing the exposure to extreme outcomes.

NOMENCLATURE

Parameters:
- Real-time forecast error at time t, in scenario θ
- η^ch, η^dch: Charging and discharging efficiency (-)
- λ^DA_t: Day-ahead energy market price at time t
- λ^RT_{t,θ}: Real-time imbalance price at time t, in scenario θ
- E: Energy capacity of the ESS (MWh)
- P: Power capacity of the ESS (MW)
- a^DA, a^DA: Upper and lower limit of the community's day-ahead market position
- a^RT, a^RT: Upper and lower limit of the community's real-time market position
- d^DA_t: Day-ahead demand of the community at time t
- OPEX^DA: Operational expenditure of the distributed storage

Sets and indices:
- π ∈ Π: Load forecast error scenarios
- θ ∈ Θ: Imbalance price scenarios
- d ∈ D: Simulated days in the out-of-sample tests
- k ∈ K: Clusters of load deviation scenarios
- n ∈ N: Scenarios within cluster k
- t ∈ T: Time steps

Variables:
- Δa^RT_{t,θ}: Real-time imbalance bid of the community at time t, in scenario θ
- Δch^RT_{t,θ,π}, Δdch^RT_{t,θ,π}: Storage charging and discharging real-time deviations at time t, in scenario (θ, π)
- a^DA_t: Day-ahead market bid of the community at time t
- ch^DA_t, dch^DA_t: Charging and discharging of the storage scheduled in the day-ahead stage for time t
- SoC_{t,θ,π}: Storage state-of-charge at time t, in scenario (θ, π)

I. INTRODUCTION
Energy communities (ECs) can play a pivotal role in promoting end-user empowerment, which may serve diverse benefits in future renewable-dominated power systems. In particular, ECs are envisioned to enhance the integration of distributed energy resources (DERs) into existing electricity markets, while better incentivizing consumers and prosumers to unlock their flexibility potential [1]. Given these perspectives, and, e.g., the corresponding EU legislation [2], several countries have already implemented legal schemes enabling the formation of such communities.
In this paper, we focus on energy communities formed by several industrial sites whose market participation is coordinated by an energy community manager. The joint coordination of sites may allow direct access to wholesale energy markets. Although this can result in significant cost savings, sophisticated day-ahead scheduling strategies must be developed to hedge the risk associated with uncertain market prices and real-time (RT) mismatches between forecasted and actual demands.
Depending on the relation among the different stakeholders, the resulting game-theoretic problem can take different forms. First, in case of perfect cooperation, wherein each actor agrees on the probability distribution of uncertainties, a centralized optimization can be implemented to maximize the social welfare of the community. Second, if privacy issues arise and each agent maximizes its own welfare, the problem is a Nash game that can be formulated as an equilibrium model. An intermediate solution consists in complementing a centralized optimization (performed by a community manager that efficiently manages the community-level uncertainties) with a Nash-bargaining game to capture interactions among the manager and the different end users [3], [4]. The solutions of the centralized optimization and equilibrium problems coincide under the assumption of perfect competition and complete information [5].

In this paper, a central optimization model is formulated to identify the optimal scheduling of the community of industrial sites. Without loss of generality, the different sites are individually represented as the combination of a battery storage system (BSS), capturing the on-site flexible assets, with a given inflexible residual load reflecting the non-shiftable energy exchanges (Fig. 1). Interestingly, the authors of [6] argue that a set of thermostatically controlled loads (TCLs) can be well captured via a BSS model with stochastic energy bounds for demand response purposes. Modeling these optimal bounds is a complex task, which is subject to both DER- and market-level uncertainties. The true cost of the associated violations, e.g., thermal comfort violations or undesired state-of-charge levels of electric vehicles, is usually unknown and time-varying. For both these reasons, it may be highly beneficial to model these energy constraints in a probabilistic framework that does not rely on strong assumptions on the underlying cost structures.
A first solution consists in enforcing the energy constraints for all possible scenarios, but it is likely to result in very conservative decisions [7]. Alternatively, ensuring that the constraints on the state-of-charge are respected in expectation over the uncertainty set [8] can lead to overly optimistic strategies [9]. Indeed, this method does not guarantee the feasibility of the scheduling solution [10]. Another approach thus imposes that the energy bounds are satisfied with a certain probability P ≤ 1, resulting in chance-constraints (CCs) [11], used, e.g., in [12], [13]. Such probabilistic constraints allow for infinitely large in-sample violations. Moreover, the feasible region defined by CCs is typically non-convex, such that they are often approximated with a large number of scenarios, which requires the addition of binary variables to indicate which scenarios are violated, although computationally more efficient approximations exist [14], [15].
In this paper, we rely on an alternative convex (conservative) approximation of CCs, called the conditional-value-at-risk based constraint (CVaR-BC) [16]. This formulation, which is for example used in finance for constraining the maximum risk of a portfolio selection problem [17], [18], makes it possible to consider both qualitative and quantitative limitations in the multi-horizon scheduling in electricity markets. Interestingly, CVaR is a coherent risk measure [19], and can thus be recast as a linear programming problem when the corresponding function is linearly dependent on the decision variables [20].
Both CCs and the presented CVaR-BC suffer from the caveat that if the forecast error distribution of a newly accommodated site in the community differs significantly from those of the known clusters, i.e., the community manager faces distributional ambiguity, the resulting model is likely to exhibit poor out-of-sample performance, resulting in undesired scheduling violations. To overcome the issue of over-fitting the model to the limited information of the portfolio [21], distributionally robust optimization (DRO) provides performance guarantees when the uncertainty distribution is not perfectly known [22], [23]. However, in our multi-step scheduling problem, DRO requires the joint consideration of distributionally robust chance constraints (DROCC) [22] by explicitly capturing the temporal dependencies of the underlying time-dependent distributions, which is challenging to implement in a computationally efficient way [24]. In this paper, a new data-driven solution strategy is thus proposed. In a preprocessing step, the different clients are clustered into sub-clusters, and a single representative distribution (aggregating the individual forecasts) is constructed for each sub-cluster. Based on this information, we implement a worst-case CVaR based constraint (WCVaR-BC) to enforce the CVaR-BC on the worst scenario set. Such an approach offers strong robustness against the limited information of newly added sites within the aggregated portfolio. A significant advantage is that the WCVaR-BC remains a linear function for box discrete distributions (Section III), which makes the formulation applicable to large-scale problems. The main contributions of this paper are threefold.
1) We formulate a CVaR-based probabilistic constraint to enforce probabilistic guarantees of respecting uncertain comfort bounds within an energy community participating in both day-ahead and real-time energy markets. This allows relaxing the strict enforcement of the state-of-charge bounds via a linear, hence computationally efficient, formulation. It limits both the probability and the severity of violations, which is highly beneficial when it is difficult to explicitly quantify violation costs [20], [25]-[27].
2) We enrich the formulation with an ambiguous representation of uncertainties, informed by an ex-ante clustering step, which is embedded within a new worst-case CVaR based constraint (WCVaR-BC) on the energy bounds of local flexibility. This gives rise to a large-scale time-series scenario-driven optimization model, which is particularly advantageous for the community's management when new clients are integrated into the portfolio.
3) First, in a stylized example, we show the fundamental differences between the proposed CVaR-BC and CC. Then, in a case study, we illustrate that although the CVaR-BC leads to better expected out-of-sample performance, the WCVaR-BC mitigates the occurrence of large violations.

For both of the above probabilistic constraints, we implement an advanced shape-based technique [28] to cluster the different clients of the community based on the similarity of their load profiles. Each cluster is then represented by a single forecast distribution, and the joint consideration of these distributions can be seen as the ambiguity set of the community (which can then be fed into the WCVaR-BC). This approach opens up new opportunities to improve the quality of the input data by embedding domain-specific knowledge and using advanced data analytics as a preprocessing step.
Moreover, the presented risk-based constraints offer a scalable approach for other power system applications with ambiguous uncertainty sets, particularly suitable when the costs associated with constraint violations are difficult to characterize.
Note that all developed models are published at https://github.com/Dmihaly/risky_community, to ease reproducibility and further developments.

II. MODEL FORMULATION
In this section, the centralized scheduling model for the energy community is introduced. It is assumed that the community has access to both day-ahead (DA) energy and real-time (RT) imbalance markets, as shown in Fig. 1. The exchange with the imbalance market is considered to be limited, to avoid the virtual trading of large amounts of electricity. The scheduling is performed by a price-taking energy community manager [3], [29], [30], which is responsible for pooling the loads and their flexibility potential, thus mitigating the overall risk of the portfolio. We present a new scalable formulation based on a CVaR-based probabilistic constraint to hedge the risk of infeasible schedules. The approach can be extended to form the worst-case CVaR based constraint in case of large ambiguity (e.g., for new clients with limited history). The resulting decision model is formulated as a stochastic scenario-based optimization program.
In our models, the DA market is considered deterministic, while the imbalance market's outcome is stochastic, represented via a reduced number of time-series scenarios of imbalance price profiles. The price scenarios are obtained by clustering historical outcomes of the Belgian imbalance market. Likewise, the forecast errors associated with the industrial sites' load profiles are characterized by time-series scenarios. In particular, the clients are represented using real measurements from several industrial and commercial sites (collected by Schneider Electric [31]), including energy consumption, local production and corresponding forecast errors. The main focus of this research is on how these forecast errors can be managed using risk-based constraints on the state-of-charge limits.

The community manager (CM) optimizes the collective day-ahead (here-and-now) and real-time (recourse) decisions X_cm = {x^DA_cm, x^RT_cm} of the community, together with the individual decision variables X_der_j = {x^DA_der_j, x^RT_der_j} inherited from the abstract flexibility model of each industrial site, i.e., DER agent j ∈ J. The collection of DER agents is represented as an aggregated asset (der). The cost components of the objective function are defined as follows: Eq. (1) and Eq. (2) denote the sourcing cost of electricity from the DA energy market and the expected sourcing cost from the RT imbalance markets, respectively. The latter's uncertainty is captured via the scenario set θ. Eq. (3) describes the operational expenditure associated with the activation of the distributed BSS at the day-ahead stage, while Eq. (4) defines the same expected cost at the second, real-time stage, whose uncertainty is captured by the set π. The central optimization problem of the energy community (EC) reads as problem (5). Eqs. (5b)-(5c) enforce that the bids of the aggregator are within the predefined limits.
This consideration reflects that in practice, the energy community is unlikely to perform virtual bidding with large amounts of electricity, i.e., its bids are limited by the capacity of the available assets. Eq. (5d) is the energy balance constraint of the community. Constraints (5e)-(5j) guarantee that both day-ahead and real-time charging and discharging decisions comply with the power bounds. Eq. (5k)-(5l) track the temporal evolution of the stored energy. Lastly, constraint (5m) ensures that the stored energy is within the technical bounds. These bounds, however, may be highly uncertain and we therefore relax their strict enforcement in Section III.
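The storage dynamics behind Eqs. (5k)-(5m) can be illustrated with a short sketch. The schedule, initial state, and efficiency values below are hypothetical (unit-length time steps are assumed): the state of charge rises with charging scaled by η_ch, falls with discharging scaled by 1/η_dch, and must remain within the energy capacity.

```python
import numpy as np

# Illustrative sketch of the storage dynamics of Eqs. (5k)-(5m); the
# schedule below is hypothetical and time steps are unit length.
def soc_trajectory(soc0, ch, dch, eta_ch=0.98, eta_dch=0.98, dt=1.0):
    """Return the state-of-charge path implied by (dis)charging schedules."""
    soc = [soc0]
    for c, d in zip(ch, dch):
        # charging is degraded by eta_ch, discharging inflated by 1/eta_dch
        soc.append(soc[-1] + dt * (eta_ch * c - d / eta_dch))
    return np.array(soc)

# A hypothetical 4-step schedule for a 0.4 MWh unit starting half full.
soc = soc_trajectory(0.2, ch=[0.1, 0.0, 0.0, 0.05], dch=[0.0, 0.1, 0.05, 0.0])
within_bounds = bool(np.all((soc >= 0.0) & (soc <= 0.4)))
```

In the full model, the hard bound check in the last line is exactly what Section III replaces with a probabilistic enforcement.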

III. PROBABILISTIC CONSTRAINTS
First, the data pre-processing strategy and its connection to the uncertainty characterization are introduced (Section III-A). Then, in Sections III-B and III-C, the CVaR- and WCVaR-based constraints are introduced and their properties are discussed qualitatively.

A. Clustering and its connection to the model formulation
The time-series scenarios capturing the underlying uncertainty of the load were constructed by assembling empirical daily forecast errors from an extensive data set published by Schneider Electric (SE), containing historical electric load profiles of 70 industrial sites with the corresponding rolling-horizon forecasts, inspired by [32].
As sites with different magnitudes of load are included in the data set, the real-time load deviation relative to the day-ahead forecast, i.e., the relative forecast error, is obtained by dividing the real-time deviation by the day-ahead forecast. The collection of these forecast errors serves as the input for the scenario-based stochastic optimization, and is initialized by the following strategy:
1) The forecast error scenarios are ordered into K clusters based on their similarity/dissimilarity for each site i ∈ I.
2) Assuming each cluster in K = {K_1, ..., K_N} captures a particular trend of the uncertain process, i.e., a different underlying distribution, N sequences are selected from each cluster, resulting in Π = N · |K| scenarios.
In the clustering step, we use the shape-based distance measure proposed in [28], and implemented in [33], to characterize the similarity/dissimilarity between time-series forecast error scenarios. As mentioned in [34], the employed shape-based clustering accounts for the temporal correlation in the data and is less sensitive to scale, noise, and time shifts. The dissimilarity measure used in the clustering may also be utilized to ex-ante control the distance between the K scenario clusters, leading to adjusted robustness in the optimization: as in DROCC, allowing for larger dissimilarities in the training set may lead to more robust out-of-sample performance.
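As a rough illustration of the shape-based distance underlying this clustering step, the sketch below computes one minus the maximum normalized cross-correlation over all time lags, in the spirit of [28]; the series are synthetic, and this is a simplified stand-in rather than the exact implementation of [33].

```python
import numpy as np

# Simplified sketch of a shape-based distance (SBD) as used in
# k-Shape-style clustering [28]: one minus the maximum normalized
# cross-correlation over all lags, insensitive to time shifts.
def shape_based_distance(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    cc = np.correlate(x, y, mode="full")   # cross-correlation at every lag
    return 1.0 - cc.max() / (np.linalg.norm(x) * np.linalg.norm(y))

# A synthetic daily profile and a time-shifted copy stay close under SBD.
a = np.sin(np.linspace(0.0, 2.0 * np.pi, 48))
b = np.roll(a, 6)                          # same shape, shifted in time
d_self = shape_based_distance(a, a)        # ~0: identical shape
d_shift = shape_based_distance(a, b)       # small despite the shift
```

A plain Euclidean distance between `a` and `b` would be large, while the SBD stays small, which is why the measure is attractive for load profiles whose peaks drift in time.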

B. Comparing CVaR-BC and CC
This section provides insights into the fundamental differences and similarities between chance constraints (CCs), i.e., value-at-risk (VaR) based constraints, and the proposed CVaR-BC. For this purpose, we connect the VaR and CVaR functions through a stylized example. A more in-depth analytical comparison of the two functions, and of the resulting constraints, may be found in [17], [35]. Figure 2a depicts a hypothetical probability distribution, where the confidence level of the probabilistic constraint is set to 0.8 and 10 scenarios are considered. The height of each column corresponds to the probability of the scenario. Furthermore, Fig. 2b shows two alternative shapes of the tail of the distribution via differentiated probabilities. Fig. 2 will be used in the following to describe the differences between VaR and CVaR. To introduce the VaR and CVaR, a convex function f(x, ω) is used, where x ∈ X is the vector of decision variables, and ω ∈ Ω, composed of π ∈ Π and θ ∈ Θ, indexes the stochastic scenarios. The VaR for the upper (1 − ε)-quantile of the joint bivariate distribution (Ω) is formulated in Eq. (7a), and for the lower (1 − ε)-quantile in Eq. (7b). The VaR in Eq. (7a) is the largest value of η guaranteeing that the probability of a function value greater than η does not exceed ε. Constraining the VaR function, as well as using CCs, guarantees that in the green scenarios of Fig. 2, defined by an exogenous confidence level (1 − ε), no in-sample violation will occur. On the other hand, in the violating, orange scenarios, any level of violation is allowed by this formulation. Note that the three different distributions (Ω1, Ω2, Ω3) are identical through the lens of the VaR function, i.e., CCs or VaR-based constraints are indifferent w.r.t. the shape of the tail.
Contrary to Eq. (7a) and Eq. (7b), the CVaR characterizes the mean function value of the instances exceeding the VaR, i.e., the expected value of the scenarios at the tail (orange scenarios). By incorporating the degree of violation of the scenarios and their corresponding probabilities, it may lead to more restrictive outcomes compared to CCs. Therefore, the CVaR-BC may be seen as a convex approximation of the CC [36].
Following the formulation developed in [37], [38], the upper and lower CVaR functions are defined via the plus function [z]+ = max(z, 0). Elimination of the plus function, as in [38], leads to linear programming forms in which δ ∈ R+ is an auxiliary variable. This simplification yields significant computational advantages when modeling the CVaR, compared to the VaR or CC, making it suitable even for sequential (time-series) scenario inputs.
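This linear reformulation can be checked on a small numerical sketch with made-up scenario values (not the paper's data). For a discrete distribution, the Rockafellar-Uryasev objective δ + (1/ε)·E[(f − δ)⁺] is piecewise linear and convex in δ, so its minimum is attained at one of the scenario values, and a simple scan recovers the CVaR:

```python
import numpy as np

# Sketch of the Rockafellar-Uryasev form behind the linear reformulation:
#   CVaR_eps(f) = min_delta  delta + (1/eps) * E[(f - delta)^+].
# Scenario data below are made up for illustration.
def cvar_upper(values, probs, eps):
    v, p = np.asarray(values, float), np.asarray(probs, float)
    obj = lambda d: d + np.sum(p * np.maximum(v - d, 0.0)) / eps
    # the minimizer lies at a scenario value (piecewise-linear objective)
    return min(obj(d) for d in v)

f = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1.5, 2.0])
p = np.full(10, 0.1)                  # 10 equiprobable scenarios
cvar = cvar_upper(f, p, eps=0.2)      # mean of the 20% worst outcomes
```

In the full model, the same objective becomes a set of linear constraints with one slack per scenario, which is what keeps the scheduling problem a linear program.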
Using the above functions, we recast the energy content constraints Eq. (5m) of the BSS as upper or lower conditional-value-at-risk constraints.
The fact that the CVaR-BC can be modeled by a convex function implies that the in-sample expected objective values and violations vary continuously and monotonically with the confidence level. In contrast, the VaR (and CC) typically exhibit discrete jumps w.r.t. the confidence level. Such jumps may translate into inconsistent out-of-sample performance, and expected outcomes can be significantly sensitive to the chosen confidence level. These characteristics of the CVaR function make it particularly beneficial over CCs in problems where violation costs are hard to characterize ex-ante, e.g., for thermal discomfort (discussed in [39]). In Table I, the expected violations resulting from the alternative tails of Fig. 2a and Fig. 2b are calculated. The assumed violation magnitudes for scenarios {8, 9, 10} are {1.0, 1.5, 2.0}. It can be noted that in the three different cases the modeler accepts very different levels of expected violations, despite the fixed violation magnitudes, a property that is ignored in the formulation of CCs. In the CVaR-BC, on the other hand, this property is well captured and can be compensated, e.g., by choosing lower violation levels in the more probable scenarios.
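The tail-shape argument above can be reproduced numerically. In this sketch, the outcomes and probabilities are illustrative stand-ins (not the exact values of Fig. 2 or Table I): both distributions share the same VaR, yet shifting probability mass toward the worst scenario raises the CVaR.

```python
import numpy as np

# Two distributions with identical VaR but different tails: VaR is blind
# to the tail shape, CVaR is not. Probabilities are made up.
def empirical_var(values, probs, eps):
    order = np.argsort(values)
    cum = np.cumsum(np.asarray(probs, float)[order])
    return np.asarray(values, float)[order][np.searchsorted(cum, 1.0 - eps)]

def empirical_cvar(values, probs, eps):
    v, p = np.asarray(values, float), np.asarray(probs, float)
    return min(d + np.sum(p * np.maximum(v - d, 0.0)) / eps for d in v)

violations = [0.0] * 7 + [1.0, 1.5, 2.0]     # last three scenarios violate
p_flat = [0.1] * 10                          # uniform tail
p_heavy = [0.1] * 7 + [0.05, 0.05, 0.2]      # mass moved to the worst case

var_flat = empirical_var(violations, p_flat, eps=0.3)
var_heavy = empirical_var(violations, p_heavy, eps=0.3)
cvar_flat = empirical_cvar(violations, p_flat, eps=0.3)    # (1.0+1.5+2.0)/3
cvar_heavy = empirical_cvar(violations, p_heavy, eps=0.3)  # heavier tail
```

The two VaR values coincide, while the CVaR of the heavy-tailed distribution is strictly larger, mirroring the discussion of Table I.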

C. Comparing WCVaR-BC and CVaR-BC
As opposed to the CVaR case, in which Ω1, Ω2, Ω3 belong to the same scenario cluster (k = 1), the worst-case CVaR (WCVaR) function allows for differentiation among a collection of scenario clusters (Ω_k, k ∈ K), e.g., k = 1, 2, 3 clusters can be characterized for Ω1, Ω2, Ω3. As a result, instead of aggregating the three tail approximations, the probabilistic constraint can be enforced for the worst one (Ω3 in the stylized example, as shown in Table I). This way, one can avoid over-fitting the model to the aggregation of all tails and reduce the possible exposure to extreme outcomes. For a collection of scenario sets (Ω_k, k ∈ K), the WCVaR function is defined as the CVaR belonging to the worst realization. It is shown in [40] that the WCVaR remains a coherent risk measure. Furthermore, the same linear approximation as derived for the CVaR in [16] may be used for the WCVaR, under the assumption that f(x, ω) is linear w.r.t. x and X is a convex polyhedron. Consequently, the resulting SoC constraints may be written as a worst-case constraint that holds for all CVaRs belonging to the set K. In the model formulation, the CVaR is defined over the set of scenario clusters K, as defined in Section III-A, allowing for differentiation in its values. Then the worst-case CVaR (WCVaR) function [40], [41] is used to endogenously account for the worst realization of the CVaR. Note that when the model is supplied with a single cluster of scenarios (k = 1), CVaR and WCVaR are identical.
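The worst-case construction can be sketched directly: compute the CVaR separately for each scenario cluster and keep the largest. The cluster data below are hypothetical stand-ins for Ω1, Ω2, Ω3, not values from the paper.

```python
import numpy as np

# WCVaR sketch: per-cluster CVaR, then the worst over clusters.
def cvar(values, probs, eps):
    v, p = np.asarray(values, float), np.asarray(probs, float)
    return min(d + np.sum(p * np.maximum(v - d, 0.0)) / eps for d in v)

clusters = {
    1: ([0.0, 0.5, 1.0], [0.6, 0.3, 0.1]),
    2: ([0.0, 0.8, 1.2], [0.7, 0.2, 0.1]),
    3: ([0.0, 1.0, 2.0], [0.8, 0.1, 0.1]),   # heaviest tail -> worst CVaR
}
wcvar = max(cvar(v, p, eps=0.2) for v, p in clusters.values())
# Enforcing "wcvar <= bound" makes the CVaR constraint hold for
# every cluster simultaneously.
```

Because the maximum over a finite set of linear expressions can be written as one inequality per cluster, this conservatism comes at no loss of linearity.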

IV. NUMERICAL RESULTS
In this section, we evaluate the performance of our proposed CVaR and WCVaR constraints to model the energy bounds of the flexibility providers. In particular, the model is tested on an energy community composed of 5 industrial loads (denoted as MIX), referred to as {19, 35, 40, 58, 62}, randomly selected from the Schneider Electric dataset [31]. First, the clustering technique described in Section III-A is applied to construct the representative load error scenarios. We make the assumption that each of the 5 sites belongs to a different underlying distribution Ω_k, where k = 1, 2, 3, 4, 5. For each site, the distribution is modeled through sequences collected from all weekdays over a time horizon of three months, from January to March 2016, which results in 65 sequences per site. These sequences are clustered into 13 clusters for each site. In each cluster, the element closest to the centroid (defined using the shape-based distance function [28]) is selected as a prototype. The assigned probability of occurrence is proportional to the size of the cluster from which the prototype is selected. Lastly, each cluster's prototype and its probability of occurrence are moved to the final set of in-sample scenarios, leading to 65 scenarios overall (13 representative sequences for each of the 5 industrial clients). When using the CVaR-BC, no differentiation is made based on which site was the root of a given scenario, whereas this information is preserved in the WCVaR to distinguish among the k ∈ K scenario clusters, i.e., underlying distributions. We first introduce the data used in the numerical case studies (Section IV-A). Then, in Section IV-B, the out-of-sample reliability obtained with both CVaR-BC and WCVaR-BC is compared. Lastly, we highlight the mean as well as the maximum violations as a function of the corresponding objective value (Section IV-C).
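The prototype-selection step described above can be sketched as follows. Plain Euclidean distance stands in for the shape-based distance of [28], and the sequences and cluster labels are synthetic:

```python
import numpy as np

# Scenario-reduction sketch: in each cluster, the member closest to the
# centroid becomes the prototype, with probability proportional to the
# cluster size. Euclidean distance replaces the SBD of [28] here.
def reduce_scenarios(sequences, labels):
    seqs, labels = np.asarray(sequences, float), np.asarray(labels)
    prototypes, probs = [], []
    for k in np.unique(labels):
        members = seqs[labels == k]
        dists = np.linalg.norm(members - members.mean(axis=0), axis=1)
        prototypes.append(members[np.argmin(dists)])   # medoid-like prototype
        probs.append(len(members) / len(seqs))         # size-proportional weight
    return np.array(prototypes), np.array(probs)

# Six toy daily error sequences falling into two clusters of three.
seqs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0], [0.9, 1.1]]
protos, probs = reduce_scenarios(seqs, labels=[0, 0, 0, 1, 1, 1])
```

Applied per site with 13 clusters over 65 sequences, this is the mechanism that produces the 65 in-sample scenarios used below.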

A. Experimental data
In the case studies, we assume that the only varying (uncertain) parameters are the real-time load realizations (forecast errors with respect to day-ahead expectations), whereas the day-ahead forecasted load is taken from a single day to focus the analysis on the effects of uncertainty. Furthermore, for the sake of simplicity, we model a single site, assuming that it represents the aggregation of several DERs both in terms of fixed load and flexibility. 1 The modeled BSS, which acts as a surrogate for the flexible part of the load, has 3 MW of charging and discharging power (with 98% round-trip efficiency) and 0.4 MWh of energy capacity. The daily price profiles were downloaded from the website of ELIA [42], the Belgian Transmission System Operator. The DA market price is deterministic, with an average value over the day of λ^DA_t = 18.2 €/MWh, and the expected average RT imbalance market price is E(λ^RT_{t,θ}) = 18.6 €/MWh. To model the uncertainty of real-time electricity prices, 5 scenarios are considered in the optimization, obtained by clustering the yearly data into 5 representative clusters and selecting their prototypes. Similarly to the DA forecasted part of the demand, the deterministic DA market price and the stochastic RT market price scenarios (θ ∈ Θ) are not altered in the simulations. To avoid unrealistic levels of exchange with the markets and extensive virtual bidding, the DA market position of the community is limited to two times the maximal forecasted load (Eq. (5b)), whereas the RT market position is bounded by half of the maximal load deviation (Eq. (5c)) in all considered scenarios. The operational expenditure (OPEX) of the ESS's flexibility is 0.01 · λ^DA_t in the DA stage and 0.05 · E(λ^RT_{t,θ}) in the RT recourse stage, in €/MWh. The higher RT OPEX is intended to reflect the increasing communication and scheduling burden when executing deviations closer to real time.
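The scaling rules above can be collected in a short configuration sketch; the load figures are placeholders, only the multipliers follow the text:

```python
# Experimental scaling rules; load figures are hypothetical placeholders.
max_forecast_load = 2.5       # MW, hypothetical peak of the DA forecasted load
max_load_deviation = 0.8      # MW, hypothetical worst real-time deviation
lambda_da_avg = 18.2          # EUR/MWh, average day-ahead price
lambda_rt_avg = 18.6          # EUR/MWh, expected average imbalance price

da_position_limit = 2.0 * max_forecast_load    # Eq. (5b): 2x max forecast
rt_position_limit = 0.5 * max_load_deviation   # Eq. (5c): half max deviation
opex_da = 0.01 * lambda_da_avg                 # DA flexibility OPEX, EUR/MWh
opex_rt = 0.05 * lambda_rt_avg                 # costlier RT re-dispatch
```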
In the in-sample optimization, the stochastic model is supplied with all the input data, i.e., the DA and RT market prices (with 5 scenarios), the deterministic (fixed) DA forecasted load profile, and the generated RT forecast error scenarios (65 scenarios). The 65 forecast error scenarios are constructed by applying the clustering technique (presented in Section III-A) on the in-sample set of sites {19, 35, 40, 58, 62}. Once optimality is reached, all DA decisions and the RT imbalance market positions are fixed in the optimization model used in the test runs. In the test runs, the RT forecast error realization is the only altered input. Out-of-sample feasibility is not guaranteed because (i) in-sample violations of the probabilistic SoC constraints (Eq. 5m) are allowed, and (ii) the DA decisions and the RT imbalance market positions are fixed to the training model's outcome, i.e., only RT charging and discharging decisions are re-optimized while the load forecast error takes different values. To ensure that the model is out-of-sample feasible, ancillary slack variables (s^UP, s^DOWN) are added to the SoC bounds (Eq. 5m). Non-zero values of the slack variables are penalized in the objective function with a large violation coefficient (1000).

B. Out-of-sample reliability

To quantify out-of-sample reliability, the instances in which the slack variables are non-zero were enumerated over all time steps (t ∈ T) and all simulated days (d ∈ D), and compared to the total number of instances, i.e., the number of time steps multiplied by the number of days (31200). The results indicate that the WCVaR-BC (indicated by circle markers) always leads to higher reliability than the CVaR-BC (triangle markers) for all test sites. Although the difference is more pronounced for some sites (e.g., 70 and 12), the higher reliability of the WCVaR is an expected outcome given the more conservative nature of the WCVaR function.
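The reliability metric just described can be sketched as the share of (time step, day) instances in which neither slack variable is active. The slack matrices below are synthetic, and the 96 × 325 factorization of the 31200 instances is an assumption made only for illustration:

```python
import numpy as np

# Out-of-sample reliability sketch: fraction of instances with no active
# SoC slack. Slack matrices are synthetic.
def reliability(s_up, s_down, tol=1e-9):
    violated = (np.abs(s_up) > tol) | (np.abs(s_down) > tol)
    return 1.0 - violated.mean()

shape = (96, 325)              # assumed time steps x days = 31200 instances
s_up, s_down = np.zeros(shape), np.zeros(shape)
s_up[0, :13] = 0.05            # 13 synthetic violating instances
rel = reliability(s_up, s_down)
```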
A factor of crucial importance influencing the out-of-sample performance is the difference between the inputs used in the out-of-sample simulations and the inputs used in the training (in-sample) phase of the optimization. Obviously, one can expect better performance if the in-sample uncertainty approximation is closer to the realized inputs. When selecting the test sites, it was taken into consideration that the test set should involve various samples w.r.t. their closeness to the in-sample data. To characterize the (dis)similarity between in-sample and out-of-sample instances, the same shape-based distance was used as in the clustering step for scenario reduction. Fig. 5 shows how the average sequence of a few selected test sites compares to the average sequence of the in-sample training MIX; the prototypes were extracted using the shape extraction function of [33], where MIX indicates the prototype of the in-sample training set and 12, 29, and 70 refer to the prototypes of three test sites. It is visible that site 12 is the most similar, whereas site 29 has the highest dissimilarity. The quantified distances between the prototype of each test site and the training set are summarized in Table II, confirming that site 12 is indeed the closest match, whereas site 29 has one of the highest dissimilarities. The possible correlation of these distances with changes in the out-of-sample performance, e.g., in the number of violations of the probabilistic constraints, is discussed later in this section.

C. The trade-off between violations and expected benefits
In Fig. 6, the left y-axis shows the mean daily operation cost over the simulated days, while the corresponding mean violations are shown on the right y-axis.
The WCVaR-BC by definition imposes more conservative constraints, such that the higher reliability comes at the cost of lower expected mean performance. This can be understood by looking at the projections in Fig. 6. If, e.g., one compares the cost obtained by the WCVaR-BC at ε = 0.08, for site 12 a similar cost can be obtained at ε = 0.03 by the CVaR-BC. The mean violations (indicated by the red ∆ sign), however, differ significantly at the chosen confidence levels, despite the costs being close to each other. This difference shows that the WCVaR-BC may lead to higher average violations when calibrated to achieve similar objective values as the CVaR-BC. The same projections, made for site 29, show a much smaller difference, which may be explained by the different distances from the training scenario MIX. It is also visible that at lower confidence levels the operational costs converge, whereas the corresponding violations remain much lower with the WCVaR-BC.
The results of Fig. 6 suggest that the CVaR-BC on average leads to higher reliability for a given average out-of-sample cost. However, as discussed in the motivation of the WCVaR-BC, its advantage lies in the ability to reduce the exposure to extreme out-of-sample outcomes, which is most beneficial when limited historical information is available to approximate the uncertain process. The extreme outcomes, i.e., instances with the highest out-of-sample violations due to the wrong approximation of the underlying uncertainty, are not well depicted in the aggregated results of Fig. 4 and Fig. 6. Therefore, it is insightful to assess the spread of the maximum violations as a function of the corresponding profit in each violating case (Fig. 7). This metric is of great relevance when choosing one of the proposed risk-based constraints over CC. Figure 7 shows the bi-variate kernel density estimate (KDE) plot indicating the expected spectrum of the maximum observed daily violations and the corresponding profits for sites {12, 70, 29}. In addition, on the marginal x- and y-axes the histograms are plotted individually for the distributions of both maximum violations and profits. The WCVaR-BC leads to lower violations and lower profits by definition. However, it was observed in the analysis that reducing the confidence level by 5-7% often leads to similar profits with the WCVaR-BC as with the CVaR-BC. Therefore, to generate results from the same frontier in the comparison of maximum violations, the WCVaR-BC model was solved for lower confidence levels (ε = 0.01 to 0.17, meaning 83% was the lowest confidence). The maximum violations are collected for each of the 65 real-time forecast error realizations and for each in-sample confidence level (10 levels for the CVaR-BC, 17 for the WCVaR-BC). Overall, this leads to 650 possible violations with the CVaR-BC and to 1105 with the WCVaR-BC. Due to the difference in the number of studied instances, we normalized both the histograms and the KDE plots, such that their area always adds up to one. Fig. 7a shows that the advantage of using the WCVaR-BC is most prominent for site 29, which is the second furthest candidate from the in-sample scenario MIX (Table II). It can be observed that the maximum violations reach a more than two times higher level than for sites 12 and 70 at the tail of the distribution (Fig. 7a, marginal y-axis), while the profits are spread over around the same range (Fig. 7a, marginal x-axis).
For this site, the in-sample scenario set was a particularly inaccurate approximation; as such, the severity of the maximum violating instances is large with CVaR-BC, which fits the model more tightly to the training set. WCVaR-BC, on the contrary, was able to circumvent such severe violations, remaining in the same range as for the other two sites (shown by the marginal y-axes of Fig. 7). Contrary to site 29, the outcomes of site 12 (Fig. 7b), the best match with the training set, indicate moderately lower maximum violations with CVaR-BC compared to WCVaR-BC. Site 70, which is representative of most other test sites not shown in Fig. 7, does not show pronounced differences between CVaR-BC and WCVaR-BC: the profits as well as the maximum violations are similarly distributed. The three sites shown in Fig. 7 thus illustrate well the trade-off faced by the modeler when choosing between the two proposed risk-based constraints.
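The normalization used to make the 650- and 1105-instance sets comparable can be sketched in a few lines. The gamma-distributed values below are synthetic stand-ins for the real violation data; only the shape of the bookkeeping (65 realizations times 17 risk levels, density-normalized histogram) follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical maximum daily violations: 65 realizations x 17 risk levels.
violations = rng.gamma(shape=2.0, scale=0.5, size=(65, 17)).ravel()  # 1105 values

# density=True rescales the histogram so that its total area is one,
# making the 650- and 1105-instance sets directly comparable.
density, edges = np.histogram(violations, bins=30, density=True)
area = np.sum(density * np.diff(edges))
print(violations.size, round(area, 6))  # 1105 1.0
```

The same area-one convention applies to the KDE curves, which standard estimators (e.g., Gaussian KDE) satisfy by construction.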

V. CONCLUSION AND OUTLOOK
This paper proposes two data-driven risk-based constraints for the risk-aware probabilistic enforcement of the flexibility bounds of an energy community that aggregates a variety of distributed assets and participates in day-ahead energy and imbalance markets. First, the CVaR-BC is formulated to account for both the severity and the probability of the violations when representing the energy bounds of the EC, which carries potential benefits over CCs. Next, this constraint is extended to the WCVaR-BC, which differentiates the CVaR value among the sub-clusters of clients, allowing to hedge against distributional ambiguity (stemming from the varying nature of the on-site DER assets). The resulting time-series scenario-driven optimization models can tackle large-scale problem instances in a linear programming fashion.
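As a reminder of why the reformulations remain linear, a scenario-based CVaR constraint admits the standard Rockafellar-Uryasev construction. The notation below is generic, not the paper's exact symbols: let $x_\pi$ denote the flexibility-bound violation in scenario $\pi \in \Pi$, with $N = |\Pi|$ equiprobable scenarios and risk level $\varepsilon$. Enforcing $\mathrm{CVaR}_{\varepsilon}(x) \le 0$ then amounts to the linear system

```latex
\begin{align}
\zeta + \frac{1}{\varepsilon N} \sum_{\pi \in \Pi} s_\pi &\le 0, \\
s_\pi \ge x_\pi - \zeta, \quad s_\pi &\ge 0, \qquad \forall \pi \in \Pi,
\end{align}
```

with auxiliary variables $\zeta$ (the value-at-risk proxy) and $s_\pi$ (scenario excesses). A worst-case variant in the spirit of the WCVaR-BC replicates this system for each cluster $k \in K$ with its own $N_k$ scenarios, since $\max_{k \in K} \mathrm{CVaR}_{\varepsilon}^{(k)}(x) \le 0$ holds if and only if every cluster-wise CVaR constraint holds.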
After qualitatively comparing the proposed constraints to CC, a numerical analysis showed that, when assuming limited knowledge about the forecast errors, WCVaR-BC allows for reducing the exposure to extreme levels of violations via its increased robustness against distributional ambiguity. The proposed constraints can facilitate the inclusion of new sites (with scarce information) in the pooled portfolio of an energy community. Furthermore, the authors believe that the introduced constraints may be well-suited for a broad range of power systems applications with ambiguous uncertainty sets and limited knowledge of the cost of constraint violations.

[Fig. 7 caption: On the marginal x- and y-axes, the histograms for maximum violations and profits are plotted individually. Of all the obtained results, those with the highest profit are reported, as they produce the highest violating instances; the results were therefore cut off at a profit of 440 € for all sites. All plots are normalized such that their area individually adds up to one.]
As future research, the impact of clustering could be investigated further, in particular to better understand how the number of scenarios and the distance among them influence the out-of-sample performance of the risk-based constraints. Moreover, the convex nature of the risk-based constraints allows for interpreting the associated dual variables as prices in local energy markets. This property may ease the risk-aware trading of flexible resources in such settings.
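The dual-variables-as-prices idea can be previewed on a toy dispatch LP, a minimal sketch that is not the paper's model: two resources with assumed costs and capacities serve a fixed community demand, and the shadow price of the demand constraint plays the role of a local marginal price.

```python
import numpy as np
from scipy.optimize import linprog

# Toy community dispatch (illustrative numbers): minimize procurement cost
# of two resources x1, x2 serving a demand of 8 MWh.
c = np.array([10.0, 20.0])          # marginal costs (EUR/MWh)
A_ub = np.array([[-1.0, -1.0],      # -(x1 + x2) <= -8   (demand coverage)
                 [ 1.0,  0.0]])     #   x1       <=  5   (capacity of resource 1)
b_ub = np.array([-8.0, 5.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")

# The dual variable (marginal) of the demand constraint gives the cost of
# serving one extra unit of demand, i.e., a local marginal price.
price = -res.ineqlin.marginals[0]
print(res.x, res.fun, price)  # dispatch [5. 3.], cost 110.0, price 20.0
```

The cheap resource is dispatched to its capacity, so the marginal unit comes from the expensive one, and the demand constraint's dual correctly prices it at 20 EUR/MWh. In a risk-aware community model, the duals of the CVaR-type flexibility constraints would be read off analogously.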