An improved decision tree-based method for predicting overvoltage peak values integrating a model-driven scheme
Zhe Zhang1, Boyu Qin1*, Xin Gao1, Yixing Zhang1, Tao Ding1
1 State Key Laboratory of Electrical Insulation and Power Equipment, School of Electrical Engineering, Xi’an Jiaotong University, Xi’an, China
*qinboyu@xjtu.edu.cn
Abstract: Commutation failure is the most prevalent fault in line-commutated converter based HVDC systems and may cause transient overvoltage on the sending-side system. Overvoltage level evaluation has become a crucial task for the power industry to assess the tripping risk of large-scale wind turbines and implement effective stability control measures. In this paper, a decision tree (DT) model is adopted to extract the mapping relationship between transient overvoltage and massive electrical quantities of power grids. The common DT algorithm is modified by adjusting the error weight assignment to reflect the different error tolerances in different actual overvoltage regions. To compensate for potential inaccuracies in the data-driven method, the mathematical relationship between the reactive power consumed by the rectifier and the AC voltage is derived, along with an analytical expression for the peak value of transient overvoltage. On this basis, an overvoltage analysis method integrating model-driven and data-driven techniques is proposed, in which the improved DT algorithm is applied to fast error correction, enhancing the interpretability of the regression prediction results. Case studies were performed on an actual regional hybrid AC/DC power grid in Northwest China with transient overvoltage problems, and the simulation results verified the effectiveness of the proposed method.

Introduction

To achieve the “double carbon” target and promote the construction of the new power system, renewable energy sources have experienced leapfrog growth. Owing to its large transmission capacity, low power loss, and flexible power adjustment, line-commutated converter (LCC) based high voltage direct current (HVDC) transmission has been widely adopted in China for long-distance, large-capacity transmission [1,2]. Currently, China has formed the hybrid AC/DC power grid with the largest scale and most complicated network structure in the world [3]. Commutation failure (CF) is a fault unique to LCC-HVDC transmission systems [4]. During a CF, the voltage amplitude of the sending-side AC grid first decreases and then increases [5-7]. For a sending system with large-scale renewable energy integration, transient overvoltage can cause renewable generation to trip off-grid, jeopardizing the secure and stable operation of the hybrid AC/DC power grid [8-10]. Therefore, analysing the overvoltage level under typical DC faults has become a growing concern for the power industry, as it provides a basis for the stability analysis of hybrid AC/DC power grids and guides the formulation of high voltage ride through standards for renewable energy.
To investigate the impact of CF on the AC system, simulation analysis [11] and discussion [12,13] have been conducted to study the mechanism of transient overvoltage caused by CF. However, these studies only briefly analysed the effect of the DC current rise and drop stages on transient overvoltage, without considering how CFs of different fault severity and duration affect it. For calculating the overvoltage peak value, model-driven techniques relying on power system mechanism models have been proposed, including the AC equivalent method [14], the reactive power short circuit ratio method [15], and the single branch voltage drop method [16]. The AC equivalent method and the reactive power short circuit ratio method are based on the ratio of the reactive power surplus to the system short-circuit capacity during the transient period. Nevertheless, the derivation of these two methods involves model simplifications that could lead to unacceptable computation errors. The single branch voltage drop method accounts for the impact of active power fluctuation on the transient overvoltage to improve the prediction accuracy, but its computational burden could be challenging in practical power systems. Therefore, a trade-off must be made between computation accuracy and speed when adopting these model-driven methods for online overvoltage peak value prediction.
With the application of wide area measurement systems (WAMS) [17], artificial intelligence methods have shown potential for data-driven overvoltage peak value prediction through data relationship mining [18]. Among traditional machine learning methods, neural networks (NN) have been widely used for transient overvoltage prediction owing to their powerful non-linear mapping capabilities [19]. Support vector machines (SVM) have also demonstrated good performance in transient overvoltage classification problems [20]. In addition, core vector machines (CVM) have been constructed to extract the mapping relationship between transient overvoltage and massive electrical quantities of power grids [21]. However, traditional machine learning algorithms typically require manual feature extraction from the data, which may degrade prediction performance for complicated and unstructured data types. Consequently, deep learning methods such as the long short-term memory (LSTM) network [22] and a deep imbalanced learning framework [23] have been adopted to predict the overvoltage peak value owing to their capability of automatic feature extraction. Despite their fast computation speed, data-driven methods abandon the rigorous theoretical analysis of the power system evolution mechanism, and improper feature selection leads to over-fitting when the number of training samples is insufficient, which reduces the accuracy of prediction results. Moreover, higher prediction accuracy is crucial for high-risk scenarios in actual power system operation, whereas traditional data-driven algorithms apply the same error tolerance to all samples.
To leverage the application potential of traditional data-driven methods for overvoltage prediction, this paper proposes an improved decision tree (DT) based method integrating a model-driven scheme. The main contributions of this paper are summarised as follows: (a) The decision-making principle of the DT model is presented, and its application potential for predicting the overvoltage peak value is elaborated. To improve the prediction accuracy in high-risk scenarios, the traditional DT algorithm is modified by differentiating the error tolerances for different actual overvoltage regions. (b) A theoretical analysis method for the overvoltage peak value of converter buses is studied, with acceptable calculation accuracy and the potential for online application. On this basis, the data-driven method is integrated with the model-driven method to enhance the robustness to insufficient training samples and the interpretability of the prediction results. The proposed DT method is adopted to reveal the association pattern between the theoretical analysis results and the true values. The advantages of the proposed approach include: (a) a balance of computation speed and accuracy suitable for online application; (b) strong interpretability of the regression prediction results.
The remainder of this paper is organized as follows. In Section 2, the traditional DT algorithm is modified to enhance the prediction performance in high-risk scenarios. Section 3 establishes an integrated method for predicting the overvoltage peak value. Time-domain simulations are performed in Section 4 for verification. Section 5 concludes the paper and highlights future research directions.

Improved data-driven method for predicting overvoltage level

In this section, the decision-making principle of the DT model is presented, and its application potential for predicting the overvoltage peak value is elaborated. In addition, to improve the prediction accuracy in high-risk scenarios, the traditional DT algorithm is modified by differentiating the error tolerances for different actual overvoltage regions.
DT model
DT is a powerful supervised machine learning tool for solving classification and regression problems in high-dimensional data space [24]. The DT model is illustrated in Fig. 1. The basic principle of DT is to recursively partition the input space into smaller subsets based on the values of the input features. The prediction process starts from the root node and ends at a terminal node; a node with two successors is called an internal node. Each internal node of the tree represents a test on one of the input features, and each branch corresponds to one of the possible outcomes of the test. The leaves of the tree represent the final predictions or classifications for each input instance.
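The root-to-leaf prediction process described above can be sketched in a few lines of Python. This is a minimal illustration only: the `Node` layout, the "less than or equal to threshold" form of the test, and the toy tree with its made-up numbers are assumptions for the example, not the model trained in this paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a binary regression tree (hypothetical layout)."""
    feature: Optional[int] = None      # index of the tested input feature (internal nodes)
    threshold: Optional[float] = None  # the test is x[feature] <= threshold
    left: Optional["Node"] = None      # branch taken when the test is true
    right: Optional["Node"] = None     # branch taken when the test is false
    value: Optional[float] = None      # prediction stored at a terminal node


def predict(node: Node, x: Sequence[float]) -> float:
    """Walk from the root to a terminal node and return its stored value."""
    while node.value is None:          # internal node: apply its test
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# Toy tree with made-up numbers: feature 0 is tested at the root, feature 1 below it.
toy_tree = Node(
    feature=0, threshold=0.5,
    left=Node(value=1.05),
    right=Node(feature=1, threshold=2.0,
               left=Node(value=1.15), right=Node(value=1.30)),
)

print(predict(toy_tree, [0.2, 9.9]))  # takes the left branch at the root
print(predict(toy_tree, [0.8, 3.0]))  # right at the root, then right again
```

Each prediction costs only one comparison per tree level, which is why a trained DT is attractive for online use.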
Fig. 1 Illustration of the DT model
The objective of constructing a DT is to determine the optimal sequence of splits that minimizes the impurity measure at each split. The impurity reduction is calculated as the difference between the impurity of the parent node and the weighted impurity of the child nodes after the split. A maximal tree is initially trained by recursively splitting a node into two purer successors, where all the available splitting rules are traversed until further splitting cannot improve the overall accuracy. Eventually, the splitting process partitions all the samples in a multidimensional space into different subregions with homogeneous samples [25,26], and the samples in each subregion should have the same or similar prediction objective. Based on the well-trained DT model, a complicated classification or regression problem can be converted into a series of “if-then” questions based on the thresholds of some input features or their linear combinations [27].
For the regression problem of overvoltage peak value prediction, the electrical quantities related to the voltage response of power grids are selected as input features during the offline training process. The key factors affecting the overvoltage peak value can then be extracted by determining the splitting rules. In online application, the DT model predicts the overvoltage peak value from the operating characteristics of the power grid.
Improved DT algorithm
For traditional regression DTs, the splitting rule is defined as follows. First, let $t$ be an internal node in the regression DT. The purity of node $t$ can be obtained by

$$R(t) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2$$

where $N$ is the number of samples in the internal node $t$; $y_i$ is the label of the $i$th sample; and $\bar{y}$ is the average value of the $N$ samples in node $t$. A smaller $R(t)$ indicates a more homogeneous node. Next, the purity loss between an internal node $s$ and the two successors split from $s$ is adopted to determine the splitting rules, and the branching quality index $\Delta R$ is defined to quantitatively assess the purity loss:

$$\Delta R = R(s) - \frac{N_L}{N_L + N_R}R(t_L) - \frac{N_R}{N_L + N_R}R(t_R)$$
where $R(t_R)$ and $R(t_L)$ are the purity of the right and left subtrees split by $s$, respectively; $N_R$ and $N_L$ are the number of samples in the right and left subtrees, respectively. Therefore, to make each subregion more homogeneous, the splitting attribute $X_j$ and splitting standard $K$ should be selected to maximize the purity loss of node $s$.
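The split search can be illustrated with a short Python sketch. It assumes the usual CART regression form: the node purity is the mean squared deviation of the labels from the node average, and the branching quality index is the parent purity minus the sample-weighted purities of the two successors. The function names, the use of midpoints between sorted feature values as candidate thresholds, and the toy data are hypothetical choices for the example.

```python
from typing import Sequence, Tuple


def node_purity(y: Sequence[float]) -> float:
    """R(t): mean squared deviation of the labels from the node average."""
    n = len(y)
    if n == 0:
        return 0.0
    mean = sum(y) / n
    return sum((yi - mean) ** 2 for yi in y) / n


def split_gain(y_left: Sequence[float], y_right: Sequence[float]) -> float:
    """Delta R: parent purity index minus sample-weighted child indices."""
    n_l, n_r = len(y_left), len(y_right)
    n = n_l + n_r
    parent = node_purity(list(y_left) + list(y_right))
    return parent - (n_l / n) * node_purity(y_left) - (n_r / n) * node_purity(y_right)


def best_split(X, y) -> Tuple[int, float, float]:
    """Try every feature j and midpoint threshold K; return the best (j, K, gain)."""
    best = (-1, 0.0, 0.0)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            k = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= k]
            right = [yi for row, yi in zip(X, y) if row[j] > k]
            gain = split_gain(left, right)
            if gain > best[2]:
                best = (j, k, gain)
    return best


# Toy data (made up): feature 0 separates the two label groups, feature 1 does not.
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
y = [1.0, 1.0, 1.2, 1.2]
j, k, gain = best_split(X, y)
print(j, k, gain)
```

On this toy set the search selects feature 0, because splitting on it leaves both successors with identical labels.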
Similarly, all internal nodes in the DT model are split according to the above process until the stopping constraints, such as the node purity requirement, are met, at which point the terminal nodes are obtained. In each terminal node, the final prediction value is the average label of all samples in that node.
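A minimal recursive construction along these lines is sketched below in Python. The stopping constraints (`min_samples`, `min_impurity`, `max_depth`), the nested-dictionary tree layout, and the toy data are assumptions made for illustration; the actual constraints used in this paper are not specified here.

```python
def impurity(y):
    """Mean squared deviation of the labels from their average."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / len(y)


def build(X, y, depth=0, max_depth=5, min_samples=2, min_impurity=1e-9):
    """Grow a regression tree as nested dicts; leaves hold the mean label."""
    leaf = {"value": sum(y) / len(y)}
    # Assumed stopping constraints: node pure enough, too small, or too deep.
    if len(y) < min_samples or impurity(y) <= min_impurity or depth >= max_depth:
        return leaf
    best = None  # (gain, feature index, threshold, left indices, right indices)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            k = (lo + hi) / 2.0
            left = [i for i, row in enumerate(X) if row[j] <= k]
            right = [i for i, row in enumerate(X) if row[j] > k]
            gain = (impurity(y)
                    - len(left) / len(y) * impurity([y[i] for i in left])
                    - len(right) / len(y) * impurity([y[i] for i in right]))
            if best is None or gain > best[0]:
                best = (gain, j, k, left, right)
    if best is None or best[0] <= 0.0:
        return leaf  # no candidate split improves the purity index
    _, j, k, left, right = best
    return {
        "feature": j,
        "threshold": k,
        "left": build([X[i] for i in left], [y[i] for i in left],
                      depth + 1, max_depth, min_samples, min_impurity),
        "right": build([X[i] for i in right], [y[i] for i in right],
                       depth + 1, max_depth, min_samples, min_impurity),
    }


# Toy data (made up): one feature, four samples.
X = [[0.1], [0.2], [0.8], [0.9]]
y = [1.0, 1.0, 1.2, 1.4]
tree = build(X, y)
print(tree)
```

The left successor of the root stops immediately because its two labels are identical, while the right successor is split once more into two single-sample terminal nodes.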
According to actual operation requirements, a higher overvoltage level poses a greater threat to the secure and stable operation of power systems, and effective control measures should be implemented to eliminate the risk of severe faults. However, the node purity defined above treats the training errors of all samples equally. To address this limitation and better reflect the actual operation requirements, larger weights are assigned to prediction errors in high-risk scenarios, and the purity and branching quality indices are modified accordingly.
With the modified purity and branching quality indices, the improved DT algorithm limits the prediction error of high-risk samples and introduces knowledge of the risk differences in overvoltage problems.
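The idea of risk-dependent error weights can be sketched in Python as follows. The weight function is purely hypothetical: the step shape, the 1.3 threshold, and the factor 3.0 are made-up values, and the weighted index form, (1/N) times the sum of w(y_i)(y_i - mean)^2, is one plausible assumption rather than the exact modification used in this paper.

```python
def risk_weight(y_i, threshold=1.3, high_weight=3.0):
    """Hypothetical weight: errors on samples at or above `threshold` count more."""
    return high_weight if y_i >= threshold else 1.0


def weighted_purity(y):
    """Assumed weighted index: (1/N) * sum of w(y_i) * (y_i - mean)^2."""
    mean = sum(y) / len(y)
    return sum(risk_weight(v) * (v - mean) ** 2 for v in y) / len(y)


def weighted_gain(y_left, y_right):
    """Assumed weighted branching quality: parent minus sample-weighted children."""
    y_all = list(y_left) + list(y_right)
    n = len(y_all)
    return (weighted_purity(y_all)
            - len(y_left) / n * weighted_purity(y_left)
            - len(y_right) / n * weighted_purity(y_right))


low_risk = [1.0, 1.1]    # below the assumed threshold
high_risk = [1.3, 1.4]   # same spread, at or above the assumed threshold
print(weighted_purity(low_risk), weighted_purity(high_risk))
print(weighted_gain(low_risk, high_risk))
```

Under these assumptions, two nodes with the same label spread receive different index values, so the split search spends more of its effort on separating high-overvoltage samples.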
Specific DT construction scheme
The specific construction of the DT model involves two steps. In step 1, the sample set for overvoltage peak value prediction is generated offline. The composition of the sample set is depicted in Fig. 2, where key electrical quantities of the power system are selected as input features and the corresponding overvoltage peak values obtained with the PSASP software are adopted as output labels.
Fig. 2 Composition of the sample set
In step 2, the improved DT algorithm is applied to determine the splitting rule for each node in the DT model. The pseudocode for DT model construction is presented below.