An improved decision tree-based method for predicting overvoltage peak
values integrating a model-driven scheme
Zhe Zhang1, Boyu Qin1*, Xin Gao1, Yixing Zhang1, Tao Ding1
1 State Key Laboratory of Electrical Insulation and
Power Equipment, School of Electrical Engineering, Xi’an Jiaotong
University, Xi’an, China
*qinboyu@xjtu.edu.cn
Abstract: Commutation failure is the most prevalent fault in
line-commutated converter based HVDC systems and may result in
transient overvoltage in the sending-side system. Overvoltage level
evaluation has become a crucial task for the power industry to assess the
tripping risk of large-scale wind turbines and implement effective
stability control measures. In this paper, a decision tree (DT) model is
adopted to extract the mapping relationship between transient
overvoltage and a large number of electrical quantities of power grids.
The conventional DT algorithm is modified by adjusting the error weight
assignment to reflect the error tolerances for different actual
overvoltage regions. To compensate for potential inaccuracies in the data-driven
method, a derivation of the mathematical relationship between the
reactive power consumed by the rectifier and AC voltage is presented,
along with an analytical expression for the peak value of transient
overvoltage. On this basis, an overvoltage analysis method integrating
the model-driven and data-driven techniques is proposed, and the
improved DT algorithm is applied to fast error correction, enhancing the
interpretability of regression prediction results. Case studies were
performed on an actual regional hybrid AC/DC power grid in Northwest
China with transient overvoltage problems, and the simulation results
verified the effectiveness of the proposed method.
Introduction
Driven by the "double carbon" target and the construction of the
new power system, renewable energy sources have seen leapfrog
growth. Owing to its large transmission capacity, low power
loss, and flexible power adjustment, line-commutated
converter (LCC) based high voltage direct current (HVDC) transmission
technology has been widely adopted in China for long-distance,
large-capacity transmission [1,2].
Currently, China operates the hybrid AC/DC power grid with the
largest scale and most complicated network structure in the world
[3]. Commutation failure (CF) is a fault specific to LCC-HVDC
transmission systems [4]. During a CF, the voltage amplitude of the
sending-side AC grid first decreases and then increases [5-7]. For
sending-side systems with large-scale renewable energy integration,
transient overvoltage can cause renewable generation to trip off-grid,
jeopardizing the secure and stable operation of the hybrid AC/DC power
grid [8-10]. Therefore, analysing the overvoltage level under typical DC
faults has become a growing concern for the power industry, as it
provides a basis for the stability analysis of hybrid AC/DC
power grids and guides the formulation of high voltage
ride-through standards for renewable energy.
To investigate the impact of CF on the AC system, simulation analysis
[11] and discussion [12,13] have been conducted to study the
mechanism of transient overvoltage caused by CF. However, these studies
only briefly analysed the effect of the DC current rise and drop stages
on transient overvoltage, without considering how CFs of different
fault severity and duration affect it. For
calculating the overvoltage peak value, model-driven techniques
relying on power system mechanism models have been proposed, including
the AC equivalent method [14], the reactive power short
circuit ratio method [15], and the single-branch voltage drop method
[16]. The AC equivalent method and the reactive power short circuit
ratio method are based on the ratio of the reactive power surplus level
to the system short-circuit capacity during the transient period.
Nevertheless, the derivations of these two methods involve
model simplifications that could lead to unacceptable computation
errors. The single-branch voltage drop method accounts for the impact
of active power fluctuation on transient overvoltage to improve the
prediction accuracy, but its computational burden could
be challenging in practical power systems. Therefore, a trade-off
must be made between computation accuracy and speed when adopting
these model-driven methods for online overvoltage peak value prediction.
With the application of wide area measurement systems (WAMS) [17],
artificial intelligence methods have shown potential for
data-driven overvoltage peak value prediction through data relationship
mining [18]. Among traditional machine learning methods, neural
networks (NN) have been widely used for transient overvoltage prediction
owing to their powerful non-linear mapping capabilities [19]. In
[20], support vector machines (SVM) also demonstrated good
performance in transient overvoltage classification problems. In
addition, core vector machines (CVM) have been constructed to extract the
mapping relationship between transient overvoltage and a large number of
electrical quantities of power grids [21]. However, traditional
machine learning algorithms typically require manual feature extraction
from the data, which may degrade prediction performance for
complicated and unstructured data types. Consequently, deep learning
methods such as the long short-term memory (LSTM) network [22] and a deep
imbalanced learning framework [23] have been adopted to predict the
overvoltage peak value owing to their capability of automatic feature
extraction. Despite their fast computation speed, data-driven
methods abandon the rigorous theoretical analysis of the power system
evolution mechanism, and improper feature selection leads to over-fitting
when the number of training samples is insufficient, which
reduces the accuracy of prediction results. Moreover, higher prediction
accuracy is crucial for high-risk scenarios in actual power system
operation, whereas traditional data-driven algorithms apply the same
error tolerance to all samples.
To leverage the application
potential of traditional data-driven methods for overvoltage prediction,
this paper proposes an improved decision tree (DT) based method
integrating a model-driven scheme. The main contributions of this paper
are summarised as follows: (a) The decision-making principle of the DT
model is presented, and the application potential of the DT model for
predicting the overvoltage peak value is elaborated. To improve the prediction
accuracy in high-risk scenarios, the traditional DT algorithm is
modified by differentiating the error tolerances for different actual
overvoltage regions. (b) A theoretical analysis method for the overvoltage
peak value of converter buses is studied, with acceptable calculation
accuracy and the potential for online application. On this basis, the
data-driven method is integrated with the model-driven method to enhance
the robustness to insufficient training samples and the interpretability
of prediction results. The proposed DT method is adopted to reveal the
association pattern between theoretical analysis results and true
values. The advantages of the proposed approach include: (a)
Compatibility of computation speed and accuracy for online application.
(b) Strong interpretability of regression prediction results.
The remainder of this paper is organized as follows: In section 2, the
traditional DT algorithm is modified to enhance the prediction
performance in high-risk scenarios. Section 3 establishes an integrated
method for predicting the overvoltage peak value. Time-domain simulations
are performed in section 4 as a verification. Section 5 concludes the
paper and highlights future research directions.
Improved data-driven method for predicting overvoltage level
In this section, the decision-making principle of the DT model is
presented, and its application potential for predicting the overvoltage
peak value is elaborated. In addition, to improve the prediction
accuracy in high-risk scenarios, the traditional DT algorithm is
modified by differentiating the error tolerances for different actual
overvoltage regions.
DT model
DT is a powerful supervised machine learning tool for solving
classification and regression problems in high-dimensional data space
[24]. The DT model is illustrated in Fig. 1. The basic
principle of DT is to recursively partition the input space into smaller
subsets based on the values of the input features. The prediction
process starts from the root node and ends at a terminal node, and a
node with two successors in the DT model is called an internal node.
Each internal node of the tree represents a test on one of the input
features, and each branch corresponds to one of the possible outcomes of
the test. The leaves of the tree represent the final predictions or
classifications for each input instance.
Fig.1 Illustration of DT model
The objective of constructing a DT is to determine the sequence of
splits that minimizes the impurity measure at each split. The impurity
reduction is calculated as the difference between the impurity of
the parent node and the weighted impurity of the child nodes after the
split. A maximal tree is initially trained by recursively splitting a
node into two purer successors, where all the available splitting rules
are traversed until further splitting cannot improve overall accuracy.
Eventually, the splitting process partitions all the samples in a
multidimensional space into different subregions with homogeneous
samples [25,26], and the samples in each subregion should have the
same or similar prediction objective. Based on the well-trained DT
model, the complicated classification or regression problem can be
converted to a series of “if-then” questions based on the thresholds
of partial input features or their linear combinations [27].
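The "if-then" interpretation can be illustrated with a minimal Python sketch. It assumes axis-aligned tests of the form "feature value <= threshold"; the `Node` structure, the two features (a DC power level and a short-circuit ratio) and all numerical values are hypothetical and serve only to show how a trained tree is traversed from the root to a terminal node.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a binary regression tree (illustrative structure)."""
    feature: Optional[int] = None      # index of the tested input feature
    threshold: Optional[float] = None  # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[float] = None      # prediction stored at a terminal node

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def predict(node: Node, x: Sequence[float]) -> float:
    """Walk from the root to a terminal node, one if-then test per level."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# Hypothetical tree: feature 0 = DC power (p.u.), feature 1 = short-circuit
# ratio; leaf values are overvoltage peaks in p.u. (made-up numbers).
toy_tree = Node(
    feature=0, threshold=0.8,
    left=Node(value=1.05),
    right=Node(feature=1, threshold=3.0,
               left=Node(value=1.30),
               right=Node(value=1.15)),
)

peak = predict(toy_tree, [0.9, 2.5])  # high DC power, weak grid
```

Each prediction in this sketch can be read back as a chain of threshold tests, which is the source of the interpretability discussed above.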
For the regression problem of overvoltage peak value prediction, the
electrical quantities related to the voltage response of power grids are
selected as input features during the off-line training process. The key
factors of the overvoltage peak value can then be extracted by determining
the splitting rules. When applied online, the DT model can predict the
overvoltage according to the operation characteristics of
power grids.
Improved DT algorithm
For traditional regression DTs, the splitting rule is defined as
follows. First, let $t$ be an internal node in the regression DT. The
purity of node $t$ can be obtained by

$$R(t) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2$$

where $N$ is the number of samples in the internal node $t$; $y_i$ is
the label of the $i$-th sample; and $\bar{y}$ is the average value of
the $N$ samples in the node $t$.
Next, the purity loss between an internal node $s$ and the two
successors obtained by splitting $s$ is adopted to determine the splitting
rules, and the branching quality index $\Delta R$ is defined to
quantitatively assess the purity loss:

$$\Delta R(s) = R(s) - \frac{N_L}{N}R(t_L) - \frac{N_R}{N}R(t_R)$$

where $R(t_R)$ and $R(t_L)$ are the purity of the right and left
subtrees split by $s$, respectively; $N_R$ and $N_L$ are the numbers of
samples in the right and left subtrees, respectively. Therefore, to make
each subregion more homogeneous, the splitting attribute $X_j$ and
standard $K_2$ should be selected to maximize the
purity loss of node $s$.
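The splitting rule can be illustrated with a minimal Python sketch. It assumes that the purity is the mean squared deviation of the labels from the node average (the usual regression-tree choice) and that candidate thresholds are the midpoints between consecutive feature values; the function names and the toy data are illustrative only.

```python
from typing import Optional, Sequence, Tuple


def purity(y: Sequence[float]) -> float:
    """Mean squared deviation of the labels from their average."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / len(y)


def split_gain(y_left: Sequence[float], y_right: Sequence[float]) -> float:
    """Purity loss: parent purity minus sample-weighted child purities."""
    n_l, n_r = len(y_left), len(y_right)
    n = n_l + n_r
    parent = purity(list(y_left) + list(y_right))
    return parent - n_l / n * purity(y_left) - n_r / n * purity(y_right)


def best_threshold(x: Sequence[float],
                   y: Sequence[float]) -> Tuple[Optional[float], float]:
    """Scan midpoints of one feature and keep the largest purity loss."""
    pairs = sorted(zip(x, y))
    best_k, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        k = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for value, label in pairs if value <= k]
        right = [label for value, label in pairs if value > k]
        gain = split_gain(left, right)
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain


# Toy data: one feature and overvoltage peaks in p.u. (made-up numbers).
k, gain = best_threshold([0.2, 0.4, 0.6, 0.8], [1.0, 1.0, 1.2, 1.2])
```

On this toy data the threshold between 0.4 and 0.6 separates the two label groups completely, so both successors have zero purity and the purity loss equals the purity of the parent node.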
Similarly, all internal nodes in the DT model are split according to the
above-mentioned process until relevant constraints, such as the required
node purity, are met, and the terminal nodes are obtained. In each
terminal node, the ultimate prediction value is determined as the
average label of all samples in that node.
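The recursive construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the purity is the mean squared deviation from the node average, candidate thresholds are midpoints between distinct feature values, and the stopping constraints (`max_depth`, `min_samples`, `min_purity`) are hypothetical choices rather than the settings used in this paper.

```python
from typing import Dict, List, Optional, Tuple, Union

Tree = Union[float, Dict[str, object]]  # a leaf value or an internal node


def mse(y: List[float]) -> float:
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / len(y)


def find_split(X: List[List[float]],
               y: List[float]) -> Optional[Tuple[int, float, float]]:
    """Return (feature index, threshold, purity loss) of the best split."""
    n, parent, best = len(y), mse(y), None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            k = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= k]
            right = [y[i] for i in range(n) if X[i][j] > k]
            if not left or not right:
                continue  # guard against degenerate midpoints
            gain = (parent - len(left) / n * mse(left)
                    - len(right) / n * mse(right))
            if best is None or gain > best[2]:
                best = (j, k, gain)
    return best


def build(X: List[List[float]], y: List[float], depth: int = 0,
          max_depth: int = 4, min_samples: int = 2,
          min_purity: float = 1e-9) -> Tree:
    """Split recursively; a terminal node stores the average label."""
    leaf_value = sum(y) / len(y)
    if depth >= max_depth or len(y) < min_samples or mse(y) <= min_purity:
        return leaf_value
    split = find_split(X, y)
    if split is None or split[2] <= 0.0:
        return leaf_value  # no split reduces the impurity
    j, k, _ = split
    left = [i for i in range(len(y)) if X[i][j] <= k]
    right = [i for i in range(len(y)) if X[i][j] > k]
    return {
        "feature": j,
        "threshold": k,
        "left": build([X[i] for i in left], [y[i] for i in left],
                      depth + 1, max_depth, min_samples, min_purity),
        "right": build([X[i] for i in right], [y[i] for i in right],
                       depth + 1, max_depth, min_samples, min_purity),
    }


# Toy data: one feature and overvoltage peaks in p.u. (made-up numbers).
tree = build([[0.2], [0.4], [0.6], [0.8]], [1.0, 1.0, 1.2, 1.2])
```

The exhaustive threshold scan keeps the sketch short; practical implementations usually sort each feature once and update the statistics incrementally.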
In actual operation, a higher overvoltage level
poses a greater threat to the secure and stable operation of power
systems, and effective control measures should be implemented to eliminate
the risk of severe faults. However, the node purity
defined above treats the training errors of all samples equally. To address this
limitation and better reflect the actual operation requirement, larger
weights are assigned to prediction errors in high-risk scenarios, and
the purity and branching quality indices are modified as follows.
With the modified indices, the improved DT algorithm limits the prediction error
of high-risk samples and introduces the knowledge of risk differences in
overvoltage problems.
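As a hedged illustration only, one way to realize such risk-dependent weighting is sketched below in Python. The step-shaped weight function, the 1.3 p.u. risk threshold, and the weight factor of 3 are assumptions made for this example and are not the definitions used in this paper. Relative to the unweighted mean squared purity of a traditional regression DT, each squared error is multiplied by a sample weight, and sample counts are replaced by weight sums.

```python
from typing import List, Sequence

RISK_THRESHOLD = 1.3    # p.u., hypothetical boundary of the high-risk region
HIGH_RISK_WEIGHT = 3.0  # hypothetical weight for high-risk samples


def risk_weight(y_true: float) -> float:
    """Larger error weight when the actual overvoltage peak is high."""
    return HIGH_RISK_WEIGHT if y_true >= RISK_THRESHOLD else 1.0


def weighted_purity(y: Sequence[float]) -> float:
    """Weighted mean squared deviation from the weighted average."""
    w: List[float] = [risk_weight(v) for v in y]
    total = sum(w)
    mean = sum(wi * vi for wi, vi in zip(w, y)) / total
    return sum(wi * (vi - mean) ** 2 for wi, vi in zip(w, y)) / total


def weighted_gain(y_left: Sequence[float], y_right: Sequence[float]) -> float:
    """Purity loss with child purities weighted by their weight sums."""
    w_l = sum(risk_weight(v) for v in y_left)
    w_r = sum(risk_weight(v) for v in y_right)
    parent = weighted_purity(list(y_left) + list(y_right))
    return (parent
            - w_l / (w_l + w_r) * weighted_purity(y_left)
            - w_r / (w_l + w_r) * weighted_purity(y_right))


# Two low-risk samples and one high-risk sample (made-up peaks in p.u.).
node_purity = weighted_purity([1.1, 1.1, 1.4])
gain = weighted_gain([1.1, 1.1], [1.4])
```

In this sketch the weighted node average moves toward the high-risk sample, so a split that isolates high-risk samples yields a larger purity loss than it would under equal weights.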
Specific DT construction scheme
The construction of the DT model involves two steps. In step 1, the
sample set is generated offline for overvoltage peak value prediction.
The composition of the sample set is depicted in Fig. 2, where the key
electrical quantities of power systems are selected as input features
and the corresponding overvoltage peak values obtained by the PSASP
software are adopted as the output labels.
Fig.2 Composition of sample set
In step 2, the improved DT algorithm is applied to determine the
splitting rule for each node in the DT model. The pseudocode
for DT model construction is presented below.