

Denoising with Singular Value Decomposition for Phase Identification in Power Distribution Systems
Nicholas Zaragoza and Vittal Rao, Senior Member, IEEE

Abstract—Phase identification is the problem of determining which phase(s) a load is connected to in a power distribution system. However, real world sensor measurements used for phase identification have some level of noise that can hamper the ability to identify phase connections using data driven methods. Knowing the phase connections is important for keeping the distribution system balanced so that parts of the system are not overloaded, which can lead to inefficient operations, accelerated component degradation, and, at worst, system destruction. We use Singular Value Decomposition (SVD) with the optimal Singular Value Hard Threshold (SVHT) as part of a feature engineering pipeline to denoise data matrices of voltage magnitude measurements. This approach results in a reduction in Frobenius error and an increase in average phase identification accuracy over a year of time series data. K-medoids clustering is used on the denoised voltage magnitude measurements to perform phase identification.
Impact Statement—Phase identification is a popular problem for applying data driven methods in power distribution systems. These methods allow phase identification to be performed without the costly requirement of installing specialized hardware across the distribution system or using unscalable human labor to visually inspect all electrical connections. However, data from smart meters are inherently noisy, which can reduce the effectiveness of phase identification algorithms. In this work we present a data driven approach to reduce the amount of noise in smart meter data. In some cases this approach can increase the average accuracy of phase identification from 45.64% to 98.92% using a simple phase identification algorithm. It can enable electric utility companies to narrow the trade-off gap between the cost and accuracy of smart meters installed in a power distribution system.

I. INTRODUCTION
Phase identification is the act of determining which phase of a multi-phase power distribution system a load (such as a home or business) is connected to. Knowing the phase connectivity of loads is important for keeping a distribution system operating under balanced conditions, where the amount of power flowing on each phase is as close to equal as possible [1]. An unbalanced system can result in issues such as lower operating efficiency, component degradation (such as that of secondary distribution transformers), and overheating [2], [3], to name a few. However, maintaining accurate records of phase connections can be made difficult by events such as maintenance, system reconfigurations, emergency power restorations (such as those due to extreme weather events), and potential cyberattacks [4], [5].
As of 2018, 86.8 million of the 154.1 million electric meters installed in the United States were Advanced Metering Infrastructure (AMI) meters, according to data from the Energy Information Administration (EIA) [6], [7]. As AMI meter coverage increases, more streams of data will become available for use in data driven methods. Unfortunately, meters are inherently imprecise and are available in different accuracy classes that only ensure a measurement is within some percentage of the true underlying value [8]. As smart meters age, the resulting degradation will lead to an increase in meter error, which can require recalibration or replacement every few years [9]. This inherent error, or noise, can negatively affect the ability of data driven methods to effectively perform phase identification.
Phase identification can involve either directly assigning phases to loads [10] or correcting incorrect labels in the records [11] (section II). We evaluate the effect of denoising with SVD on the problem of phase label correction using a synthetic dataset. The synthetic dataset (section III) is composed of voltage magnitude measurements sampled at 15-minute intervals for an entire year, generated from the Electric Power Research Institute's (EPRI) ckt5 test feeder circuit [12]. We then present a feature engineering pipeline (section IV) that incorporates Singular Value Decomposition (SVD) with the Singular Value Hard Threshold (SVHT) [13] to denoise voltage magnitude measurements for phase identification. To perform phase label correction we use the k-medoids clustering [14] algorithm with a majority vote approach (section V). Our simulation results (section VI) show a reduction in Frobenius error and a corresponding increase in phase identification accuracy for different levels of injected Gaussian white noise. Though SVD has been used for various problems in power distribution systems in the past (section VII), this is the first instance of SVD with the SVHT being used to directly denoise measurements in the phase identification literature.

II. PROBLEM FORMULATION
The goal of phase identification is to determine the phase(s) a load is connected to in a power distribution system. Phase identification can involve either assigning the labels directly (such as by using substation data [10]) or label correction (when the labels are available, but some of them are incorrect [11]). In this work we tackle the label correction problem. Label correction can be performed by first grouping loads that are similar to each other based on some pairwise distance. We can accomplish this grouping using a clustering algorithm. After the loads have been clustered, their labels can be corrected using a majority vote rule. For example, if most of the loads in a cluster are labeled 'Phase A', then all loads in the cluster will be given the label 'Phase A'. As long as the majority of available labels within a cluster are correct, any loads in the cluster with a wrong label will be corrected. However, such an approach is still speculative in nature, and the actual phase connections would have to be physically inspected to verify the labeling [15].

III. SYNTHETIC DATASET

A. Electric Power Research Institute (EPRI) ckt5 Test Circuit
The ckt5 [12] test feeder circuit made available by the Electric Power Research Institute (EPRI) is used to generate voltage magnitude measurements via their OpenDSS [16] software. The circuit is composed of a single feeder that feeds a radial network of single phase loads as can be seen in figure 1. The distribution of the different load classes and secondary distribution transformers is shown in Table I.

B. Synthetic Load Profiles
The ckt5 circuit provides a single yearly real power load profile sampled at 1-hour intervals (8760 timesteps) for each load class, as shown in figure 2. The period of the seasonality in each load profile is 1 day. Since each load in the circuit is defined with a constant real power value, the OpenDSS software varies this value over time by the percentage indicated in the profiles. Since we want to work with 15-minute samples, linear interpolation is used to convert the 1-hour sampled profiles to 15-minute sampled profiles (35040 timesteps). Ideally, we want each load to follow a unique load profile with its own variations. In order to synthesize a unique profile for each load, we inject Gaussian white noise to simulate variation in power consumption as shown in equation 1.
d_t is the default load profile value at time t for each respective profile. Note that the linear interpolation step must be performed before injecting variation. This approach allows us to maintain the yearly trend and daily seasonality of the original load profiles. The choice of σ = 0.1 is somewhat arbitrary and makes the phase identification problem slightly more difficult, since the variations are smaller.
Other values such as σ = 0.3 and σ = 0.5 have been tried and result in slightly higher phase identification accuracies in some cases with the methods used in this work.
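A minimal sketch of this profile synthesis in NumPy. The flat placeholder profile and the multiplicative form of the noise injection are assumptions for illustration (the actual form is given by equation 1, and the real EPRI load-class profiles are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for a 1-hour sampled yearly load profile (8760 timesteps);
# the real ckt5 load-class profiles are not reproduced here.
d_hourly = np.ones(8760)

# Linear interpolation from 1-hour to 15-minute sampling (35040 timesteps).
t_hourly = np.arange(8760)
t_quarter = np.arange(35040) / 4.0
d_quarter = np.interp(t_quarter, t_hourly, d_hourly)

# Inject Gaussian white noise (sigma = 0.1) to synthesize a unique profile;
# the multiplicative form below is an assumption about equation 1.
sigma = 0.1
profile = d_quarter * (1.0 + rng.normal(0.0, sigma, size=d_quarter.shape))
```

Interpolating before injecting variation, as noted above, preserves the yearly trend and daily seasonality of the default profile.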

C. Voltage Magnitude Measurements
The OpenDSS software is used to run a Quasi-Static Time Series (QSTS) simulation of the ckt5 circuit using the synthetic load profiles to generate voltage magnitude measurements for all 1379 loads. Noise is then injected into each measurement x_mn according to the noise model

x^n_mn = x_mn + N(0, (ν x_mn / 3)^2) (2)

We denote the noisy measurements by the 35040 × 1379 matrix X_n. The choice of 3σ = ν x_mn ensures that there is a 99.7% probability that a noisy sample will be within ν percent of the original x_mn. A similar noise model has been used before [17] and allows us to model the noise as Gaussian.
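The noise model of equation 2 can be sketched as follows; `inject_meter_noise` is a hypothetical helper, and ν is taken in percent here:

```python
import numpy as np

def inject_meter_noise(X, nu, rng):
    """Add Gaussian noise with 3*sigma = (nu/100) * x, so a noisy sample
    lies within nu percent of the true value with ~99.7% probability."""
    sigma = (nu / 100.0) * np.abs(X) / 3.0
    return X + rng.normal(size=X.shape) * sigma

rng = np.random.default_rng(1)
# Toy stand-in for the 35040 x 1379 voltage magnitude matrix.
X = rng.uniform(119.0, 121.0, size=(96, 10))
X_noisy = inject_meter_noise(X, nu=0.5, rng=rng)   # 0.5 meter class
```

The same routine covers the ANSI meter classes used later simply by varying `nu` (0.1, 0.2, 0.5, 1.0).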

IV. FEATURE ENGINEERING PIPELINE

A. Stationarity: Second Order Time Difference
Arya et al. [11] have shown that the Pearson correlation coefficient is a good measure for capturing the linear relationship between the voltage magnitude measurements of loads connected to the same phase. Before estimating the Pearson correlation coefficient, we need to ensure that each time series is stationary. When a time series is non-stationary with serial correlation, classical summary statistics such as the sample mean, variance, and covariance between two time series become meaningless [18]. We use the notion of weak stationarity, which only requires that the mean of the time series is constant and that its autocovariance function depends only on the distance between time lags rather than on time itself (constant mean, constant variance, and no seasonality) [18] [19]. It should be clear from the default load profiles in figure 2 that the resulting voltage measurements obtained from simulation will not be stationary (there is obvious trend and seasonality). Below are some methods that have been used in the phase identification literature to remove trend and seasonality:
• Mitra et al. [20] have used the first order time difference of each time series.
• Xu et al. [21] apply the Discrete Fourier Transform (DFT) to each time series, zero out the low frequency components, and inverse DFT the series back to the time domain.
• Hosseini et al. [3] use a Finite Impulse Response (FIR) high-pass filter to filter out the low frequencies in each time series.
In this work we choose the time difference transformation, as it is a non-parametric technique that is computationally fast (runs in linear time) and acts as a high-pass filter.
Before looking at the time difference transformation we need to define the backshift and difference operators [18].
Definition 1: The kth order backshift operator B^k can be defined as

B^k x_t = x_{t-k} (3)

where k is the number of timesteps to lag the input x_t.

Definition 2: The dth order difference operator ∇^d can be defined as

∇^d x_t = (1 - B)^d x_t (4)

With the synthetic dataset used in this work, we only consider the first order time difference

∇x_t = x_t - x_{t-1} (5)

and the second order time difference

∇^2 x_t = x_t - 2x_{t-1} + x_{t-2} (6)
We use ∇x_i and ∇^2 x_i to indicate the first and second order time differences of the voltage time series of load i. Note that each order of time difference reduces the number of timesteps in the original time series by 1. ∇X and ∇^2 X indicate the respective time differences applied to all column vectors of X.

To observe the effects of the time difference orders, we look at a 7-day window of data starting at the beginning of the year, which gives us a 672 × 1379 matrix W. Figure 3 shows the plot of a single load w_1337, its autocorrelation function (ACF) with 96 time lags (the period of seasonality is one day, or 96 timesteps for the 15-minute samples), and a heatmap of the correlation matrix of W for different time difference orders using the Pearson correlation coefficient. Since our goal isn't to fit a time series model to w_1337, we don't show the partial autocorrelation function (PACF) plot.

In the first column of figure 3, the gradual shrink/growth of ACF(w_1337) is characteristic of an underlying trend, and the periodic movement indicates the presence of seasonality (this is evident in figure 3a). Due to this non-stationarity, no distinctive structure can be observed with the Pearson correlation coefficient, as shown in figure 3g. In the second column of figure 3, the first order difference causes the underlying correlation structure to begin to appear. However, ACF(∇w_1337) shows that significant serial correlation is still present and periodic in nature. The plot of ∇w_1337 in figure 3b shows that a seasonal component still exists. In the third column of figure 3, the correlation structure is much more distinct with the second order difference. It can be seen that ∇^2 w_1337 looks closely like white noise and that most of the serial correlation has been removed. However, there is still a significant negative peak in ACF(∇^2 w_1337) at the first time lag with a sudden drop afterwards.
This is indicative of a possible underlying Moving Average MA(1) process, which is itself stationary. A second order time difference indicates that the underlying trend of w_1337 (figure 3a) within the 7-day window was quadratic rather than linear (as a first order difference would imply). Going forward we will utilize the second order time difference.
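The effect of differencing can be sketched with `np.diff`, using the 672-timestep, period-96 window dimensions described above (the synthetic series below is illustrative, not the actual w_1337):

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(672, dtype=float)    # 7 days of 15-minute samples

# Illustrative series: quadratic trend + daily seasonality (period 96) + noise.
w = 1e-4 * t**2 + np.sin(2 * np.pi * t / 96) + rng.normal(0.0, 0.05, 672)

d1 = np.diff(w, n=1)   # first order difference: 671 timesteps
d2 = np.diff(w, n=2)   # second order difference: 670 timesteps

# A second order difference removes a quadratic trend exactly:
trend_d2 = np.diff(1e-4 * t**2, n=2)   # constant 2e-4
```

Note how each difference order shortens the series by one timestep, consistent with the dimensions used in the text.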

B. Singular Value Decomposition (SVD)
The economy Singular Value Decomposition (SVD) [22] of an M × N matrix A is shown in equation 7, where r = min(M, N).

A = U Σ̂ V^T (7)
U is the M × r matrix of left singular vectors, V is the N × r matrix of right singular vectors, and Σ̂ is the first r × r diagonal block of the full M × N matrix Σ. Since there can only ever be r singular values, there is no reason to spend computation and memory calculating the other M − r left singular vectors.
The singular values in Σ̂ are in descending order, where the value of σ_i indicates the importance, or amount of information, contained within u_i and v_i. The idea behind denoising using SVD is to determine the number of singular values to keep in order to retain the underlying information while discarding the noise. We use the optimal Singular Value Hard Threshold (SVHT) with unknown noise level, first proposed by Gavish et al. [13], to determine the number of singular values to keep, as shown in equation 9.

τ = ω(β) σ_median (9)
σ_median is the median singular value. ω(β) is the optimal hard threshold coefficient, which can be approximated using equation 10.
ω(β) ≈ 0.56β^3 − 0.95β^2 + 1.82β + 1.43 (10)

β is simply the ratio of the number of rows to the number of columns, β = M/N. We keep only the first k singular values, where σ_i ≥ τ, and discard all the others. The denoised matrix A_d can then be reconstructed using the first k singular values as shown in equation 11.

A_d = σ_1 u_1 v_1^T + · · · + σ_k u_k v_k^T (11)
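Equations 9–11 can be combined into a short denoising routine; a sketch in NumPy (`svht_denoise` is a hypothetical name, and β = M/N as in the text, which presumes M ≤ N):

```python
import numpy as np

def svht_denoise(A):
    """Truncate the SVD of A at the optimal hard threshold of Gavish et al.,
    using beta = M / N and the approximation of equation 10."""
    M, N = A.shape
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # economy SVD
    beta = M / N
    omega = 0.56 * beta**3 - 0.95 * beta**2 + 1.82 * beta + 1.43
    tau = omega * np.median(s)                         # equation 9
    k = int(np.sum(s >= tau))                          # keep sigma_i >= tau
    A_d = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # equation 11
    return A_d, k

# Rank-3 signal plus Gaussian noise, with M < N as in the windows used here.
rng = np.random.default_rng(3)
signal = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 600))
noisy = signal + 0.1 * rng.normal(size=signal.shape)
denoised, k = svht_denoise(noisy)
```

On this toy example the threshold recovers the true rank and the reconstruction is closer to the noiseless matrix than the noisy one, which is exactly the Frobenius error reduction measured below.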
In order to evaluate how well the denoised matrix approximates the original noiseless data, we use the Frobenius error [22] between two matrices A and B as shown in equation 12.

e_F(A, B) = ||A − B||_F (12)
|| · ||_F is the Frobenius norm of a matrix. In figure 4 we examine the effectiveness of the denoising process by looking at a 7-day window starting from the beginning of the year, with a second order time difference transformation, for different noise levels using the noise model in equation 2. The noise levels correspond to the ANSI C12.20-2015 [8] 0.1, 0.2, and 0.5 meter classes as well as a class 1 meter.

In the first row of figure 4 it can be seen that the correlation structure fades away as the noise level increases. Any phase identification algorithm that relies on exploiting the correlations between the voltage magnitude measurements of loads would essentially be neutralized starting at ν = 0.5% for a 7-day window. This fading structure behavior has also been observed by Modarresi et al. [23].

In the second row of figure 4 we can see that SVD + SVHT is able to recover a good approximation of the original correlation structure for ν = 0.1%, ν = 0.2%, and ν = 0.5%. However, we can observe a limitation of the SVHT with the approximation of ω(β) for ν = 1%. The SVHT determined that only the first three singular values should be kept, but as can be seen in figure 4i the correlation structure isn't recovered in any meaningful manner despite a significant decrease in Frobenius error (783.77 down to 179.88). In figure 4j we are able to recover a moderate amount of the structure by manually selecting k = 5, which is just past the elbow in figure 4o.

C. Pairwise Distances
The zero time lagged sample cross-correlation of two jointly stationary time series can be simplified to the sample Pearson correlation coefficient as shown in equation 13 [18].

r_xy = Σ_t (x_t − x̄)(y_t − ȳ) / sqrt( Σ_t (x_t − x̄)^2 Σ_t (y_t − ȳ)^2 ) (13)
x̄ and ȳ are the sample means of the two stationary time series x and y, respectively. n is the number of timesteps in the window used to estimate the correlation (the sums in equation 13 run over these n timesteps). Once the correlation is calculated, it is converted into a measure of distance by using the Rooted Normalized One-Minus-Correlation [24] as shown in equation 14.

d^c_xy = sqrt( (1 − r_xy) / 2 ) (14)
This gives us a distance in the range [0, 1]. We can then calculate a 1379 × 1379 symmetric distance matrix D_c, where each element of the matrix is given by d^c_xy.
D. Pipeline

Figure 5 shows the feature engineering pipeline that describes how the data is preprocessed. The SVD of a window of noisy data W_n is computed, and the denoised matrix W_d is reconstructed (equation 11) using the number of singular values determined by the SVHT. Once W_d is built, the second order time difference is applied to each time series (column vector) in W_d to make it stationary.
An important thing to note about equation 9 is that the SVHT is a data driven threshold, since it depends on the median of the singular values. This means that the SVHT can be used as part of an automated process to select the number of singular values for a given context without requiring human intervention. The number of singular values kept is specifically tailored to each window W_n that is fed into the pipeline.
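The full preprocessing pipeline (denoise with SVD + SVHT, then second order difference, then correlation distances) can be sketched end to end; the function name and toy window are illustrative:

```python
import numpy as np

def preprocess_window(W_n):
    """Denoise a noisy window with SVD + SVHT, difference twice, and
    return the pairwise correlation distance matrix."""
    M, N = W_n.shape                                   # assumes M <= N
    U, s, Vt = np.linalg.svd(W_n, full_matrices=False)
    beta = M / N
    omega = 0.56 * beta**3 - 0.95 * beta**2 + 1.82 * beta + 1.43
    k = int(np.sum(s >= omega * np.median(s)))         # SVHT (equation 9)
    W_d = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # denoised window
    W_s = np.diff(W_d, n=2, axis=0)                    # second order difference
    C = np.corrcoef(W_s, rowvar=False)                 # Pearson correlations
    return np.sqrt(np.clip((1.0 - C) / 2.0, 0.0, None))

# Toy noisy window: 200 timesteps, 400 loads sharing a daily (period 96)
# seasonal component, plus unit Gaussian noise.
rng = np.random.default_rng(6)
season = np.sin(2 * np.pi * np.arange(200) / 96)
W_n = 10.0 * season[:, None] + rng.normal(size=(200, 400))
D_c = preprocess_window(W_n)
```

Because the SVHT depends only on the singular values of the window itself, this function needs no tuning parameters, matching the automated selection described above.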

V. CLUSTERING
Since we have some domain knowledge regarding the dataset (a 3-phase system with single phase loads and 519 transformers), we know that we should be looking for at least three clusters for phase identification [11]. For phase identification we use k-medoids clustering with the d^c_xy pairwise distance to find three clusters.

A. K-Medoids Clustering
K-medoids clustering [14] is a partitioning algorithm that uses an actual observation in the data as the center, or medoid, of each cluster. This is opposed to k-means clustering [25], which uses the mean of the observations in a cluster as the center, or centroid. Because of this, k-medoids clustering is more robust to outliers than k-means clustering. K-medoids clustering can also be used with any precomputed arbitrary pairwise distance. We use the 'alternate' k-medoids algorithm by Park et al. [26] as implemented in the software package [27] used. The steps for k-medoids are listed below:
1) Step 1: Initialization (via k-means++ [28])
   a) Randomly (and uniformly) select an observation to be the medoid of the initial cluster.
   b) Calculate the distance d_xc to the cluster c with the closest medoid for each observation x.
   c) Randomly sample the probability mass function to obtain the observation that will be the medoid of the next cluster. The observation that is furthest from all current cluster medoids will have the highest probability of being selected as the next cluster medoid.
   d) Repeat from step 1b until k initial clusters have been selected.
2) Step 2: Update (via 'alternate' method [26])
   a) The observation within a cluster that minimizes the distance to all other observations in the cluster is chosen as the new medoid for that cluster.
   b) Repeat step 2a for all clusters.
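A minimal sketch of the procedure above (the 'alternate' update plus the majority vote correction from section II); plain random initialization is used here instead of k-means++ for brevity, and both function names are hypothetical:

```python
import numpy as np

def kmedoids_alternate(D, k, rng, n_iter=100):
    """'Alternate' k-medoids on a precomputed distance matrix D."""
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)      # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # Observation minimizing total distance within its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):       # converged
            break
        medoids = new_medoids
    return np.argmin(D[:, medoids], axis=1)

def majority_vote(labels, phases):
    """Give every load in a cluster the cluster's most common phase label."""
    corrected = phases.copy()
    for c in np.unique(labels):
        members = labels == c
        values, counts = np.unique(phases[members], return_counts=True)
        corrected[members] = values[np.argmax(counts)]
    return corrected

# Toy example: three well-separated groups on a line, as a stand-in for
# three phases in distance space.
pos = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0])
D = np.abs(pos[:, None] - pos[None, :])
labels = kmedoids_alternate(D, 3, np.random.default_rng(5))
```

Because the algorithm only ever reads the precomputed matrix `D`, it works unchanged with the correlation distance d^c_xy.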

VI. SIMULATION RESULTS

A. Setup
All voltage magnitude measurements are time synchronized and no topology change events are injected into the yearly simulation. Sliding window widths from 1 to 7 days are used. Since smart meters will send measurement data at least daily back to the utility company [7], we use a window stride of one day. This gives 365−w+1 windows of width w in days for the year. We inject noise into the voltage magnitude measurements using ν = 0.1%, ν = 0.2%, and ν = 0.5% corresponding to the ANSI C12.20-2015 [8] meter accuracy classes of 0.1%, 0.2%, and 0.5%, respectively. We also consider 1% for a class 1 meter.
B. Discussion

Figure 6 shows a strip plot of the accuracies and Frobenius errors for the 365 − w + 1 windows superimposed on a violin plot. Table II shows the arithmetic mean, median, standard deviation, minimum, maximum, skewness, and Fisher's kurtosis of the accuracies. Table III shows the same summary statistics for the Frobenius errors.
In the left column of figure 6 we can see that the accuracies for ν = 0.1% and ν = 0.2% are generally separated into two distributions for the noisy data. However, once the data is denoised, the majority of the accuracies fall within a single distribution centered above 95%, with some outliers below 75%. For ν = 0.5% the majority of accuracies are below 50% for the noisy data. At a window width of four days and above, the distribution of accuracies is centered above 95% with a small number of outliers. Upon examination, these outliers are simply due to the randomness in selecting initial medoids. The ν = 1% case shows only a minor improvement in accuracy after denoising; accuracy is still dramatically lower than in the ν = 0.5% case. This makes sense, as we saw that the SVHT for ν = 1% selects a number of singular values that is unable to effectively reconstruct the correlation structure, as shown in figure 4i.
In the right column of figure 6 we can see that the Frobenius errors are quite stable over the year. The center of the error distributions increases as the window width increases. This is because the Frobenius norm sums over all elements of a matrix: the larger the matrix, the more noisy elements contribute to the sum. Despite this increase in error, we still see a relative reduction in error after denoising. An interesting result occurs for ν = 0.1%, where the denoised error is actually higher than the noisy error. However, despite this increase in error, figure 6a shows not only an increase in average accuracy but also more stable performance over the year.

VII. RELATED WORKS

SVD is a classical linear algebra technique that has been used fairly extensively in the power systems literature. We summarize a few recent works that utilize SVD. Wei et al. [29] use SVD to denoise Transient Zero Sequence Current (TZSC) measurements for detecting fault lines using the Euclidean distance between ideal clustering curves. Choqueuse et al. [30] used SVD to determine if a system is unbalanced and to estimate angular frequencies from 3-phase signals. Zhao et al. [31] used SVD to compress the phasor data from Phasor Measurement Units (PMUs). Gupta et al. [32] have used SVD to show that the ratio correction factor (RCF) for capacitive voltage transformers (CVTs) can be inferred from the rank one approximation of voltage synchrophasor data from PMUs.
Surprisingly, there have not been many attempts in the phase identification literature to directly remove noise from sensor measurements as a preprocessing step. Modarresi et al. [23] model the IEEE 13-bus test feeder using a neural network and use the voltage magnitude measurements generated by the network as denoised measurements. The network is composed of a 9-node input dense layer, a 10-node dense hidden layer, and an 18-node dense output layer (the activation functions used are not indicated). They modify the test feeder by adding 18 loads to the system, with 6 loads on each phase. The low/high voltage and current measurements for all three phases at the substation transformer (9 total measurements) are used as input to the network. The voltage magnitude measurements from the 18 loads are used as the targets for regression, with noise injected into three of the loads. Their approach requires pre-training the network on historical data and retraining in the event of a topology change (though one could potentially store a bank of pre-trained networks, one per topology configuration). They use k-means clustering on the covariance matrix estimated from the voltages generated by the network to perform phase identification. In contrast, our approach uses the voltage magnitude measurements of the loads themselves, injects noise into all loads, and requires no pre-training on historical data. Our approach also has no hyperparameters to tune, since the selection of singular values is automated by the SVHT.
Many different clustering algorithms have been used in the phase identification literature, such as k-means clustering [11], k-medoids clustering [20], the Gaussian Mixture Model (GMM) [33], Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [15], and spectral clustering [34]. In this work we chose k-medoids clustering because it can be used with any precomputed pairwise distance, is robust to outliers, and is simple.

VIII. CONCLUSION
We showed that using Singular Value Decomposition (SVD) with the optimal Singular Value Hard Threshold (SVHT) as a component of a feature engineering pipeline is effective for denoising a matrix of voltage magnitude measurements. An average reduction in Frobenius error of more than a factor of two was achieved for window widths ranging from 1 day to 7 days at a white Gaussian noise level of ν = 0.5%. An average phase identification accuracy of 45.64% on noisy data was improved to 98.92% after denoising for a 7-day sliding window over the year.