A Peak Detection Algorithm for Localization and Classification of Heart Sounds in PCG Signals using K-means Clustering

Traditionally, heart sound classification is performed by first locating the elementary heart sounds of the phonocardiogram (PCG) signal. After sounds S1 and S2 are detected, features such as envelograms, Mel-frequency cepstral coefficients (MFCC) and kurtosis are extracted from them and used to classify normal and abnormal heart sounds, which increases the computational complexity. In this paper, we propose a fully automated algorithm to localize heart sounds using K-means clustering. The K-means clustering model can differentiate between the primitive heart sounds S1, S2, S3 and S4 and the remaining insignificant sounds, such as murmurs, without requiring extensive pre-processing of the data. The peaks detected from the noisy data are validated by implementing five classification models with 30-fold cross-validation. These models are implemented on the publicly available PhysioNet/CinC Challenge 2016 database. Finally, to classify normal and abnormal heart sounds, the localized labelled peaks from all the datasets are fed as input to various classifiers: support vector machine (SVM), K-nearest neighbours (KNN), logistic regression, stochastic gradient descent (SGD) and multi-layer perceptron (MLP). To validate the proposed work, we compare our reported metrics with the latest state-of-the-art works. Simulation results show that the SVM classifier achieves the highest classification accuracy among all the classifiers, 94.75%.

659,041 deaths in the U.S. [28]. To detect heart disease with computers, one must first understand the four fundamental heart sounds S1, S2, S3 and S4. The first heart sound, S1, is produced by the closure of the mitral and tricuspid valves. The second heart sound, S2, results from the closure of the aortic and pulmonic valves. The third heart sound, S3, also termed the ventricular gallop, is heard just after S2, when the mitral valve opens to allow passive filling of the left ventricle (LV). Lastly, the fourth heart sound, S4, also known as the atrial gallop, occurs just before S1, when the atria contract to force blood into the LV [27].
Cardiac auscultation is the quickest, simplest and most economical initial line of screening. It is performed for patients with heart problems, including valve disease, arrhythmia, heart failure and pulmonary hypertension. However, heart sounds are difficult for physicians to recognize and examine, because the important events are closely spaced in time, the sounds are faint, and their frequency content lies at the low end of the audible range [1]. Extensive training is needed to master cardiac auscultation: on average, only twenty percent of medical trainees can successfully detect heart problems using auscultation [2].
In biomedical engineering, computerized localization of abnormal heart sounds can be regarded as a crucial preliminary step to cardiovascular disease recognition [3]. Deciding whether a given heart sound is normal or abnormal can be split into three main steps: localization, feature extraction and classification [4]. First, localization finds the elementary heart sounds S1 and S2 of the PCG signal, as demonstrated in Fig. 1.
However, localizing the elementary heart sounds is a complex task that can be disturbed by other sounds, such as the S3 and S4 heart sounds, and by noise [5]. In this work, we also attempt to classify the S3 and S4 heart sounds present in abnormal signals, as shown in Fig. 2. Previously, Messner et al. [23] extracted envelope and spectral features from the same dataset and used a deep recurrent neural network (DRNN) for sequence detection. In [24], ensemble empirical mode decomposition (EEMD) combined with kurtosis features is used to localize the S1 and S2 heart sounds. Springer et al. [25] proposed a method for segmenting the first and second heart sounds in PCG signals using a hidden semi-Markov model (HSMM) and logistic regression. In [26], Hilbert vibration decomposition is used to decompose the heart sound into a fixed number of sub-components while preserving the phase information, which facilitates localization and identification of the signal. However, the accuracies reported in two of these works are relatively low.
To improve classification accuracy and automate the process, we propose an algorithm that uses the labelled peaks as the features for five classifiers implemented to recognize normal and abnormal heart sounds. Our work presents a novel approach that applies an unsupervised learning model to localize heart sounds in the PhysioNet/CinC Challenge 2016 data obtained from PhysioBank ATM. Because it is unsupervised, it may be able to recognize the fundamental sounds in real-time noisy PCG signals. It also saves time by using the labelled peaks directly as features for the classification models, instead of manually computing features of the peaks. We obtain results that outperform state-of-the-art methods on several of the databases in the PhysioNet/CinC Challenge 2016.

II. LITERATURE REVIEW
Much of the prior work focused on PCG segmentation using deep learning and machine learning algorithms. In [6], Fernando et al. proposed a technique to detect the heart state from PCG signals using recurrent neural networks (RNNs), with attention-based learning to extract salient features from noisy and irregular heart signals. Their algorithm was evaluated on two databases: the "PhysioNet/CinC challenge 2016 (PCC)" and the "M3dicine database", a privately collected dataset of both human and animal heart sounds. For segmentation, numerous feature combinations were evaluated.
In [7], Dissanayake et al. applied heart sound segmentation prior to classification using three different deep learning architectures. In [8], the proposed procedure uses Shapley additive explanations (SHAP) and an occlusion map algorithm, which helped show that segmentation plays a vital role in the classification and detection of normal and abnormal heart sounds. Further, MFCC features make it easy to identify the locations of S1 and S2 in PCG signals. The classifiers proposed in that work are reported to classify heart signals with an accuracy of approximately 100%.
Fig. 2. Position of S1, S2, S3 and S4 in an abnormal signal.
Renna et al. [9] used a deep convolutional neural network in combination with hidden Markov models (HMMs) and HSMMs, with the network inferring the emission distributions. The work reports an average sensitivity of 93% and an average positive predictive value of 94%.
Gjoreski et al. [10] described a novel approach for detecting chronic heart failure from PCG signals. In this method, classical machine learning models are trained on numerous expert-designed features, whereas deep learning models are trained on spatiotemporal and spectral representations of the signals. The models were evaluated on the authors' chronic heart failure (CHF) database together with an open-source training dataset from the "PhysioNet Cardiology Challenge". This method attains 89.3% accuracy, outperforming the baseline model. The authors also identify 15 expert features that are useful for detecting CHF phases, with a reported accuracy of 93.2%.
Al-Naami et al. [11] describe an adaptive network-based fuzzy inference system (ANFIS) for classifying abnormal signals. The PCG signals are de-noised with notch and Butterworth filters; the discrete Fourier transform (DFT) is then applied to the denoised signal, and five features are extracted from the third-order cumulant using higher-order spectral (HOS) analysis. These five features are used with neural network systems to classify heart signals as normal or abnormal. The accuracy of their model ranges from 63% to 89% for the classification task.
Tien-En Chen et al. [12] proposed a DNN-based method for detecting the S1 and S2 heart sounds. The sounds are converted to MFCCs, which are clustered into two categories and then fed into a DNN that classifies S1 and S2 with an accuracy of 91%. Mishra et al. [13] used morphological characteristics derived from PCG signals to recognize S1 and S2. They evaluated the method on publicly available databases and on experimentally collected HSS recordings obtained under the supervision of physicians. Variational mode decomposition (VMD) is used to quantify the non-linear and non-stationary nature of PCG signals. Their experiments show a notable improvement in the classification task compared with prior works.
All the prior works discussed above address the classification of heart sounds in PCG signals, and most of them rely on supervised learning for signal localization. Moreover, deep learning requires an enormous amount of training data and is time-consuming to train. To overcome these limitations, we propose a novel algorithm based on unsupervised learning that is time-efficient and achieves high accuracy. Additionally, most of the other works rely on feature extraction, which makes the process more computationally expensive and laborious. In our work, we therefore do not compute statistical features; instead, an automated process finds the peaks in the signal, and these peaks serve as the features for the classification models. Simulation results show that the proposed algorithm performs better than the state-of-the-art works above.
Figure: Localised peaks of an abnormal PCG signal using K-means clustering.

III. PROPOSED METHODOLOGY
First, we collected the PCG data from PhysioBank ATM. We then computed the absolute value of the amplitude, which serves as the input to the K-means clustering algorithm. K-means groups the amplitudes of the PCG signal into two clusters, labelled 0 and 1. To identify the cluster containing the peaks, we compare the sizes of the two clusters and take the smaller one to contain the peaks of the signal. We then label those peaks with an alternating pattern of -1 and 1, and label the rest of the signal 0. Next, we balance this dataset using the synthetic minority over-sampling technique (SMOTE). Finally, we feed this pattern to classification models, which predict whether a heart sound is normal or abnormal. The system architecture of the proposed methodology is shown in Fig. 7 and described in the following subsections.
A. Peak detection using K-means clustering
For a normal cardiac signal, the cycle consists of four major regions: the first heart sound (S1), the systole pause interval (SPI), the second heart sound (S2) and the diastole pause interval [13]. Pathological signals, however, may also contain a third heart sound (S3) and a fourth heart sound (S4) [14]. To identify these four sounds with machine learning, we use unsupervised K-means clustering. We consider both normal and abnormal recordings when implementing the proposed algorithm. One normal and one abnormal signal from the dataset are shown in Fig. 3 and Fig. 4, respectively.
Next, to identify the peaks in PCG signals, we propose a novel algorithm based on K-means clustering, described in Algorithm 1. In Algorithm 1, we first take the absolute value (modulus) of the amplitude of the PCG signal. We then apply K-means clustering with $k = 2$. K-means minimizes the objective
$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\lVert x_i^{(j)} - c_j \right\rVert^2,$$
where $J$ is the objective function, $k$ is the number of clusters, $n$ is the number of cases, $x_i^{(j)}$ is the $i$-th case assigned to cluster $j$, and $c_j$ is the centroid of cluster $j$. The term $\lVert x_i^{(j)} - c_j \rVert^2$ is the squared Euclidean distance between a case and its centroid; it is summed over all amplitude samples of the PCG signal and over both clusters. Each data point is assigned to the centroid at minimum distance, each centroid is then moved to the mean of its assigned points, and these two steps repeat until the assignments stop changing.
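To make the objective concrete, the sketch below runs plain Lloyd iterations with k = 2 on the modulus of a toy signal and evaluates J exactly as defined above. The toy signal, the deterministic min/max initialisation and the function names `kmeans_1d` and `kmeans_objective` are illustrative assumptions, not the authors' code; library implementations such as scikit-learn's `KMeans` use k-means++ initialisation and several restarts instead.

```python
import numpy as np


def kmeans_objective(x, labels, centroids):
    """J = sum over clusters j and their members i of ||x_i^(j) - c_j||^2."""
    return float(sum(np.sum((x[labels == j] - c) ** 2) for j, c in enumerate(centroids)))


def kmeans_1d(x, k=2, n_iter=100):
    """Plain Lloyd's algorithm on a 1-D array (here: |amplitude| of a PCG signal)."""
    x = np.asarray(x, dtype=float)
    # Assumed initialisation for this sketch: centroids spread evenly from min to max.
    centroids = np.linspace(x.min(), x.max(), k)
    for _ in range(n_iter):
        # Assignment step: each sample joins the centroid at minimum squared distance.
        labels = np.argmin((x[:, None] - centroids[None, :]) ** 2, axis=1)
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            x[labels == j].mean() if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment so that labels match the returned centroids.
    labels = np.argmin((x[:, None] - centroids[None, :]) ** 2, axis=1)
    return labels, centroids, kmeans_objective(x, labels, centroids)


# Toy signal: three large excursions (peaks) among low-amplitude samples.
signal = np.array([0.1, -0.1, 5.0, 0.05, -5.2, 0.0, 0.1, 4.9, -0.2, 0.1])
labels, centroids, J = kmeans_1d(np.abs(signal))
```

Working on the modulus means a large negative excursion (such as -5.2 above) falls into the same cluster as a large positive one, which is why the absolute value is taken before clustering.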
Of the two clusters, the one with more data points is suppressed, since it does not contain the peaks. This follows from inspecting the signal: the peaks occupy far fewer samples than the rest of the signal.
To suppress the non-peak part of the signal, we replace all non-peak samples with 0 and the peaks with an alternating pattern of 1 and -1. The value given to a peak depends on the value given to the preceding peak: for example, if 1 was assigned to the previous peak, -1 is assigned to the current one.
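A minimal sketch of the peak-labelling steps of Algorithm 1 follows, assuming scikit-learn's `KMeans` performs the clustering. Two details are assumptions rather than facts from the text: every sample that falls in the smaller cluster is treated as a separate peak when alternating 1 and -1 (grouping runs of consecutive peak samples into one peak is an equally plausible reading), and the helper name `label_peaks` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def label_peaks(signal, random_state=0):
    """Sketch: 0 for non-peak samples, alternating 1 / -1 for peak samples."""
    amplitude = np.abs(np.asarray(signal, dtype=float)).reshape(-1, 1)
    # Cluster |amplitude| into k = 2 groups.
    km = KMeans(n_clusters=2, n_init=10, random_state=random_state).fit(amplitude)
    # The smaller cluster is taken to hold the peaks.
    counts = np.bincount(km.labels_, minlength=2)
    is_peak = km.labels_ == int(np.argmin(counts))
    # Suppress non-peaks (0); alternate 1 and -1 over successive peak samples
    # (assumption: each peak-cluster sample counts as one peak).
    pattern = np.zeros(len(is_peak), dtype=int)
    next_value = 1
    for i in np.flatnonzero(is_peak):
        pattern[i] = next_value
        next_value = -next_value
    return pattern


signal = np.array([0.1, -0.1, 5.0, 0.05, -5.2, 0.0, 0.1, 4.9, -0.2, 0.1])
print(label_peaks(signal).tolist())  # [0, 0, 1, 0, -1, 0, 0, 1, 0, 0]
```

Note that the sign in the output pattern encodes only the order of the peaks, not the sign of the original amplitude: the sample at -5.2 is labelled -1 because it is the second peak, not because it is negative.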
After running the peak detection algorithm on both normal and abnormal PCG signals, we use the new peak values produced in step 5 of Algorithm 1 to classify the signals. The detected peaks of normal and abnormal PCG signals are shown in Fig. 5 and Fig. 6, respectively.

B. Classification of Peaks using Machine Learning models
To classify the PCG signals from the labelled peaks, we use KNN, logistic regression, SVM, SGD and MLP. Because the data in the dataset are imbalanced, we first oversample the labelled peaks synthetically with the SMOTE algorithm before feeding them to the classifiers.
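The text does not say which SMOTE implementation was used. The sketch below is a minimal NumPy re-implementation of the core SMOTE idea for binary labels: each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The names `smote` and `balance` are hypothetical; in practice, the imbalanced-learn package provides this as `SMOTE(...).fit_resample(X, y)`.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, rng=None):
    """Interpolate between minority samples and their k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)             # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for s in range(n_synthetic):
        i = rng.integers(len(X_min))        # random minority sample
        j = neighbours[i, rng.integers(k)]  # one of its k nearest neighbours
        gap = rng.random()                  # position along the segment, in [0, 1)
        synthetic[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synthetic


def balance(X, y, rng=0):
    """Oversample the minority class until both classes have equal counts."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = int(counts.max() - counts.min())
    X_new = smote(X[y == minority], n_new, rng=rng)
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])
```

If oversampling is done before the 8:2 split, synthetic points derived from test samples can end up in the training set; resampling only the training portion avoids that leakage.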
To implement the models, we take X as the peaks detected by Algorithm 1 and Y as the target value, which is either a normal or an abnormal heartbeat. The data are divided into training and testing sets in the ratio 8:2.
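The text does not specify how a variable-length peak pattern becomes a fixed-length row of X. The sketch below assumes each recording's 0/1/-1 pattern is truncated or zero-padded to a common length (the helper `to_fixed_length` and the random placeholder patterns are hypothetical) and then performs the 8:2 split with scikit-learn's `train_test_split`. Stratifying on y is our choice, not stated in the text; it keeps the normal/abnormal ratio the same in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def to_fixed_length(pattern, length):
    """Hypothetical helper: truncate or zero-pad a 0/1/-1 peak pattern to `length` samples."""
    out = np.zeros(length, dtype=int)
    n = min(length, len(pattern))
    out[:n] = np.asarray(pattern[:n])
    return out


# Placeholder data standing in for real recordings: random patterns of varying length.
rng = np.random.default_rng(0)
patterns = [
    rng.choice([-1, 0, 1], size=rng.integers(40, 80), p=[0.05, 0.9, 0.05])
    for _ in range(100)
]
X = np.stack([to_fixed_length(p, 60) for p in patterns])
y = np.array([0] * 60 + [1] * 40)  # 0 = normal, 1 = abnormal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=42
)
```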
1) K-Nearest Neighbors: To determine the class of a test signal, KNN computes the distance between the test sample and each training sample in turn, and assigns the majority class among the k nearest training samples.
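The distance-based rule described above can be sketched from scratch as follows. Euclidean distance, k = 3 and plain majority voting are assumptions for illustration (the text states neither the distance metric nor k), and `knn_predict` is a hypothetical name; scikit-learn's `KNeighborsClassifier` implements the same idea more efficiently.

```python
import numpy as np
from collections import Counter


def knn_predict(X_train, y_train, X_test, k=3):
    """Label each test sample by majority vote among its k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for x in X_test:
        # Euclidean distance from this test sample to every training sample.
        dist = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dist)[:k]
        # Majority vote among the k nearest labels (odd k avoids binary ties).
        preds.append(Counter(y_train[nearest].tolist()).most_common(1)[0][0])
    return np.array(preds)


X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [[0.2, 0.2], [5.5, 5.5]]).tolist())  # [0, 1]
```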
2) Logistic Regression: Logistic regression multiplies the input values by a learned weight vector and adds a bias; the resulting weighted sum is passed through the logistic (sigmoid) function to give the probability of the abnormal class, which is thresholded to predict the class label.
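The weighted-sum-then-sigmoid computation can be sketched as below. Fitting by batch gradient descent on the mean binary cross-entropy, the 0.5 threshold and the names `fit_logistic` and `predict_logistic` are illustrative assumptions; the text does not say how the weights were learned.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping a weighted sum to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean binary cross-entropy (no regularisation)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        # Gradients of the mean cross-entropy with respect to w and b.
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def predict_logistic(X, w, b, threshold=0.5):
    """Multiply inputs by the weights, add the bias, apply the sigmoid, then threshold."""
    p = sigmoid(np.asarray(X, dtype=float) @ w + b)
    return p, (p >= threshold).astype(int)


X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
p, labels = predict_logistic(X, w, b)
```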
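3) Support Vector Machine: SVM separates the data with a maximum-margin hyperplane, classifying it into normal and abnormal PCG signals.
4) Stochastic Gradient Descent: For binary classification, SGD learns a linear scoring function $f(x) = w^{T}x + b$ of the input, which here is the labelled peak pattern, where $w$ is the weight vector and $b$ is the intercept. The model parameters are found by minimizing the regularized training error
$$E(w,b) = \frac{1}{n}\sum_{i=1}^{n} L\big(y_i, f(x_i)\big) + \alpha R(w),$$
where $L$ is the loss function, $R$ is a regularization term that penalizes model complexity, and $\alpha > 0$ controls the strength of the regularization.
5) Multi-Layer Perceptron: The MLP passes the input through a stack of fully connected hidden layers. The first hidden layer computes $h^{(1)} = \sigma\big(W^{(1)}x + b^{(1)}\big)$, and the second to fifth hidden layers compute
$$h^{(l)} = \sigma\big(W^{(l)}h^{(l-1)} + b^{(l)}\big), \qquad l = 2, \ldots, 5.$$
Finally, the output layer computes $\hat{y} = \phi\big(W^{(o)}h^{(5)} + b^{(o)}\big)$, where $W^{(l)}$ and $b^{(l)}$ are the weights and biases of layer $l$, $\sigma$ is the hidden-layer activation function and $\phi$ is the output activation function.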
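As an illustration of minimizing E(w, b) for the SGD classifier, the sketch below runs per-sample stochastic gradient descent with the hinge loss L(y, f(x)) = max(0, 1 - y f(x)) and the L2 penalty R(w) = ½‖w‖². The choice of loss and penalty is an assumption (the specific L used is not given here), labels are taken in {-1, +1}, and `sgd_hinge` and `objective` are hypothetical names; scikit-learn's `SGDClassifier(loss="hinge", penalty="l2")` minimizes an objective of this form.

```python
import numpy as np


def objective(X, y, w, b, alpha=1e-3):
    """E(w, b) = mean hinge loss + alpha * 0.5 * ||w||^2."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return hinge.mean() + alpha * 0.5 * (w @ w)


def sgd_hinge(X, y, alpha=1e-3, lr=0.01, epochs=50, seed=0):
    """Minimise E(w, b) with one gradient step per sample; y must be in {-1, +1}."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):    # visit samples in random order
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                   # hinge loss is active for this sample
                w -= lr * (alpha * w - y[i] * X[i])
                b += lr * y[i]
            else:                            # only the regulariser contributes
                w -= lr * alpha * w
    return w, b


X = np.array([[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5], [2.0, 1.0], [1.0, 2.0], [1.5, 1.5]])
y = np.array([-1, -1, -1, 1, 1, 1])
w, b = sgd_hinge(X, y)
```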

A. Dataset description
The proposed methodology is implemented and evaluated on training sets A, B, C and F of the PhysioNet/CinC Challenge 2016 from the PhysioBank database [15]. The data were collected from both healthy subjects and patients, in clinical and non-clinical environments, and the heart sounds were recorded at diverse locations on the body. The normal recordings come from healthy subjects and the abnormal recordings from subjects with an established cardiac condition; both groups include adults and children. Notably, the dataset is imbalanced: the number of normal recordings differs from the number of abnormal recordings. Because the recordings were made in uncontrolled environments, most of them are corrupted by various kinds of noise. We split the dataset 80/20: 80% is used to train the models and the remaining 20% for validation.

B. Peak Localization
In this work, the K-means algorithm is applied to the modulus of the signal amplitude for the localization task, forming two clusters. The cluster sizes show that most data points do not represent the fundamental heart sounds. We label the amplitudes of the signal with the pattern 0, -1 and 1. The allocation of 1 and -1 depends entirely on the value assigned to the preceding peak and alternates at each successive peak, while the insignificant amplitudes of the PCG signal are labelled 0. Machine learning models are then used to classify the signals into their true classes.

C. Classification
Five models are implemented to classify the heart sound signals. Because the available dataset is imbalanced, the SMOTE algorithm is used to over-sample the data synthetically before the features are fed to the classification models. The pattern formed by clustering each signal is used as the features, and the target is whether the signal is normal or abnormal: abnormal signals are labelled 1 and normal signals 0. The models used for classification are SVM, logistic regression, KNN, SGD and MLP. The steps followed in implementing the models are:
• The dataset was annotated and then shuffled, so that each point contributes independently and no ordering bias is introduced.
• The dataset was divided into training and testing sets containing 80% and 20% of the whole dataset, respectively.
• After deriving the pattern from each signal, we applied the five classification models to classify it as abnormal or normal.
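The comparison of the five classifiers can be sketched with scikit-learn as below. The synthetic Gaussian data stand in for the real peak patterns, and every hyperparameter (including five hidden layers of 64 units for the MLP) is an illustrative assumption rather than the authors' configuration; `compare_classifiers` is a hypothetical name.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def compare_classifiers(X, y, seed=0):
    """Shuffle, split 8:2, fit the five models and return their test accuracies."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=True, random_state=seed
    )
    models = {
        "SVM": SVC(),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "Logistic regression": LogisticRegression(max_iter=1000),
        "SGD": SGDClassifier(random_state=seed),
        # Five hidden layers; the width of 64 is an assumption.
        "MLP": MLPClassifier(hidden_layer_sizes=(64,) * 5, max_iter=500, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Placeholder data: two well-separated Gaussian classes (0 = normal, 1 = abnormal).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (100, 5)), rng.normal(2, 1, (100, 5))])
y = np.array([0] * 100 + [1] * 100)
scores = compare_classifiers(X, y)
```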

D. Classification Results
In this work, the classification task is evaluated on accuracy, precision, recall and F1 score. To further validate and visualize the performance, we plot the receiver operating characteristic (ROC) curve for the SVM classifier, which achieved the highest overall accuracy; it is shown in Fig. 8. The evaluation is conducted on all the datasets, and the results are given in Tables I, II, III, IV and V. The clustering and classification of abnormal and normal signals are implemented in Python 3.8.5 on a PC with 8 GB of RAM and an Intel Core i5 processor.
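For reference, the four reported metrics follow directly from the confusion-matrix counts, taking abnormal (1) as the positive class, and an ROC curve is traced by sweeping a threshold over a classifier's continuous scores (for an SVM, for example, the output of `decision_function`). The sketch below computes both; `binary_metrics` and `roc_auc` are hypothetical helper names, while `roc_curve` and `auc` are scikit-learn functions.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with 1 (abnormal) as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC curve points and the area under the curve from continuous scores."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    return fpr, tpr, auc(fpr, tpr)


m = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1, 1, 1])
# tp = 3, tn = 2, fp = 2, fn = 1 -> accuracy 0.625, precision 0.6, recall 0.75, F1 ~ 0.667
```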

V. DISCUSSION
The SVM accuracy given in Table V is compared with other state-of-the-art methods for classifying PCG signals as normal or abnormal. In [16], Noman et al. extended the Markov-switching autoregressive (MSAR) model to a switching linear dynamic system (SLDS). They also developed a new algorithm that combines the duration-dependent Viterbi algorithm with the switching Kalman filter. Evaluated on the same dataset, their method achieved an accuracy of 91.2%.
Humayun et al. [17] proposed a new CNN layer with time-convolutional (tConv) units that emulate finite impulse response (FIR) filters. They evaluated their work on publicly available multi-domain datasets, and their results show that the learnable filterbank CNN architecture is robust to sensor/domain variability in PCG signals, attaining an accuracy of 82.27%. Xiao et al. [18] implemented a deep-learning-based heart sound classification system for cardiovascular disease prediction; evaluated on the same dataset used in this paper, it achieved an accuracy of 93%. In [11], Al-Naami et al. evaluated ANFIS, driven by spectral analysis features, for detecting abnormal heart sounds on the same dataset, obtaining a classification accuracy ranging from 63% to 89%. Using the same dataset, Shuvo et al. [19] implemented CardioXNet, a lightweight end-to-end CRNN architecture for automatically recognizing five classes of cardiac auscultation from unprocessed PCG signals; it achieved an accuracy of 86%.
Boulares et al. [20] reported experiments based on fine-tuning pre-trained CNN models, obtaining an accuracy of 89%. In [21], Chen et al. proposed a system that combines a CNN with the modified frequency slice wavelet transform (MFSWT) to classify heart sounds as normal or abnormal. A hidden Markov model is used to locate each cardiac cycle in the signal, and two CNN models are trained, with sample entropy (SampEn) deciding which of them is used for classification. Evaluated on the PhysioNet Computing in Cardiology Challenge 2016 data, the system achieved an accuracy of 93.91% in classifying normal and abnormal signals. Li et al. [22] combined standard feature engineering with deep learning algorithms for the automatic classification of normal and abnormal heartbeats. Using stratified five-fold cross-validation on the PhysioNet/CinC Challenge 2016 dataset, they obtained an accuracy of 86.8%.
Compared with the works discussed above, the proposed method achieves a higher accuracy while classifying without any features of the fundamental heart sounds. Moreover, the proposed approach localizes heart sounds with an unsupervised machine learning method, which needs neither prior knowledge of the locations of the heart sounds nor any pre-processing of the data.

VI. CONCLUSION
In this paper, we introduced a novel approach for peak localization using K-means clustering. We demonstrated that this algorithm is efficient and predicts the peaks accurately. The detected peaks alone are sufficient for machine learning and deep learning models to classify normal and abnormal PCG signals. The presented method not only performs well compared with state-of-the-art baseline systems but also requires markedly fewer steps to classify the signals as abnormal or normal. Moreover, it relies on an unsupervised machine learning model that is faster and localizes the fundamental heart sounds automatically without computing any features. In our experiments, the SVM achieved the highest accuracy, 94.75%, as reported in Table V. To validate the performance of the proposed algorithm, we compared our work with other state-of-the-art works in Table VI. These results demonstrate the method's usefulness in computer-aided heart sound analysis and provide a basis for applications such as the localization of murmurs and ejection clicks and a variety of further analyses. In future work, we plan to apply this algorithm to real-time data to evaluate its efficiency and robustness.
Shrey Agarwal is presently pursuing a bachelor's degree in information technology at Kalinga Institute of Industrial Technology, Bhubaneswar, India. He expects to complete his degree by June 2022. He completed a research internship at IoT Lab KIIT, Bhubaneswar, India and an industrial internship at InfoAxon, Noida, India. He is presently working on multiple research projects, one of which is with Samsung. His research interests include Machine Learning and Deep Learning.
Yashaswi Upmon is presently pursuing a bachelor's degree in Information Technology at Kalinga Institute of Industrial Technology, Bhubaneswar, India. He expects to complete his degree by June 2022. He is presently working on multiple research projects. His research interests include Natural Language Processing, Machine Learning and Deep Learning.