A Low Complex Algorithm for Detection of Sleep Apnea from ECG Signals using Sliding SSA Combined with PCA, KPCA, and SPPCA

Sleep apnea is a potentially life-threatening sleep disorder in which breathing repeatedly stops and resumes during sleep, leading to frequent awakenings. Because computational time and efficiency are essential in healthcare, we propose an algorithm that performs the required computations in less time without compromising the performance of the machine learning models. This study employs Sliding Singular Spectrum Analysis (SSSA) to decompose and de-noise the ECG signals. To identify the significant apnea and non-apnea components from the pre-processed ECG data and to reduce the dimensionality, we use Principal Component Analysis (PCA), Kernel PCA (KPCA), and Sub-Pattern based PCA (SPPCA). The resulting features are then used to train and evaluate several machine learning models, namely KNN, SVM, GaussianNB, SGD, and XGBoost, to distinguish between apnea and non-apnea ECG data. The publicly available Physionet Apnea-ECG database is used to evaluate the proposed algorithm. To assess the machine learning models, we compute accuracy, precision, recall, and F1 score. The proposed method is validated by comparing its classification metrics with recent state-of-the-art works.


INTRODUCTION
Sleep apnea is among the most prevalent sleep disorders, characterised by breathing pauses during sleep, also termed apneic events, which cause frequent awakenings [1]. Obstructive Sleep Apnea (OSA) occurs when the airway is obstructed by the throat muscles, Central Sleep Apnea (CSA) occurs when the signals that govern breathing are disrupted, and hypopnea occurs when breathing becomes shallow. Hypopnea can be classified as either obstructive or central. Although some studies have shown that around 49.7% of men and 23.4% of women have sleep-disordered breathing [2], many cases remain undetected because patients are often unaware of their condition. These people are predisposed to hypertension, cardiac arrhythmias, heart attacks, and strokes [3] [4]. Some studies also suggest that people with sleep apnea are more likely to be involved in motor vehicle collisions [5].
Polysomnography (PSG), a sleep test conducted in a hospital or at home that requires the assessment and supervision of a physician, is currently used to evaluate sleep-related respiratory problems. Although PSG is the most essential diagnostic tool, its high price and lack of comfort limit its diagnostic value. Because the number of beds available for PSG recording and the number of qualified sleep specialists available for analysis are both limited, waiting periods can be very long. In the United Kingdom, these waiting periods range from 2 to 10 months, whereas in the United States, they range from 7 to 60 months [6]. Furthermore, there is considerable intra- and inter-scorer variability. As a result, the development of technologies that are more portable and less invasive is critical [7] [8] [9]. Despite the significant time and money invested in manual sleep staging, it still has flaws. In general, the agreement between two independent scorers is unsatisfactory [10]-[15]. The inter-rater reliability (IRR) between two scorers using the current sleep scoring criteria is typically about 0.78 [10], as assessed by Cohen's kappa. However, due to poor N1 sleep scoring, the reliability amongst worldwide sleep centres can be as low as 0.58 to 0.63 [11] [12] [13]. The agreement on N1 amongst sleep laboratories in Europe is only about 0.46, and it might be as low as 0.19 to 0.31 between international centres. Furthermore, if a person has a medical condition, the general reliability of manual sleep staging may be reduced even further. For example, scoring reliability for OSA patients is lower than for healthy people [14] [15]. Automatic scoring approaches have the potential to enhance sleep staging uniformity across hospitals and healthcare systems. Furthermore, automated techniques capable of precise sleep staging with a small number of observed signals might make the measuring process easier to follow and save money.
As a result, we present a thorough benchmark of several classifiers and four chosen features based on a single ECG signal. Five alternative classifiers for detecting sleep apnea from ECG data were examined in accordance with this. Furthermore, these classifiers were put to the test in three separate dimensionality reduction scenarios, each with its own set of characteristics (also extracted from the signal). In comparison to PSG's traditional screening schemes, the proposed approaches might provide practitioners with a reliable, easy, and cost-effective diagnostic tool.

LITERATURE REVIEW
Sleep apnea can be diagnosed from ECG and EEG signals. In [16], an algorithm for the automatic detection of sleep apnea from single-lead ECG was developed using two features derived from the ECG, namely the standard deviation and the serial correlation coefficients of the RR interval time series, along with an LS-SVM classifier; it achieved an accuracy of 0.84 on the Physionet dataset and 0.83 on the Leuven dataset. In [17], an automatic apnea detection algorithm is proposed using a single-lead EEG signal, based on Rician modelling of feature variation in the multi-band EEG signal along with a KNN classifier; it achieved an accuracy of 0.91 on the Physionet dataset and 0.86 on the MIT-BIH dataset. In [18], a wearable, accurate, and energy-efficient system for long-term monitoring of obstructive sleep apnea was proposed; based on a single-channel electrocardiogram, it achieved an accuracy of 0.88 on Physionet data.
Other techniques based on machine learning, which have a better learning capacity, can automatically discover more complex structures and generate more accurate predictions. Commonly used methods include SVM [19], Logistic Regression (LR) [20], KNN [19], [20], Linear Discriminant Analysis (LDA) [20], [21], Gaussian Processes (GP) [22] and Artificial Neural Networks (ANN) [23]. The classification rules or characteristics of each sleep stage were not clearly stated in these systems. Previous research, on the other hand, has often depended on extensive pre-processing, such as converting signals into 2D images that contain spectral information or limiting signals to a small number of predetermined characteristics. Furthermore, deep learning models trained on study datasets of healthy people have usually lost accuracy when applied to populations with sleep disorders such as OSA.
In [24], the following classifiers were built using a single-lead ECG signal: kNN, Multilayer Perceptron Neural Network (MLPNN), SVM, and Least-Squares Support Vector Machine (LS-SVM). According to the experimental data, the RBF kernel-based LS-SVM produced better accuracy. Similarly, in [25] and [26], Adaptive Boosting (AdaBoost) and SVM, respectively, were developed to identify normal and apnea episodes based on a single-lead ECG. To detect sleep apnea episodes, the authors in [27] retrieved the EDR signal from a single-lead ECG and used the following classifiers: ANN, SVM, kNN, Linear Discriminant (LD), and Quadratic Discriminant (QD). According to their reports, the ANN with two hidden layers gives better performance. Similarly, in their proposal for sleep apnea diagnosis [28], the authors utilised the HRV signal and applied the following classifiers: ANN, BN, kNN, and SVM. The linear kernel SVM achieved the best results. Furthermore, the authors pointed out that the effectiveness of a feature extraction technique varies depending on the classification method. In [29], the authors suggested a sleep apnea detection system with edge computing in mind. They used RF, Extremely Randomized Trees, SVM, NB, AdaBoost, kNN, and LR to assess the system's performance using data from a single-channel ECG sensor. Despite the limited number of features available, the SVM combined with an RBF kernel was shown to have the best accuracy. The authors of [30] suggested a sleep apnea detection sensor based on a micro electro-mechanical system (MEMS). The primary objective was to track diaphragm motions during breathing exercise. The suggested model's classifier was an artificial neural network (ANN). In addition, in [31], the authors presented a wearable for ambulatory sleep apnea monitoring. To differentiate between normal and apnea episodes, the model employed a single-lead ECG and an SVM classifier.
In contrast, the authors of [32] suggested a method based on the nasal airflow signal. The SVM was chosen as the classifier for evaluating the system's performance.
The following deep learning models were constructed using the ECG data in [33]: DNN, 1D-CNN, 2D-CNN, Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). In addition, the authors recommended combining time series signals with either the 1D-CNN or the GRU. In [34], on the other hand, the authors utilised a CNN model to analyse a single-channel nasal pressure signal. Furthermore, [35] used an LSTM-RNN to integrate the IHR and SpO2.
Computational time and efficiency are essential in the healthcare industry. Here, we propose an algorithm that performs the required computations in less time without compromising the performance of machine learning models. Reducing the number of variables in a data set usually reduces accuracy; nevertheless, the idea behind dimensionality reduction algorithms (DRAs) is to trade a little accuracy for simplicity, because smaller data sets are simpler to examine and visualise, and ML models can analyse them more easily and quickly without having to deal with unnecessary variables. To summarise, the purpose of our proposed DRAs is to reduce the number of variables in a data set while retaining as much information as possible.
The purpose of this study is to provide an automated approach for detecting sleep apnea utilising the ECG signal and sliding singular spectrum analysis (SSSA). The SSSA decomposition of non-stationary signals into reconstructed components (RCs) or modes is a variation of the singular spectrum analysis (SSA) method. Mono-component separation from simulated and actual non-stationary signals has been shown using this approach. Because it is a data-driven approach, the SSSA delivers efficient results for multi-scale analysis of non-stationary signals without making any assumptions about the basis functions.

PROPOSED METHODOLOGY
The proposed method comprises five steps: (i) pre-processing of ECG signals using sliding SSA; (ii) extraction of four attributes from the sliding SSA output, namely dominant frequency, energy, spectral entropy, and spike rhythmicity; (iii) calculation of the principal components of apneic and non-apneic ECG signals using PCA, KPCA, and SPPCA on the four attributes individually; (iv) classification of apneic and non-apneic ECG signals using various machine learning models (KNN, SVM, GaussianNB, SGD, and XGBoost); and (v) comparison of computational time and performance measures (accuracy, precision, recall, and F1 score) between SSSA-BR, SSSA-PCA, SSSA-KPCA, and SSSA-SPPCA. The hierarchy of the proposed sliding SSA combined with KPCA, PCA, and SPPCA is shown in Fig. 3.

Data description
This study made use of the Apnea-ECG database, which is a public database available at Physionet [36]. This database comprises 70 ECG recordings, each of which lasts around 8 hours. Each ECG recording has a 100 Hz sampling frequency. For the patients in this database, the Apnea-Hypopnea Index (AHI) ranges from 0 to 83. An expert assessed 35 of the 70 ECG recordings as apnea (labelled as 1) or normal (labelled as 0) minute by minute. Furthermore, based on the number of apnea minutes and AHI, each recording is categorised as "apnea," "borderline," or "normal." We have considered 35 single-lead ECG recordings from this database. In this work, we have used one-minute segments of the ECG signal, each consisting of 6000 samples. One-minute segments of apneic and non-apneic ECG signals are shown in Fig. 1, and the singular spectra of apnea and non-apnea episodes extracted using SSSA (for L = 50) are shown in Fig. 2.
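The one-minute segmentation described above can be sketched as follows. This is a minimal illustration, assuming the recording is already available as a plain list of samples; the function name `one_minute_segments` is hypothetical, and the choice to discard a trailing partial minute is an assumption.

```python
def one_minute_segments(signal, fs=100, seconds=60):
    """Split a recording into non-overlapping segments of fs * seconds samples."""
    size = fs * seconds                      # 100 Hz * 60 s = 6000 samples
    # Assumption: a trailing partial segment is dropped.
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

# Synthetic stand-in for a recording (2.5 minutes at 100 Hz).
recording = list(range(15000))
segments = one_minute_segments(recording)
```

Each segment can then be paired with the corresponding minute-by-minute annotation (1 for apnea, 0 for normal).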

Sliding singular spectrum analysis (SSSA)
The SSSA can be derived from the standard SSA algorithm [37]. The SSA algorithm consists of two steps: (i) decomposition and (ii) reconstruction. The apnea or non-apnea ECG signal ($Y_n$) is usually corrupted by artifacts. To remove these unwanted sources from the ECG signal, we perform decomposition in the first stage, and the noise-free ECG signal ($X_n$) is reconstructed in the second stage.

Decomposition
The SSA decomposition stage is broken down into two parts: embedding and Singular Value Decomposition (SVD). First, the ECG signal is embedded into a data matrix $E$ of size $(K \times M)$ as mentioned in (1), where $K = N - L + 1$, and $N$ and $L$ denote the number of samples and the length of the window, respectively.
Second, SVD is performed on the matrix $E$, which can be represented as $E = U D V^{T}$, where $V$ and $U$ are unitary matrices and $D$ is a diagonal matrix.
SVD is typically used to find the eigenvectors of a symmetric matrix such as the covariance matrix. The covariance matrix of the data matrix $E$ is given by $C = EE^{T}$, with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_M$ and eigenvectors $V_1, V_2, \ldots, V_M$, respectively. The $i$-th component of $U$ can be expressed as in (3), where $i = 1, 2, \ldots, M$. The trajectory matrix can therefore be represented in terms of the elementary trajectory matrices $E_i$ defined in (5). The term $v_i v_i^{T}$ in (5) spans a subspace formed by the eigenvector $v_i$.
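The decomposition stage can be illustrated with a short NumPy sketch. It is not the exact implementation used in this work: it assumes that the rows of $E$ are lagged vectors of length $L$ (so that $M = L$), the function name `ssa_decompose` is hypothetical, and the test signal is synthetic.

```python
import numpy as np

def ssa_decompose(y, window):
    """Embed a 1-D signal and split its trajectory matrix into rank-one parts."""
    y = np.asarray(y, dtype=float)
    k = y.size - window + 1                  # K = N - L + 1 lagged vectors
    # Embedding: row i holds y[i], ..., y[i + window - 1] (Hankel structure).
    E = np.array([y[i:i + window] for i in range(k)])
    # SVD: E = U D V^T, singular values returned in decreasing order.
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    # Elementary matrices E_i = s_i * u_i * v_i^T; their sum restores E.
    elementary = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(s.size)]
    return E, s, elementary

# Synthetic two-tone signal standing in for an ECG segment.
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 25) + 0.1 * np.cos(2 * np.pi * t / 7)
E, s, parts = ssa_decompose(signal, window=50)
```

The singular values in `s` form the singular spectrum; small trailing values typically correspond to noise-like components.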

Reconstruction
The SSA reconstruction stage comprises two steps: grouping and diagonal averaging. It has been observed that the usual SSA fails to reconstruct each elementary component for an arbitrary value of $L$. In this paper, we therefore apply clustering after rebuilding each elementary component, and for reconstruction we use the Hankelization process, in which $k$ and $p$ are the indices of the elements of $E_i$.
To initialise the clustering algorithm, each reconstructed elementary component is allocated to its own class. To build a new class, the algorithm then looks for the two closest classes and merges them. This process is repeated until the desired number of classes or the maximum dissimilarity between two classes is reached. To compare two classes $C_i$ and $C_j$ that each contain several time series, the dissimilarity is taken as the minimum distance between two time series, one from each class.
The distance function $d$ is defined in terms of the inner product $\langle x, y \rangle = x^{T} y$ and the norm $\|x\| = \sqrt{\langle x, x \rangle}$. The autoSSA function is used to perform the hierarchical clustering on the components $s_i(n)$, that is, $[s_1(n), s_2(n), s_3(n), \ldots, s_R(n)]$. Finally, the indices of the components in the $j$-th cluster are collected in $J_j$, where $j = 1, 2, 3, \ldots, r$.
In the last step, the components within each cluster are designated as the $j$-th reconstructed components [38].
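The diagonal averaging (Hankelization) step can be sketched in plain Python. This is a generic illustration of the standard procedure, not the code used in this work; the function name `diagonal_average` is hypothetical, and the indices `k` and `p` follow the row and column indices of $E_i$ mentioned above.

```python
def diagonal_average(matrix):
    """Hankelize a K x M matrix into a series of length K + M - 1."""
    rows, cols = len(matrix), len(matrix[0])
    sums = [0.0] * (rows + cols - 1)
    counts = [0] * (rows + cols - 1)
    for k in range(rows):
        for p in range(cols):
            # Entries with the same k + p lie on one anti-diagonal.
            sums[k + p] += matrix[k][p]
            counts[k + p] += 1
    return [total / count for total, count in zip(sums, counts)]

# A Hankel matrix built from the series 1..5 is recovered exactly.
hankel = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
series = diagonal_average(hankel)
```

Applied to an elementary matrix that is not exactly Hankel, the same routine returns the series whose trajectory matrix is closest to it in the least-squares sense.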

Dominant frequency
Two frequency-domain characteristics, the dominant frequency (DF) and the spectral entropy (SE), have been explored. To obtain them, the time-domain signal is first converted into the frequency domain using the DFT as described in (9).
The frequency of the maximum spectral peak of each sub-pattern is known as the dominant frequency. This characteristic is particularly beneficial in the identification of sleep apnea since it yields superior outcomes. For an ECG signal, we computed the dominant frequency of each epoch of size 32.
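A minimal sketch of the dominant frequency computation for one 32-sample epoch is given below. It uses a naive DFT from the standard library for clarity; the function names are hypothetical, and skipping the DC bin when searching for the peak is an assumption rather than a detail stated in the text.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitude of the DFT for bins 0..N/2 (naive O(N^2) transform)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def dominant_frequency(x, fs):
    """Frequency in Hz of the largest non-DC spectral peak of one epoch."""
    mags = dft_magnitudes(x)
    # Assumption: the DC bin (k = 0) is excluded from the peak search.
    k_peak = max(range(1, len(mags)), key=lambda k: mags[k])
    return k_peak * fs / len(x)

fs = 100.0                                   # sampling rate of the Apnea-ECG records
epoch = [math.sin(2 * math.pi * 12.5 * t / fs) for t in range(32)]
df = dominant_frequency(epoch, fs)
```

With a 32-sample epoch at 100 Hz the frequency resolution is 100/32 = 3.125 Hz, so the dominant frequency is quantised to multiples of that value.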

Spectral entropy
For each sub-pattern, the spectral entropy is computed between two frequencies, $f_1$ and $f_2$. It may be written as follows:
$$SE = -\sum_{f_i = f_1}^{f_2} P_n(f_i) \log P_n(f_i) \qquad (10)$$
where $f_i$ is a frequency component ranging from $f_1$ to $f_2$, and $P_n(f_i)$ is the normalized power spectrum component at $f_i$.
After that, the normalised spectral entropy (NSE) is calculated by dividing the SE by $\log(N)$. In this work, the SE of an ECG signal is calculated using a 32-point FFT.
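The spectral entropy of one epoch can be sketched as follows. The sketch assumes that the band from $f_1$ to $f_2$ covers all bins from DC to the Nyquist frequency, that the natural logarithm is used, and that $N$ in the normalisation is the number of frequency bins in the band; the function names are hypothetical.

```python
import cmath
import math

def power_spectrum(x):
    """Power of the DFT bins 0..N/2 (naive transform, fine for 32 points)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def spectral_entropy(x, normalize=True):
    """Shannon entropy of the normalised power spectrum of one epoch."""
    power = power_spectrum(x)
    total = sum(power)
    if total == 0:
        return 0.0                           # silent epoch: define entropy as zero
    probs = [p / total for p in power if p > 0]   # normalised spectrum P_n(f_i)
    se = -sum(p * math.log(p) for p in probs)
    if normalize:
        se /= math.log(len(power))           # assumption: N = number of bins
    return se

tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
impulse = [1.0] + [0.0] * 31
```

A pure tone concentrates its power in one bin and gives an entropy close to 0, whereas an impulse has a flat spectrum and gives a normalised entropy of 1.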

Energy of the signal
The energy of each sub-pattern may be computed by summing the squared values of its samples, as shown in (11), where $D$ is the number of samples in each sub-pattern.
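As a small illustration, the energy of consecutive sub-patterns can be computed as below. The function names and the non-overlapping split are assumptions made for the sketch.

```python
def sub_pattern_energy(samples):
    """Sum of squared sample values of one sub-pattern."""
    return sum(v * v for v in samples)

def energies(signal, size):
    """Energy of each non-overlapping sub-pattern of `size` samples."""
    return [sub_pattern_energy(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]

values = energies([1, 2, 3, 4], size=2)      # 1 + 4 and 9 + 16
```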

PCA
PCA is the most commonly used dimensionality reduction algorithm for compressing larger dimensions into smaller ones while keeping the majority of the original dataset's information, which is crucial when classifying the attributes. The following approach is used to apply PCA to data of dimension $p \times q$ (where $p$ is the number of data points and $q$ is the number of dimensions of each element):
• To begin, the extracted attributes are used to build a covariance matrix of size $(q \times q)$.
• Calculate the covariance matrix's eigenvalues ($\lambda_i$) and associated eigenvectors $E_i$.
• Choose the $k$ ($k < q$) largest eigenvalues and their eigenvectors. Let $E_i$ represent the set of these $k$ chosen eigenvectors.
• To obtain the PCs for the attribute vectors, the attribute vectors $S_i$ are projected onto $E_i$ to generate attributes with a reduced dimension according to the relationship stated in (13).
Fig. 4 depicts the spectral entropy of the ECG signal before dimensionality reduction and Fig. 5 depicts the output ECG signal after dimensionality reduction using PCA. We also employed two PCA variations, SPPCA and KPCA, to minimise the dimensionality of the attributes in this study; they are described in the following sections.
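The PCA steps listed above can be sketched with NumPy as follows. This is an illustrative sketch under the assumption that each row of the input is one attribute vector and that the covariance matrix is taken over the $q$ columns; the function name `pca_project` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_project(S, k):
    """Project p x q attribute vectors onto the k leading eigenvectors."""
    S = np.asarray(S, dtype=float)
    centered = S - S.mean(axis=0)
    cov = np.atleast_2d(np.cov(centered, rowvar=False))   # covariance over columns
    eigvals, eigvecs = np.linalg.eigh(cov)                # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]                 # indices of the k largest
    return centered @ eigvecs[:, order], eigvals[order]

# Synthetic attributes with very different variances per dimension.
rng = np.random.default_rng(0)
S = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 0.5, 0.1])
reduced, kept = pca_project(S, k=2)
```

The variance of each retained component equals the corresponding eigenvalue, which is why keeping the largest eigenvalues preserves most of the information.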

SPPCA
In SPPCA, the signal is first split into several equal sub-parts. We separated each ECG signal into eight parts because this produced the desired results. In the second phase, PCA is applied to each sub-part of the signal and k principal components (PCs) are chosen; k should be smaller than the signal's dimension in this case. After the dimensionality reduction, all of the sub-parts are horizontally concatenated to provide adequate global data. Finally, the classifier is fed with the reduced-dimension signal [39]. Fig. 6 depicts the output ECG signal after dimensionality reduction using SPPCA.
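A minimal NumPy sketch of the sub-pattern procedure is given below. It assumes that each row of the input matrix is one signal and that the split is made along the columns; the function names, the SVD-based PCA helper, and the random data are hypothetical choices for illustration.

```python
import numpy as np

def pca_reduce(block, k):
    """PCA of one sub-part via SVD of the centered data."""
    centered = block - block.mean(axis=0)
    # The right singular vectors are the eigenvectors of the covariance matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def sppca(X, n_parts, k):
    """Split the columns into sub-parts, reduce each with PCA, then concatenate."""
    X = np.asarray(X, dtype=float)
    parts = np.array_split(X, n_parts, axis=1)
    return np.hstack([pca_reduce(part, k) for part in parts])

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 64))                # 40 signals of dimension 64
Z = sppca(X, n_parts=8, k=2)                 # 8 sub-parts, 2 PCs each
```

With eight sub-parts and two PCs per sub-part, each 64-dimensional signal is reduced to 16 values.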

KPCA
KPCA is a technique for enhancing the characteristics of ECG data, and it allows a variety of kernels to be selected; the Gaussian kernel was employed in this case. KPCA works on the premise of mapping the data to a higher-dimensional feature space $F$ that is nonlinearly related to the input space [40]. Because PCA projects the points onto a straight line whereas KPCA can follow a curved (for example, circular) structure, the projections of the points onto the KPCA eigenvectors (the new coordinates) can exhibit greater variance than those of PCA. Fig. 7 depicts the output ECG signal after dimensionality reduction using KPCA.
KPCA is a PCA method that accounts for non-linearities in ECG data by performing a non-linear mapping of the data onto a higher-dimensional space using a kernel function, after which PCA is performed.
In this method, if $X$ is the input space and $\psi$ is a nonlinear transformation, the kernel $K$ is constructed from the inner product of the transformed inputs, i.e., $K = k(x_p, x_q) = \psi(x_p)^{\top}\psi(x_q)$. The eigendecomposition of the centered kernel matrix is given in (14):
$$HKH = T \Lambda T^{\top} \qquad (14)$$
where $H = I - \frac{1}{q}\mathbf{1}\mathbf{1}^{\top}$ is the centering matrix, $I$ is the $q \times q$ identity matrix, $\mathbf{1} = [1, 1, \ldots, 1]^{\top}$ is a $q \times 1$ vector, $T$ is the matrix containing the eigenvectors, and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_q)$ contains the corresponding eigenvalues. Once the eigenvectors and eigenvalues have been found, the reconstructed signal is calculated by a procedure similar to that used in the PCA method.
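The kernel centering and eigendecomposition can be sketched with NumPy as follows. The sketch assumes a Gaussian kernel of the form $\exp(-\gamma \|x_p - x_q\|^2)$ with a hypothetical width parameter `gamma`, and it returns the projections of the training points only; the function names and data are illustrative.

```python
import numpy as np

def gaussian_kernel_matrix(X, gamma):
    """K[p, q] = exp(-gamma * ||x_p - x_q||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_pca(X, k, gamma=1.0):
    """Project the training points onto the k leading kernel principal axes."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    K = gaussian_kernel_matrix(X, gamma)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H                           # centered kernel matrix
    eigvals, eigvecs = np.linalg.eigh(Kc)    # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]
    lam, T = eigvals[order], eigvecs[:, order]
    # Projection of training point i on axis j is sqrt(lam_j) * T[i, j].
    return T * np.sqrt(np.maximum(lam, 0.0)), lam

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
Z, lam = kernel_pca(X, k=2, gamma=0.5)
```

The choice of `gamma` controls how local the kernel is and would normally be tuned on the data.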

Time complexity reduction using PCA, SPPCA and KPCA
Consider the model parameters of class $P$ with dimension $d$ as $P_1, P_2, P_3, \ldots, P_N$. The time complexity of PCA depends on this dimension $d$. In the proposed SPPCA algorithm, the input patterns are split into $L$ sub-patterns, so each PCA operates on a lower dimension and the time complexity of SPPCA is reduced accordingly.
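A rough operation count illustrates the saving. It is a textbook-style estimate rather than a measured result, and it assumes that the cost of PCA is dominated by forming the $d \times d$ covariance matrix from $N$ patterns and by its eigendecomposition, and that each of the $L$ sub-patterns has dimension $d/L$:

```latex
\begin{align}
T_{\mathrm{PCA}} &= O\!\left(N d^{2}\right) + O\!\left(d^{3}\right), \\
T_{\mathrm{SPPCA}} &= L \left[ O\!\left(N \left(\tfrac{d}{L}\right)^{2}\right)
                      + O\!\left(\left(\tfrac{d}{L}\right)^{3}\right) \right]
                    = O\!\left(\frac{N d^{2}}{L}\right) + O\!\left(\frac{d^{3}}{L^{2}}\right).
\end{align}
```

Under these assumptions, splitting into $L = 8$ sub-parts reduces the covariance term by a factor of 8 and the eigendecomposition term by a factor of 64.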

EXPERIMENTAL RESULTS
Initially, we took 900000 of the 1000000 samples of apnea-ECG data, and segmentation of the ECG signal was performed using SSSA. The original signal was divided into ...
To identify ECG data as apnea or non-apnea, we used a variety of machine learning algorithms such as KNN, SVM, GaussianNB, SGD, and XGBoost. The steps taken to implement the models are as follows:
• The dataset was annotated and then shuffled so that each data point contributes independently and no bias is introduced.
• The attributes extracted in the previous section are fed into the ML models for classification.
• The classification data was divided in the ratio of 8:2, i.e., 80% for training and 20% for testing.
• The above-mentioned models were then trained and their performance metrics were calculated using the testing data.
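The shuffling, the 8:2 split, and the performance metrics in the steps above can be sketched with the standard library alone. The function names, the fixed seed, and the toy labels are assumptions for illustration; any of the classifiers listed above would be trained between the two steps.

```python
import random

def shuffle_split(features, labels, train_fraction=0.8, seed=0):
    """Shuffle the annotated data and split it into training and testing parts."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)         # assumption: fixed seed for repeatability
    cut = int(round(train_fraction * len(idx)))
    train, test = idx[:cut], idx[cut:]
    pick = lambda seq, ids: [seq[i] for i in ids]
    return pick(features, train), pick(features, test), pick(labels, train), pick(labels, test)

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 score with apnea (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

X = [[float(i)] for i in range(10)]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
X_train, X_test, y_train, y_test = shuffle_split(X, y)
scores = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
```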
For gradient boosting, we set the number of estimators to 100, the learning rate to 1.0, the maximum depth to 1, and the random state to 0. Similarly, for the AdaBoost classifier we set the number of estimators to 100 and the random state to 0. We used the RBF kernel for the SVM. For XGBoost, 100 estimators were used. Finally, we calculated the computational time of all the classifiers for the SSSA-BR, SSSA-KPCA, SSSA-PCA, and SSSA-SPPCA algorithms, which is presented in Table III. The classification was done using Python 3.8.5 on a PC with 4 GB of RAM and an Intel Pentium processor. Performance metrics (accuracy, precision, recall, and F1 score) for SSSA-BR and SSSA-KPCA are shown in Table I. Performance metrics for SSSA-PCA and SSSA-SPPCA are shown in Table II. Fig. 8 depicts the computational time of spectral entropy for various machine learning models.

DISCUSSION
Several automated sleep apnea diagnosis techniques based on ML classification have recently been developed to replace traditional approaches. As shown in Table IV, the performance of our suggested technique was compared to that of existing methodologies for the classification of apnea and non-apnea classes from the ECG signal. In a recent study, Qi Shen et al. [41] developed an algorithm to detect sleep apnea. In that work, an OSA detection method based on a multiscale dilation attention 1-DCNN (MSDA-1DCNN) and a weighted-loss time-dependent (WLTD) classification model was employed and obtained an accuracy of 89.4%. The authors of [44] also developed an algorithm using two features in heart rate variability analysis: the first feature uses the principal components of the QRS complexes, and the second feature extracts the information shared between respiration and heart rate using orthogonal subspace projections. They achieved an accuracy of 85% using a least-squares SVM classifier with an RBF kernel. In another study, Himali Singh et al. [45] developed an algorithm using sliding mode SSA combined with classifiers such as the stacked autoencoder based deep neural network (SAE-DNN) and SVM. They obtained accuracy values of 94.3% and 72%, respectively. Moreover, R. K. Tripathy et al. [46] developed an automated sleep apnea detection method using bivariate fast and adaptive empirical mode decomposition (FAEMD) combined with SVM and random forest (RF) classifiers, which achieved an average accuracy of 78.67% and, using the 10-fold cross-validation method, an average accuracy of 73.13%. Moreover, Michael Sokolovsky et al. [47] designed a DCNN architecture for automated sleep stage classification of human sleep EEG and EOG signals, and the accuracy obtained by this method was 81%. Similarly, Asghar Zarei et al. [48] developed an algorithm in which features chosen by the sequential forward feature selection (SFFS) technique are fed into a random forest for classification, achieving an average accuracy of 93.9% in per-segment classification. Moreover, P. Janbakhshi et al. [49] designed an algorithm using ECG variance and phase space reconstruction area. The accuracy obtained by this method was 100% in subject-based apnea detection on independent test data. Similarly, R. K. Tripathy [50] developed automated detection of sleep apnea by extracting features from the intrinsic band functions (IBFs) of both EDR and HRV signals and classifying them using a kernel extreme learning machine (KELM), achieving an accuracy of 74.64%.
The proposed study places more focus on developing an ML framework that uses the time-frequency domain characteristics acquired from the ECG data. Fig. 9 depicts the average accuracy plot for all four features. To minimise the apnea feature dimensions, we proposed three DRAs: SSSA-PCA, SSSA-KPCA, and SSSA-SPPCA. Furthermore, we employed five different machine learning models to validate the efficiency of the proposed framework, which produced better results than [41]-[50] and reached an accuracy of 100%. In Table III, all of the classification parameters are compared in detail. The main objective of our work is to reduce the computational time of ECG signal classification while keeping complexity low and accuracy high, which reduces the burden on the physician who evaluates the test results. The complexity is decreased by employing the suggested DRAs, which shorten the time it takes to detect apnea episodes and reduce human error while examining ECG signals.

CONCLUSION
In this paper, we proposed DRAs combined with SSSA that perform the required computations in less time without compromising the ML models' performance. More emphasis is given to reducing computational time and improving efficiency, as both are essential in healthcare. We combined sliding SSA with PCA, SPPCA, and KPCA to identify the apnea attributes from the ECG signals. Initially, the ECG signals were decomposed using the sliding SSA technique. Then, the principal components of the decomposed ECG signals were calculated using PCA, SPPCA, and KPCA. Finally, the dimension-reduced apnea and non-apnea attributes were given as input to five ML classifiers. The classification metrics of all the models are shown in Table I and Table II. We also compared the computational time of the ML classifiers using SSSA-BR, SSSA-PCA, SSSA-KPCA, and SSSA-SPPCA, as shown in Table I and Table II. The proposed algorithm achieved its highest accuracy of 100% with the XGBoost classifier. To validate the proposed algorithm, we compared our work with the latest state-of-the-art works, as shown in Table IV. The average accuracy for all the features is shown in Table V. In future, we plan to detect sleep apnea in ECG signals using deep learning algorithms based on multivariate features extracted using convolutional autoencoders.