Morphology Extraction of Fetal Electrocardiogram by Slow-Fast LSTM Network

Abstract: The morphology of the Fetal Electrocardiogram (FECG) plays an important role in the early diagnosis of fetal health conditions. However, it is difficult to extract the clean morphology of FECG signals, which are usually contaminated by Maternal ECG (MECG) and various noises. To extract the clean morphology of FECG signals from noninvasive abdominal ECG records, a high-performance and highly efficient two-stage architecture based on Slow-Fast Long Short-Term Memory (SFLSTM) is proposed. MECG elimination and FECG enhancement are realized by a slow LSTM and a fast LSTM, designed to filter out the MECG and the residual noise components, respectively. Qualitative and quantitative experiments are conducted on records from two public databases. The experimental results show that the proposed MECG elimination and FECG enhancement schemes improve the Signal-to-Noise Ratio (SNR) by 3.09 dB and 1.81 dB, respectively. The proposed fast LSTM reduces the amount of computation by approximately 50% without any degradation in performance. The proposed method may facilitate noninvasive FECG monitoring for the early detection of fetal heart diseases.

I. INTRODUCTION
The Fetal Electrocardiogram (FECG) represents the electrical activity of the fetal heart and is one of the most important biomedical signals for diagnosing fetal heart health. FECG signals are extracted from Abdominal Electrocardiogram (AECG) signals, which are usually sensed by noninvasive sensors attached to the mother's abdomen. Noninvasive FECG provides stable, long-term, continuous heartbeat monitoring and morphological analysis [1].
Unfortunately, the weak FECG signals are easily masked by the strong Maternal ECG (MECG) signals from the mother's heartbeat. Moreover, AECG signals are usually corrupted and deformed by various noises, such as the maternal Electromyogram (EMG), Power Line Interference (PLI), baseline wandering, and random electronic noise. In addition, the MECG and FECG components overlap in both time and frequency, and ground truth annotations are scarce. As a result, extracting the clean morphology of FECG signals from AECG signals is a difficult task. In the majority of existing works, R-peak detection has been widely used to extract the Fetal Heartbeat Rate (FHR) by enhancing the fetal R-peak or eliminating the noise components [2], [3]. Most existing works, including the well-known wavelet transform methods, aim to extract an accurate FHR and may therefore distort the original waveform of the FECG signals [4]. The deformation is usually caused by the R-peak enhancement used to obtain an accurate FHR and by the overlap between the MECG and FECG components in time and frequency.

Z. Zhou, K. Huang, Y. Qiu
Consequently, there is still room to improve the performance of conventional methods in FECG morphology extraction. Methods such as Template Subtraction (TS) [5], [6] may perform poorly when the FECG and MECG overlap appreciably, and they depend heavily on the performance of maternal QRS (mQRS) detection. Moreover, many parameters need to be set carefully in some existing methods, such as the Wavelet Transform (WT). Other methods, such as Blind Source Separation (BSS) techniques [7], need a large number of abdominal channels, which may cause discomfort to the mother and hinder clinical use. Besides, real training data are difficult to obtain for learning-based methods such as Adaptive Neuro-Fuzzy Inference Systems (ANFIS) and the Least Squares Support Vector Machine (LSSVM).
To address the issues mentioned above, a two-stage architecture based on Slow-Fast Long Short-Term Memory (SFLSTM) is proposed. The first stage adaptively eliminates the MECG components through an elaborately designed Long Short-Term Memory (LSTM) unit, and the second stage enhances the FECG through a fast LSTM unit (also called Compact LSTM). Compared with the standard LSTM, the proposed fast LSTM reduces the amount of computation by approximately 50% with no performance degradation in the enhancement stage.
Qualitative and quantitative experiments are conducted on the Database for the Identification of Systems (DaISy) and the Fetal ECG Synthetic Database (FECGSYNDB). Our proposed system can adapt to different noise conditions, and thus the performance is much improved. The experimental results show that our method extracts much cleaner FECG components. Compared with the conventional FECG extraction methods, the proposed method improves the Signal-to-Noise Ratio (SNR) by 3.09 dB in MECG elimination and by another 1.81 dB in FECG enhancement, and it achieves a lower Root Mean Square Error (RMSE) and lower latency. The total gain of approximately 4.9 dB will significantly increase the accuracy of fetal heart condition diagnosis.
The rest of this paper is organized as follows. Section II briefly introduces the ground works related to our work. Section III describes our proposed two-stage LSTM network in detail. Section IV provides the experimental results. Finally, the conclusion is drawn in Section V.

TS: FECG is extrapolated by the templates from the input mixed signals, which depends on the quality of mQRS detection [4].
…: The parametric formulation got from a regular quasi-periodic heart rate pattern would hamper the detection of events such as extra systoles.
WT: It is limited by cross-over in the spectral domain [25]. Parameters must be selected carefully, including wavelet basis, thresholding rules and level of decomposition.
BSS: The most popular spatial method [4] to detect and extract ECG patterns [7]. It requires a large number of abdominal channels, which cause discomfort to the mother and the clinical utilization challenging.
… [24], can adaptively estimate the changing of the input signal and filter out FECG components.
LMS: Only the current stage is used to minimize the Mean Square Error (MSE) between the output and the target.
RLS: Historical signals are used to reduce the total squared error.
ESN: It works as a nonlinear medium for the reference signals to propagate through. MECG signals are estimated and subtracted by neural network. [4]
ANFIS: A Sugeno fuzzy inference system based neuro-adaptive learning algorithm is utilized to determine the relationship between MECG and FECG.
Most of the existing works aim at the accurate extraction of FHR. The preprocessing steps used for better FHR extraction often significantly distort the original waveform of the FECG signals. Therefore, the morphology extraction of FECG signals is still an unsolved problem. The two obstacles to this task are the residual MECG components and the inevitable residual noises. We found that the existing methods still suffer from extra systoles, overlapping maternal and fetal beats, low SNR, or low quality of mQRS detection [4]. The resultant FECG components usually contain plenty of noise at the same order of magnitude and frequency as the desired FECG components, which severely disrupts the morphology analysis. Therefore, enhancing the FECG to eliminate the residual noises is vital. In summary, two stages of processing are indispensable for the morphology extraction of FECG signals. The first stage is MECG elimination, which removes the MECG components from the AECG records without disturbing the FECG components. The second stage is FECG enhancement, which extracts the clean morphology of the FECG components from the filtered results contaminated by residual noises.
The Adaptive Noise Cancelling (ANC) technique is a widely used method to estimate signals corrupted by additive noise or interference [8], in this case by eliminating MECG from the AECG records. It requires neither prior knowledge of the MECG components nor the accurate locations of the R peaks in MECG for signal slicing and alignment. The vanilla ANC only supports linear transformations, whereas the correlation between the abdominal MECG and the thoracic MECG is usually nonlinear. Hence, the FECG filtered by the vanilla ANC contains non-negligible MECG components. ANC involves a "primary" input with a corrupted signal and a "reference" input containing pure noise signals correlated in some unknown way with the noise components in the primary input. The method transforms the reference input adaptively and estimates the signal by subtracting the transformed reference input from the primary input [26].
In principle, ANC acts like a discriminator that eliminates from the primary input the components that are linearly correlated to the reference input. It adjusts an adaptive filter to remove the MECG components from the AECG records by sending the thoracic MECG signal to the reference input [8]. However, because it relies on linear correlation, the vanilla ANC requires the reference input to be morphologically similar to the abdominal MECG waveform in the AECG records to work properly. Previous works have confirmed this phenomenon [8]. Unfortunately, the signal mapping from the maternal heart to the mother's abdomen is nonlinear in reality, and the morphology of the sensed ECG waveforms is highly related to the locations of the electrodes and the intrinsic characteristics of the heart. Hence, the required similarity cannot be guaranteed. As a result, ANC with linear filters has limited performance in MECG elimination.
Numerous works have been done to address the defects of ANC [27], [28]. Since MECG elimination requires the approximation of a nonlinear transformation, these methods tried to extend the linear transformation provided by the linear adaptive filter to a nonlinear one with the help of different nonlinear mapping tools. These nonlinear ANC methods filter out the signal components in the primary input that are nonlinearly correlated to the reference input and retain the "independent" components. Nevertheless, through the experiments, we found that the FECG signals filtered by the existing methods still have non-negligible residual MECG components, which severely interfere with the morphology analysis of FECG.
The process of FECG enhancement eliminates the residual noises that remain in the filtered FECG, which is an indispensable step for morphology extraction. Despite the numerous works on MECG elimination, very few works have addressed the enhancement of FECG signals. A time-sequenced ANC method was proposed in [29] for FECG enhancement. However, this method requires the accurate locations of the R peaks in FECG, on which its performance depends significantly. Moreover, it needs to adjust hundreds of linear filters, which places heavy demands on memory space and computation.
In summary, the conventional methods for MECG elimination and FECG enhancement are still inadequate, and more efficient methods are desired.

A. Adaptive Nonlinear Noise Cancelling
The concept of ANC is shown in Fig. 1. Signal $s$ is transmitted over a channel to an electrode sensor, which also inevitably receives an unrelated noise $n_0$. The sum $s + n_0$ forms the primary input of the canceller. A second sensor receives a noise $n_1$, which is uncorrelated with the signal $s$ but correlated in some unknown way with the noise $n_0$. This sensor provides the reference input to the canceller. The reference input is transformed through an adaptive filter to produce an output $y$. The adaptive filter is expected to make $y$ as close as possible to $n_0$. The output $y$ is then subtracted from the primary input $s + n_0$ to produce the system output $z = s + n_0 - y$, which is also the estimate of the signal $s$. More details of the ANC concept, including the theoretical proof, can be found in [8]. However, strict independence holds only in theory when the processed signal has finite length. In this case, it is more suitable to describe the "independent" relationship as a nonlinear correlation with high complexity, which is very difficult to approximate. Thus, the relationship between the two records is either linearly correlated or nonlinearly correlated.
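The canceller described above can be illustrated with a minimal NumPy sketch of the vanilla (linear) ANC, here using an LMS-adapted FIR filter. The function name `lms_anc`, the filter order, the step size `mu`, and the toy signals are all assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def lms_anc(primary, reference, order=8, mu=0.01):
    """Vanilla ANC sketch: adapt an FIR filter on the reference so y tracks n0."""
    w = np.zeros(order)
    z = np.zeros(len(primary))
    for t in range(len(primary)):
        # Most recent `order` reference samples, newest first, zero-padded at the start.
        seg = reference[max(0, t - order + 1): t + 1][::-1]
        x = np.zeros(order)
        x[:len(seg)] = seg
        y = w @ x                    # filter output, the estimate of n0
        z[t] = primary[t] - y        # system output z = s + n0 - y
        w += 2.0 * mu * z[t] * x     # LMS weight update driven by the output
    return z


# Toy data (hypothetical): a slow sinusoid as s, white noise as n1,
# and n0 as a linear (two-tap) transformation of n1.
rng = np.random.default_rng(0)
n = 4000
s = 0.2 * np.sin(2 * np.pi * np.arange(n) / 50)
n1 = rng.standard_normal(n)
n0 = 0.8 * n1 + 0.3 * np.concatenate(([0.0], n1[:-1]))
primary = s + n0

z = lms_anc(primary, n1)
err_before = np.mean((primary[2000:] - s[2000:]) ** 2)
err_after = np.mean((z[2000:] - s[2000:]) ** 2)
```

Because `n0` is a linear function of `n1` in this toy setup, the linear filter can cancel it almost completely; with a nonlinear relationship between the two, a residual would remain, which is the limitation discussed in this section.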
For MECG elimination and FECG enhancement, the correlation must in reality be approximated nonlinearly. Thus, the purpose of elimination and enhancement is redefined as distinguishing the complexity of the correlation between the signal components and the reference input. Hence, a "screener" is needed whose screening criterion is the complexity of the nonlinear correlation.
To assess and demonstrate the complexity of correlation, we test the correlation or independence between different signal components, namely Abdominal MECG (AMECG), Thoracic MECG (TMECG), Abdominal FECG (AFECG), Thoracic FECG (TFECG), and noise, using two indexes: the Pearson product-moment correlation coefficient (PPMCC) and the Hilbert-Schmidt Independence Criterion (HSIC) [30]. HSIC measures the statistical dependence between two random vectors [31]. We randomly select signals from different channels in FECGSYNDB, across all individual models and many times, to guarantee completeness. The correlation coefficients and the HSIC results between different components are calculated in two cases. In the first case, the indexes between the pure maternal and fetal signals (AMECG, TMECG, AFECG, TFECG) from the abdomen and thorax are calculated without noise. In the second case, extra noise is added in the correlation analysis. A strong correlation between the MECG signals from the abdomen and thorax, or between the FECG signals from these two locations, can be observed in Table II. It is also evident that the MECG and FECG signals are very weakly correlated. That is, the correlation between MECGs (AMECG and TMECG) or between FECGs (AFECG and TFECG) is much greater than the correlation between MECG and FECG.

The original concept of ANC is based on linear correlation. With linear adaptive filters, the vanilla ANC distinguishes linear correlation from non-linear correlation between the signal components and the reference. Hence, an adaptive non-linear noise canceller is required to address the non-linear problem. Since the screening criterion of ANC depends on the adaptive filter, it is natural to replace the linear filter with non-linear mapping tools. In this way, an adaptive non-linear noise canceller is derived to distinguish non-linear correlations of different complexity, and the vanilla ANC is transformed into the expected screener.
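The two indexes used above can be sketched as follows. This is an illustrative NumPy version, assuming the biased empirical HSIC estimator with Gaussian kernels and a fixed kernel width `sigma`; the paper does not state which estimator or kernel was used, and the toy signals below are hypothetical stand-ins for the ECG components.

```python
import numpy as np


def ppmcc(a, b):
    """Pearson product-moment correlation coefficient of two 1-D signals."""
    return float(np.corrcoef(a, b)[0, 1])


def gaussian_gram(v, sigma):
    """Gram matrix of a Gaussian kernel over the samples of a 1-D signal."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(a, b, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2 with centering matrix H."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    K = gaussian_gram(a, sigma)
    L = gaussian_gram(b, sigma)
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)


# Toy signals (hypothetical): one linearly related, one nonlinearly related,
# and one independent of the base signal u.
rng = np.random.default_rng(1)
u = rng.standard_normal(300)
linear = 0.9 * u + 0.1 * rng.standard_normal(300)
nonlinear = u ** 2 + 0.1 * rng.standard_normal(300)
independent = rng.standard_normal(300)

r_linear = ppmcc(u, linear)
h_nonlinear = hsic(u, nonlinear)
h_independent = hsic(u, independent)
```

PPMCC captures only the linear relationship, whereas HSIC also responds to the nonlinear dependence between `u` and `u ** 2`, which is why the two indexes are complementary in this kind of analysis.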
The rationality of this conceptual expansion has been verified to some degree by the success of former works [3], [32], which have tried to combine ANC with different non-linear mapping tools.
Non-linear mapping tools with sufficient capacity and flexibility are desired to obtain an ideal screener. Furthermore, the intrinsic sequential characteristic of signal data also contains useful information, which has not been fully utilized by previous works. In this paper, we apply LSTM as the nonlinear mapping tool, since it is adept at solving problems involving sequential data and is capable of approximating any non-linear relationship given sufficient model capacity [33]. A fast LSTM cell is also proposed to significantly improve the computational efficiency, which will be discussed in Section III-C.

B. Long Short-Term Memory
The Recurrent Neural Network (RNN) has been widely used in sequential nonlinear problems in recent years. The recurrent connections between neurons provide the network with an inherently recurrent and causal architecture, which enables the model to transform the acquired historical information into the desired results. However, the vanilla RNN usually performs poorly because of the vanishing gradient problem [34]. LSTM was proposed as a unique architecture of RNN cells to solve this problem. It maintains a more constant error to allow the model to learn over many steps, thereby opening a channel to link causes and effects remotely [33]. The architecture of the LSTM unit is shown in Fig. 2a. The formulas of LSTM (including the principle of RNN) are described in the following composite functions:

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$$
$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$$
$$\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$h_t = o_t \odot \tanh(C_t)$$

where $x_t$ is the input vector, $\sigma$ is the logistic sigmoid function, $\odot$ denotes element-wise multiplication, and $f_t$, $i_t$, $o_t$ and $C_t$, which have the same size as the hidden vector $h_t$, are the forget gate, input gate, output gate and cell activation vectors, respectively.
$W_f$, $W_i$, $W_o$, $W_C$ and $b_f$, $b_i$, $b_o$, $b_C$ are the weight and bias matrices from the cell to the forget gate, input gate, output gate and cell activation vectors, respectively. It is worth noting that the LSTM architecture places a flexible requirement on the input dimension. The RNN model with LSTM is applicable to computations on multi-dimensional vectors, which is an instrumental feature for our method.
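The gate computations described above can be sketched as a single recurrent step in NumPy. This is the standard LSTM formulation, assuming the four gate weight matrices are stacked into one matrix `W` that acts on the concatenation of $h_{t-1}$ and $x_t$; the stacking order, function names, and toy dimensions are illustrative choices, not the paper's implementation (which uses TensorFlow).

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One standard LSTM step.

    W has shape (4n, n + m) and b has shape (4n,), where n is the hidden size
    and m the input size; rows are stacked as [forget, input, output, candidate].
    """
    n = h_prev.shape[0]
    a = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(a[0:n])              # forget gate
    i_t = sigmoid(a[n:2 * n])          # input gate
    o_t = sigmoid(a[2 * n:3 * n])      # output gate
    c_tilde = np.tanh(a[3 * n:4 * n])  # candidate cell activation
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


# Run a short random sequence through the cell (toy sizes, hypothetical).
rng = np.random.default_rng(0)
n, m = 4, 2
W = 0.1 * rng.standard_normal((4 * n, n + m))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x_t in rng.standard_normal((10, m)):
    h, c = lstm_step(x_t, h, c, W, b)
```

Note that the input size `m` is a free parameter of the cell, which is the flexibility in input dimension that the method relies on when several reference channels are fed in together.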
In our previous work [35], we proposed a novel LSTM-based method to eliminate PLI from ECG records, which is a critical pre-processing step for the morphology extraction of FECG signals. The performance we obtained reflects the potential of these techniques in the field of signal processing. The problem of eliminating PLI from ECG signals belongs to the category of extracting signal components with known characteristics from non-stationary signals. However, in an AECG record, both the FECG and MECG components are non-stationary. Thus, the problem addressed in this paper is entirely different, and a specific system scheme is required to solve it.

C. Fast LSTM
The architecture of LSTM was originally designed for long-term dependencies in data sequences, as in speech recognition and machine translation, which frequently exploit dependencies between words separated by long spans [36]. However, the internal mapping system of the FECG signals is causal and locally consistent on the timeline. Thus, the attenuation coefficient of the hidden state should be smoothly varying or constant, which differs from the mechanism controlled by the original forget gate. To verify our hypothesis, we employed the LSTM cell stated above in the FECG enhancement experiment. Concretely, we recorded all the values of the three gates during the process. Table III lists the mean and standard deviation of the values at the input and the output of the three gates after the convergence of the LSTM model. Despite the dramatic changes in the input vector $x_t$, the output of the forget gate has a much smaller variance than the other two, which means that the value of $f_t$ fluctuates within a very small range. This behavior is learned automatically by the LSTM model and can be regarded as strong evidence for our hypothesis.
To improve the computing efficiency, a fast LSTM is proposed. More concretely, the forget gate is replaced by a relatively constant value that is obtained without taking $x_t$ and $h_{t-1}$ as input. We also removed the output gate, since we found that it has little effect on the function of LSTM in the FECG enhancement stage, which has also been observed in other applications [37]. The definition of the input gate is kept intact to maintain the function and flexibility of LSTM. The architecture of the fast LSTM is shown in Fig. 2b, and the formulas of the modification we made are provided in (7) and (8).
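One plausible reading of this modification is sketched below. Since equations (7) and (8) are not reproduced here, the sketch rests on two loudly stated assumptions: the forget gate is modeled as an input-independent vector `f_const`, and with the output gate removed the hidden state is taken as $h_t = \tanh(C_t)$. All names and sizes are hypothetical.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def fast_lstm_step(x_t, h_prev, c_prev, W, b, f_const):
    """One step of an assumed fast LSTM cell.

    Only the input gate and the candidate activation depend on [h_prev, x_t],
    so W has shape (2n, n + m) instead of (4n, n + m).
    f_const is an input-independent forget value (assumption).
    """
    n = h_prev.shape[0]
    a = W @ np.concatenate([h_prev, x_t]) + b
    i_t = sigmoid(a[:n])                    # input gate, kept intact
    c_tilde = np.tanh(a[n:])                # candidate cell activation
    c_t = f_const * c_prev + i_t * c_tilde  # constant forgetting
    h_t = np.tanh(c_t)                      # no output gate (assumption)
    return h_t, c_t


# Toy run (hypothetical sizes and forget value).
rng = np.random.default_rng(0)
n, m = 4, 2
W = 0.1 * rng.standard_normal((2 * n, n + m))
b = np.zeros(2 * n)
f_const = np.full(n, 0.9)
h, c = np.zeros(n), np.zeros(n)
for x_t in rng.standard_normal((10, m)):
    h, c = fast_lstm_step(x_t, h, c, W, b, f_const)
```

Under these assumptions, two of the four matrix-vector products of the standard cell disappear, which is where the saving in computation would come from.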
The overall complexity comparisons of these two cell architectures are presented in Table IV. The term $n$ represents the dimension of the input hidden vector $h_{t-1}$ (the number of cell units), and $m$ denotes the dimension of the input vector $x_t$. It can be observed that the fast LSTM reduces the number of recurrently updated parameters by approximately 50%. The computational complexity of each architecture is expressed as the number of multiplications for one propagation in the cell unit. These indexes show that the fast LSTM has only approximately half of the computational complexity of the slow LSTM in both forward and backward propagation. Our proposed fast LSTM not only greatly reduces the computation but also matches or outperforms the slow LSTM in the enhancement stage. One disadvantage of the fast LSTM is that it is inferior to the slow LSTM in the extraction stage, where a strong nonlinear mapping ability is required. Therefore, the slow LSTM structure is used in the extraction stage for high performance.
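The roughly 50% reduction can be checked with simple arithmetic. The counts below are a sketch, not a reproduction of Table IV: the standard LSTM is counted as four gate blocks of size $n(n+m)+n$, and the fast LSTM is assumed to keep two such blocks plus $n$ input-independent forget values.

```python
def lstm_param_count(n, m):
    """Standard LSTM: four blocks, each with an n x (n + m) weight and n biases."""
    return 4 * (n * (n + m) + n)


def fast_lstm_param_count(n, m):
    """Assumed fast LSTM: input gate and candidate blocks, plus n forget constants."""
    return 2 * (n * (n + m) + n) + n


# Hypothetical sizes: 32 cell units and 3 input channels.
slow = lstm_param_count(32, 3)
fast = fast_lstm_param_count(32, 3)
ratio = fast / slow
```

For these sizes the counts are 4608 and 2336, a ratio of about 0.507, which is consistent with the "approximately 50%" figure quoted above.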
Moreover, in signal processing tasks, the information contained in each data point is influenced by the sampling rate, which is very different from the data processed by LSTM in conventional applications. The information in each signal data point is diluted if the sampling rate is too high. Despite the capabilities of LSTM, the model could still suffer from the vanishing gradient problem if the input sequence is too long, and the computational cost would also increase. Thus, for this ECG processing problem, we downsampled the original records to 250 Hz where necessary, which is still higher than the Nyquist rate of FECG signals.

D. MECG Elimination
Generally, the noninvasive AECG signal $A(t)$ sensed from the maternal abdominal wall consists of several signal components, which can be modeled by the following equation:

$$A(t) = M(t) + F(t) + w(t)$$

where $t$ is the time index, $M(t)$ represents the maternal ECG signal components, $F(t)$ represents the pure FECG, and $w(t)$ represents all the noise signals, such as power line interference, baseline wandering, motion artifacts, and electromyogram. From the perspective of ANC, the component $M(t) + w(t)$ in $A(t)$ is the noise residing in the primary input. However, in the process of MECG elimination, usually only a reference input correlated to $M(t)$ is available. Thus, we leave the cancellation of $w(t)$ to the FECG enhancement stage.
To eliminate $M(t)$ from $A(t)$, we use a set of signals $r(t)$ as the reference input to approximate the MECG components $M(t)$ in $A(t)$ through the nonlinear function $R[\cdot]$ supplied by the proposed slow LSTM based model. The block diagram of our method for MECG elimination is shown in Fig. 4. After the approximation, the desired FECG $F(t)$ can be obtained by subtracting the estimated MECG components $\hat{M}(t)$ from $A(t)$. Similar to ANC methods, the reference input is set to be the maternal ECG signal obtained synchronously from the mother's thoracic wall, and the FECG components residing in it are ignored. In some cases, more than one thoracic channel is available. Inspired by [38], we define the reference input $r(t)$ to contain all the available thoracic ECG signals. This definition is made possible by the flexible input dimension requirement of our model. All the signals in $r(t)$ are sent into the model simultaneously and sequentially. Thus, $x(t)$ and $r(t)$ are derived as follows:

$$x(t) = A(t)$$
$$r(t) = [T_1(t), T_2(t), \ldots, T_m(t)]^T$$

where $T_i(t)$ are the available maternal thoracic ECG signals and $m$ is the dimension of $r(t)$ (corresponding to $x_t$ in the second stage). Since the noninvasive ECG signals sensed from the body surface are projections of the vectorcardiographic loop, sending more than one channel provides more information about the cardiac electrical activity, which makes the transformation easier for the model to approximate.
The flow of the model computation is shown in Fig. 3. Similar to traditional filtering methods, we need a segment of historical signal points to estimate the desired output. By analogy with the "order" of a digital filter, we supply a segment of length $t_o$ to the function $R[\cdot]$ to estimate $\hat{x}(t)$. The formula is shown in (12). The error $\epsilon(t)$, obtained by subtracting the estimated $\hat{x}(t)$ from $x(t)$, drives the adaptation process of the model, as shown in (13). For MECG elimination, the estimated $\hat{x}(t)$ and the MECG components $M(t)$ in $A(t)$ should be equal after the convergence of the model. Thus, $\epsilon(t)$ also gives the desired filtered FECG output.
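The segmenting step described above can be sketched as follows, assuming the reference channels are stored as an array of shape (m, T) and that each estimate uses the $t_o$ most recent reference samples. The helper names `reference_segments` and `residual` are hypothetical, and the estimator $R[\cdot]$ itself is left out.

```python
import numpy as np


def reference_segments(thoracic, t_o):
    """Stack m reference channels of length T into overlapping segments.

    Returns an array of shape (T - t_o + 1, t_o, m): one segment of the t_o
    most recent reference samples for every time index t >= t_o - 1.
    """
    r = np.asarray(thoracic, dtype=float).T  # shape (T, m)
    n_samples = r.shape[0]
    return np.stack([r[t - t_o + 1: t + 1] for t in range(t_o - 1, n_samples)])


def residual(x, x_hat):
    """Error between primary input and estimate; the filtered FECG in stage one."""
    return np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)


# Toy reference (hypothetical): 3 channels, 10 samples each.
thoracic = np.arange(30, dtype=float).reshape(3, 10)
segments = reference_segments(thoracic, t_o=4)
```

Each segment would be fed sequentially to the recurrent model, and the residual of the resulting estimate against the primary input is what drives the adaptation.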
The parameter adaptation of our model needs an objective function for guidance. Following the concept of ANC, the objective function for the adaptation is defined as the MSE between $\hat{x}(t)$ and $x(t)$. Moreover, referring to the RLS adjustment method broadly applied in ANC [8], we define the objective function in (14), where the step length of the signal segment for evaluation is denoted as $t_e$. As can be seen in (14) and Fig. 3, the evaluation scopes are defined both before and after the processed point. The reason is that the desired mapping is nonstationary. If the objective function were evaluated only on the data points before the processed one, as in conventional RLS, the resulting mapping approximation would be biased. Furthermore, through the experiments, we found that the input $x(t)$ contains unpredictable noises and distortions in reality. By adjusting the model based on the evaluation results after the processed point, the model gains the ability to adapt to the unwanted interferences in advance. As a side effect, a fixed delay is introduced to the output.

Fig. 3: Flow of the model computation.

The architecture of our model is shown in Fig. 5, which is unrolled to demonstrate the recurrent relationship in the time domain. We need to adapt the model to approximate the desired nonlinear function $R[\cdot]$. The model contains three layers in total: an input layer, a hidden layer, and an output layer. The input layer holds the sequential input signals. The hidden vector $h(t)$ is the representation of the historical information of the reference input. These hidden states are combined linearly in the output layer to produce $\hat{x}(t)$. $W_{hx}$ and $b_x$ are the weight and bias matrices from the hidden vector to the output, respectively. The formula of the output layer is provided in (15).
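The idea of evaluating the objective on both sides of the processed point can be sketched as below. The exact form of (14) is not reproduced here, so this sketch assumes a plain mean of squared errors over the window from $t - t_e$ to $t + t_e$, truncated at the record boundaries; the function name `windowed_mse` is hypothetical.

```python
import numpy as np


def windowed_mse(x, x_hat, t, t_e):
    """Mean squared error over [t - t_e, t + t_e], clipped to the record.

    Assumed form of the objective: equal weights on past and future points.
    """
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    lo = max(0, t - t_e)
    hi = min(len(x), t + t_e + 1)
    err = x[lo:hi] - x_hat[lo:hi]
    return float(np.mean(err ** 2))


# Toy check (hypothetical data): a single estimation error of 3 at index 5.
x = np.arange(10.0)
x_hat = x.copy()
x_hat[5] += 3.0
j_at_5 = windowed_mse(x, x_hat, t=5, t_e=1)  # window covers indices 4..6
j_at_0 = windowed_mse(x, x_hat, t=0, t_e=1)  # window covers indices 0..1
```

Because the window reaches $t_e$ points past the processed point, the output for time $t$ can only be produced once those later samples have arrived, which corresponds to the fixed delay mentioned above.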

E. Adaptive Updating Process of Model Parameters
The parameters of our model are adaptively updated only for the current record under processing, without the usual separation of training and testing phases, which is a challenging machine learning task. The reason is that most of the desired approximations differ between records, so the converged models are not universal. This mechanism is also consistent with the concept of ANC. The RNN based model can be abstracted as a function $f$ with parameters $\Theta$, where $\Theta$ represents all the adjustable parameters in the model. $\Theta$ is continuously optimized to minimize the objective function $J(\Theta, t)$ and to track the variation of the non-stationary mapping. By applying the training algorithm Back-Propagation Through Time (BPTT) [39], the following numerical computation is performed to update the parameters and lower the objective function:

$$\Theta' = \Theta - \gamma \frac{\partial J(\Theta, t)}{\partial \Theta}$$

where $\Theta'$ denotes the updated $\Theta$ and $\gamma$ denotes the learning rate. We use Adam [40] to adjust $\gamma$ automatically. In the process of adaptive updating, the learning rate is an important variable that affects the performance of tracking the nonstationary system. A system that varies faster requires a larger learning rate. The optimization algorithm Adam also has a hyperparameter $\Gamma$ that configures the basis of the automatic adjustment, which can be used to set the adaptation speed of our model.
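The per-record adaptive updating can be illustrated with a deliberately simplified sketch. A linear map stands in for the LSTM model so that the gradient is available in closed form instead of through BPTT, and the Adam update is written out in NumPy. The name `gamma_base` reflects the assumption that $\Gamma$ corresponds to Adam's base step size; the toy data and all other names are hypothetical.

```python
import numpy as np


def adam_online_fit(r, x, gamma_base=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adapt parameters sample by sample on one record, with no train/test split.

    A linear model x_hat = r[t] @ theta stands in for the recurrent model
    (assumption made to keep the sketch short).
    """
    theta = np.zeros(r.shape[1])
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    errors = np.zeros(len(x))
    for t in range(len(x)):
        x_hat = r[t] @ theta
        errors[t] = x[t] - x_hat
        grad = -2.0 * errors[t] * r[t]           # gradient of the squared error
        m = beta1 * m + (1 - beta1) * grad       # first-moment estimate
        v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
        m_hat = m / (1 - beta1 ** (t + 1))       # bias corrections
        v_hat = v / (1 - beta2 ** (t + 1))
        theta = theta - gamma_base * m_hat / (np.sqrt(v_hat) + eps)
    return theta, errors


# Toy record (hypothetical): the target is a fixed linear mix of two references.
rng = np.random.default_rng(2)
r = rng.standard_normal((3000, 2))
x = r @ np.array([0.7, -0.4])
theta, errors = adam_online_fit(r, x)
early_mse = np.mean(errors[:100] ** 2)
late_mse = np.mean(errors[-500:] ** 2)
```

A larger `gamma_base` would let the parameters follow a faster-changing mapping at the cost of a noisier steady state, which mirrors the trade-off attributed to $\Gamma$ in the text.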
Considering the principle of the mini-batch Stochastic Gradient Descent (SGD) method and numerous prior works [41], the convergence of the model can be guaranteed by the adaptive updating process stated above. The construction and the adjustment process of the model are both implemented on the open-source framework TensorFlow [42].

F. FECG Enhancement
In FECG enhancement, we aim to remove the noise components $w(t)$ residing in the filtered FECG signals, which would otherwise severely corrupt the morphology of FECG and affect the diagnosis results.
We propose to use a slightly modified version of the method introduced in Section III-C and Section III-D to enhance the FECG signals. In this case, $r(t)$ and $x(t)$ are the filtered FECG signals from different channels, which contain nonlinearly correlated FECG components $F(t)$ and "independent" noise components $w(t)$.
Concretely, the filtered FECG signal to be enhanced is sent to the primary input $x(t)$, and the available ones from the other channels are sent to the reference input $r(t)$. The formulas are shown as follows:

$$x(t) = F_1(t) + w_1(t)$$
$$r(t) = [F_2(t) + w_2(t), \ldots, F_m(t) + w_m(t)]^T$$

where the signals $F_1(t), F_2(t), \ldots, F_m(t)$ are nonlinearly correlated FECG components from different channels and $w_1(t), w_2(t), \ldots, w_m(t)$ are the corresponding noises. The system architecture for FECG enhancement is shown in Fig. 6. In the process of FECG enhancement, the model is adaptively updated to produce an output as close as possible to $F_1(t)$.
The main difference in FECG enhancement is that $\hat{x}(t)$ is the desired output $\hat{F}_1(t)$, instead of the error signal $\epsilon(t)$. Again, the proposed adaptive nonlinear noise canceller acts more like a screener, which sorts out from the primary input $x(t)$ the signal components that are nonlinearly correlated to the reference input $r(t)$ with lower complexity.

A. Database
We use real data from the Database for the Identification of Systems (DaISy) [43] and synthetic data from the Fetal ECG Synthetic Database (FECGSYNDB) [44], [45].
As shown in Fig. 7, there are five AECG channels and three Thoracic ECG (TECG) channels in DaISy. All eight channels are sampled synchronously at 250 Hz. The duration of each signal is 10 seconds. The AECG signals are denoted as $A_i(t)$, and the TECG signals are denoted as $T_i(t)$, where the subscript $i$ is the index of the channel.
FECGSYNDB is a large database comprising simulated adult and noninvasive FECG signals, generated by the FECGSYN simulator [44]. Ten different pregnancies were created by simulating maternal and fetal heartbeats as punctual dipoles with different magnitudes and spatial positions. For each simulated pregnant subject, seven different physiological events (one of these is the base signal without noise) were simulated, with five different levels of additive noise (SNR = 0, 3, 6, 9, and 12 dB) and five repetitions for each

B. Experiment Settings
There are four hyper-parameters in our proposed method: $t_o$, $t_e$, the number $n$ of slow LSTM and fast LSTM cells in the LSTM layer, and the hyper learning rate $\Gamma$. We summarize the settings in Table V. Through the experiments, we found that there is a trade-off between the speed and the extent of convergence, which follows from the principle of the adaptive updating process mentioned above. A larger $\Gamma$ helps to speed up the convergence and enables the model to track fast variations of the target non-stationary mapping. However, it leads to a worse extent of convergence and results in a ringing output. All the hyperparameter settings are optimized by experiments.

To compare the performance, we implemented Template Subtraction CERUTTI (TS-CERUTTI), Template Subtraction Extended Kalman Filter (TS-EKF), Template Subtraction Principal Component Analysis (TS-PCA), the LMS adaptive filter, the RLS adaptive filter, and ESN. These methods were implemented with the FECGSYN toolbox [46] to extract the FECG components from the same MECG records as baselines. FastICA [23] is used to filter out the MECG components for FECG extraction, and its output is labeled FastICA. To show the benefit of our proposed fast LSTM, FastICA was also used to replace our proposed FECG enhancement method in the second stage (labeled FastICA-Fil). ANFIS and LSSVM are added only for FECGSYNDB, where the source signals are available for learning-based methods. The bell-shaped membership function was used in ANFIS to achieve the best extraction, as described in [47]. The nonlinear transformation of MECG was realized by LSSVM, and the FECG was then obtained by subtracting the MECG components from the original signals [48]. When training both methods, the training data and labels were taken from the first ten seconds of the MECG and FECG signals. The testing data and labels were taken from the same signals, with an interval of more than ten seconds between the training and testing data.
Both methods were trained separately for each simulated pregnant subject for 35,000 iterations to achieve convergence. Moreover, to show the performance of MECG elimination and FECG enhancement, we present the outputs of these two stages in all experimental results.
All experiments were conducted on an Intel Xeon CPU E5-2678 v3 @ 2.50 GHz with Python v3.5 and Matlab 2019b. The time-sequence diagram in Fig. 8 shows the feasibility of our system in real-time practice. In Fig. 8, $t_s$ is the sampling interval of 4 ms; $t_w$ is the waiting time needed to sample enough data to start the system; $t_d$ is the time for displaying the waveform of the output sequences, equal to the length of the output data multiplied by $t_s$; $t_c$ is the time for calculating the output; $t_{c/point}$ is the computing time per point, obtained by dividing $t_c$ by the total length of one output; and $t_l$ is the system latency, i.e., the delay between an output data point and its corresponding input point. $t_{l(first)}$ and $t_{l(last)}$ are the latencies for the first and last output points, respectively. The blue block represents the input of sampled data points, and the purple block represents the corresponding output. As illustrated in Fig. 8a, the conventional methods have to accumulate sufficient data points (a few seconds) for each processing step, resulting in long latency from AECG signal capture to FECG morphology extraction. In contrast, our proposed SFLSTM, shown in Fig. 8b, can extract FECG signals at every data point, which significantly reduces the latency caused by the processing length of the input data. Therefore, our proposed method can be connected efficiently to downstream component analysis, template analysis, or inductive inference, which makes it more suitable for real-time FECG extraction. As shown in Table VI, although the processing speed of our proposed method is slower than that of the conventional counterparts, it is still much faster than the data capturing, because $t_{c/point}$ of our proposed method is smaller than the sampling interval. Therefore, the processing throughput is sufficient for real-time processing.
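The real-time argument above reduces to comparing the per-point computing time with the sampling interval. The following is a minimal sketch of that check; the function names and the example timing values are hypothetical and are not taken from Table VI.

```python
def per_point_time(t_c: float, n_points: int) -> float:
    """Computing time per output point: t_c divided by the output length."""
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    return t_c / n_points


def is_real_time(t_c_per_point: float, t_s: float = 0.004) -> bool:
    """True when one point is computed faster than one point is sampled.

    t_s defaults to the 4 ms sampling interval stated in the text.
    """
    return t_c_per_point < t_s


# Hypothetical example: 2.5 s of computation for 1000 output points.
t_cp = per_point_time(2.5, 1000)
print(t_cp, is_real_time(t_cp))
```

If the check fails, the processing would fall behind the data capture, and the backlog would grow with every sampled point.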

C. Evaluation Indexes
In our experiments, since the pure FECG components are unavailable in DaISy, we cannot report accurate quantitative measures such as the SNR on the filtered signals. Instead, we employ the Signal Quality Index (SQI) to quantify the quality of the extracted FECG components. Concretely, we adopt the kSQI, which is the fourth moment (kurtosis) of the signal [49], defined as follows:
$$ kSQI = \frac{E\{(X-\mu_X)^4\}}{\sigma_X^4}, \tag{21} $$
where $X$ is the signal vector considered as a random variable, $\mu_X$ and $\sigma_X$ are the mean and the standard deviation of $X$, and $E\{(X-\mu_X)^4\}$ is the expected value of the quantity $(X-\mu_X)^4$. Good-quality ECG is considered highly non-Gaussian since it is not a random signal, whereas random noises such as muscle artifacts and baseline wander tend to have normal distributions, with kurtosis less than five [49].
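As an illustration of the kSQI, the sketch below computes the kurtosis of a sampled signal with the standard library only. It assumes population (biased) moments, i.e., plain averages over the samples; the function name `ksqi` is hypothetical.

```python
def ksqi(x):
    """Kurtosis of the samples: E[(X - mu)^4] / sigma^4 (population moments)."""
    n = len(x)
    if n == 0:
        raise ValueError("empty signal")
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    if var == 0:
        raise ValueError("constant signal has undefined kurtosis")
    m4 = sum((v - mu) ** 4 for v in x) / n
    return m4 / var ** 2


# A sparse, spiky sequence (loosely ECG-like) has a high kurtosis,
# while a sequence that alternates between two levels has a low one.
spiky = [0.0] * 99 + [1.0]
flat = [-1.0, 1.0] * 50
print(ksqi(spiky), ksqi(flat))
```

The spiky sequence lands far above the threshold of five quoted in the text, while the two-level sequence lands well below it.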
Besides, to quantitatively evaluate the performance of our method, we adopt the SNR, the RMSE, the Peak Signal-to-Noise Ratio (PSNR), and the cross-correlation-based SNR ($SNR_{CC}$) of the output signals.
The SNR is derived as:
$$ \begin{aligned} SNR &= 10\log_{10}\frac{P_{signal}}{P_{noise}}, \\ P_{X} &= \frac{1}{L}\sum_{t=1}^{L} X(t)^{2}, \end{aligned} \tag{22} $$
where $P_{signal}$ and $P_{noise}$ represent the power of the original noise-free FECG components and of the residual noise components in the output, respectively. The residual noise components are obtained by subtracting the corresponding original FECG signal in FECGSYNDB from the output. The second line defines the power of a signal $X$, where $L$ is the data length of the signal $X(t)$.
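A minimal sketch of this SNR computation follows, assuming the power of a signal is its mean squared value and the logarithm is base 10; the function names are hypothetical.

```python
import math


def power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)


def snr_db(output, reference):
    """SNR in dB of `output` against the noise-free `reference`.

    The residual noise is the sample-wise difference output - reference.
    """
    if len(output) != len(reference):
        raise ValueError("output and reference must have the same length")
    noise = [o - r for o, r in zip(output, reference)]
    p_noise = power(noise)
    if p_noise == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(power(reference) / p_noise)


# Toy example: unit-power reference with a constant 0.1 offset in the output.
reference = [1.0, -1.0, 1.0, -1.0]
output = [1.1, -0.9, 1.1, -0.9]
print(round(snr_db(output, reference), 3))
```

In this toy case the noise power is 0.01 against a signal power of 1, which corresponds to about 20 dB.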
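The RMSE is derived as:
$$ RMSE = \sqrt{\frac{1}{L}\sum_{i=1}^{L}\left(X_i - Y_i\right)^{2}}, \tag{23} $$
where $L$, $X$ and $Y$ are the data length and the matrices of output and label, respectively. The PSNR is derived as:
$$ PSNR = 10\log_{10}\frac{MAX^{2}}{\frac{1}{L}\sum_{i=1}^{L}\left(X_i - Y_i\right)^{2}}, \tag{24} $$
where $L$, $MAX$, $X$ and $Y$ are the length of data, the maximum value of the label matrix, and the matrices of output and label, respectively. The cross correlation $R_{CC(S1,S2)}$ between the matrices $S1$ and $S2$ is defined in (25), where $S1$, $S2$ and $L$ represent the signal sequences and the data length.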
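The two error measures can be sketched as follows. This assumes the common definitions RMSE = sqrt(mean((X - Y)^2)) and PSNR = 10 log10(MAX^2 / MSE), with MAX taken as the maximum value of the label; the function names are hypothetical.

```python
import math


def mse(output, label):
    """Mean squared error between two equal-length sequences."""
    if len(output) != len(label):
        raise ValueError("output and label must have the same length")
    return sum((x - y) ** 2 for x, y in zip(output, label)) / len(label)


def rmse(output, label):
    """Root mean squared error."""
    return math.sqrt(mse(output, label))


def psnr_db(output, label):
    """Peak SNR in dB, with the peak taken as the maximum label value."""
    err = mse(output, label)
    if err == 0:
        return math.inf  # identical sequences
    return 10 * math.log10(max(label) ** 2 / err)


label = [0.0, 1.0, 0.0, 1.0]
output = [0.1, 0.9, 0.1, 0.9]
print(round(rmse(output, label), 3), round(psnr_db(output, label), 3))
```

Here every sample is off by 0.1, so the RMSE is about 0.1 and, with a peak of 1, the PSNR is about 20 dB.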
The $SNR_{CC}$ is derived as:
$$ \begin{aligned} SNR_{CC} &= 10\log_{10}\frac{P_{CC_{signal}}}{P_{CC_{noise}}}, \\ P_{CC_{*}} &= \frac{1}{L}\sum_{i=1}^{L} CC_{*}(i)^{2}, \end{aligned} \tag{26} $$
where $P_{CC_{signal}}$ and $P_{CC_{noise}}$ represent the power of the cross-correlation sequences. The second line defines the power, denoted $P_{CC_{*}}$. $CC_{signal}(i)$ is the signal of the cross-correlation sequence, equal to the autocorrelation coefficient of the original noise-free FECG component label. $CC_{noise}(i)$ is the noise of the cross-correlation sequence, equal to the difference between $CC_{signal}(i)$ and the cross-correlation coefficient $R_{CC(output,label)}(i)$ between the output and the label. $R_{CC(S1,S2)}(i)$ can be calculated by (25). Besides, we use SNR or kSQI as the monitoring condition in the iterative process. With the hyper-parameters set as described above, we found that SNR and kSQI do not increase significantly after ten iterations, as can be seen in Fig. 9. Therefore, convergence is declared when the windowed mean of the SNR or kSQI changes by less than a sufficiently small threshold $\eta$, where $\overline{SNR}_{i\sim(i+3)}$ or $\overline{kSQI}_{i\sim(i+3)}$ denotes the average of the SNR or kSQI over iterations $i$ to $(i+3)$. In the lower-left panel of Fig. 9, the high kSQI at the beginning is due to the initialization of the model, which yields small initial coefficients and leaves a high MECG residual in the output signal. As can be seen in Fig. 9, the system usually reaches convergence within the first ten to fifteen iterations.
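The convergence test can be sketched as below. The exact form of the criterion is an assumption: here two consecutive four-iteration windows are compared, and convergence is declared once the absolute change of their means drops below $\eta$. The function names and the example history are hypothetical.

```python
def window_mean(values, start, width=4):
    """Average of `width` consecutive values beginning at index `start`."""
    return sum(values[start:start + width]) / width


def converged_at(history, eta, width=4):
    """Return the first index i at which the windowed mean stops changing.

    Compares the mean over iterations i..i+width-1 with the mean over
    i+1..i+width. Returns None if the change never falls below eta.
    """
    for i in range(len(history) - width):
        change = abs(window_mean(history, i + 1, width)
                     - window_mean(history, i, width))
        if change < eta:
            return i
    return None


# Hypothetical monitoring values (for example, SNR per iteration).
history = [0.0, 4.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0]
print(converged_at(history, eta=0.1))
```

A relative change, or a different window length, would fit the same structure by replacing the `change` expression.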

D. Qualitative Experiment
For qualitative evaluation, we applied the MECG elimination and FECG enhancement processes to the ECG records from DaISy and FECGSYNDB. In DaISy, the three channels of TECG signals were employed as one reference signal, denoted $r(t)$, as were the two TECG channels in FECGSYNDB.
The waveform results from the two databases are shown in Fig. 10. The performance on the five AECG channels from DaISy and FECGSYNDB under different cases or noise conditions can be observed in Fig. 10a and Fig. 10b, respectively. $F(t)$ and $Y(t)$ are the FECG signals after MECG elimination and FECG enhancement, respectively.
Fig. 10 gives an intuitive view of how well each method extracts the FECG. Comparing the source signals (MECG or pure FECG) with the outputs of each method shows that our proposed method achieves much lower residual noise and MECG, and much less deformation of the FECG, across different channels, noise levels, and cases.
In addition, to quantify the quality of the extracted FECG components, the kSQI of the output records from DaISy, calculated by (21), is shown in Table VII. Although some methods are effective in filtering out random noise, their poor performance in MECG elimination leads to large residual MECG components, as shown in Fig. 10a. Therefore, methods such as FastICA, TS-CERUTTI, TS-EKF, TS-PCA, and LMS may have a higher kSQI than RLS and ESN because of the influence of residual MECG components. Although RLS and ESN remove MECG components more completely than the other methods mentioned above, their outputs still contain some random noise, resulting in low kSQI, as shown in Table VII. The results $F(t)$ and $Y(t)$, filtered and enhanced by our method, perform much better because we take both the residual MECG signal and the FECG signal quality into consideration, especially in cases dominated by MECG or noise. Both the waveforms and the kSQI show that the enhanced FECG signal extracted by our SFLSTM network contains much cleaner morphological characteristics for medical diagnosis than the outputs of the conventional methods.

E. Quantitative Experiment
Besides the qualitative evaluation, we also provide a quantitative analysis. Only the synthetic ECG recordings in FECGSYNDB were used in this experiment, because DaISy provides no source signal for calculating SNR, RMSE, PSNR, and $SNR_{CC}$ [44], [45].
The mean SNR, calculated by (22), is the average SNR of the output records of all subjects listed in Table VIII. The results for different cases and levels of additive noise are tabulated in Table VIII. A few irregularities can be observed in the results of the conventional methods, because the FECG components are overwhelmed by noise in some randomly selected channels. In contrast, our method has noticeably higher SNR than its counterparts in both the elimination and the enhancement processes. The mean and standard deviation of RMSE, PSNR and $SNR_{CC}$, calculated by (23), (24) and (26), are tabulated in Table IX. Table IX shows that the output of our proposed method achieves a much smaller mean RMSE and higher mean PSNR and $SNR_{CC}$, which is evidence of the better performance of our method. The Coefficient of Variation (C.V) is the ratio of the standard deviation to the mean. The smaller standard deviation and the smallest C.V obtained by our method indicate greater adaptability under different situations.
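The Coefficient of Variation can be computed as in the sketch below. It uses the population standard deviation from Python's `statistics` module, which is an assumption, since the text does not state whether the population or the sample deviation is meant. The function name and the example values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Ratio of the (population) standard deviation to the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("C.V is undefined for a zero mean")
    return statistics.pstdev(values) / mean


# Hypothetical per-record metric values.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(coefficient_of_variation(scores))
```

Because the C.V is normalized by the mean, it allows the spread of metrics with different scales, such as RMSE and PSNR, to be compared across methods.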
To minimize the computation cost, the enhancement needs to be carried out only when the kSQI after MECG elimination is smaller than 10. Compared with ESN, ANFIS, and LSSVM, our proposed SFLSTM network obtains more than 3.09 dB and 1.81 dB of SNR improvement in MECG elimination and FECG enhancement, respectively. The total gain of approximately 4.9 dB will significantly enhance the clinical usefulness of noninvasive FECG monitoring and help the early detection of fetal heart diseases. Our SFLSTM achieves higher mean SNR, lower mean RMSE, and smaller standard deviations, indicating its highly adaptive ability.

V. CONCLUSION
In this paper, a two-stage SFLSTM network has been proposed to extract the morphology of FECG. MECG elimination and FECG enhancement are performed by the slow LSTM and the fast LSTM, respectively. The fast LSTM is designed to reduce the computation cost while maintaining the performance of FECG enhancement. From the experimental results, we conclude that the proposed method greatly improves the performance and shows a highly adaptive ability under different noise or case conditions. The extracted clean FECG components could efficiently support FECG morphology-based medical diagnosis.