Data Cleansing With Minimum Distortion for ML-Based Equipment Anomaly Detection

Semiconductor manufacturing has been extensively exploiting machine learning (ML) to process equipment sensory data (ESD) for near-real-time anomaly detection (AD). ESD characteristics are highly diversified, and data lengths vary among processing steps and cycles. Cleansing ESD with minimum distortion (CMD) to fit the fixed-length input requirement of ML-based AD is critical to AD effectiveness and is challenging. This paper presents a novel CMD method with four innovations: i) statistical mode-based equalization of step data lengths for the least number of step data length changes, ii) an importance indicator value (IIV) of a data sample based on its relative difference with the subsequent sample, iii) step data segmentation into groups based on samples of significant IIVs and a least-entropy-group-to-cleanse-first rule, and iv) cleansing of the least-IIV sample(s) in the selected group for step data length equalization. CMD application to ESD demonstrates its characteristics-preservation property. Simulation experiments integrate data cleansing with an unsupervised ML-based AD system, STALAD. Comparisons with two benchmark methods over AD scenarios of small-scale drifts and shifts show that CMD not only is superior in facilitating accurate detection by STALAD but also helps detect anomalies much earlier than the two benchmarks.


Yun-Che Hsieh, Chieh-Yu Chen, Da-Yin Liao, Member, IEEE, Kuan-Chun Lin, and Shi-Chung Chang, Member, IEEE
I. INTRODUCTION

Semiconductor fabs collect vast equipment sensory data (ESD) to facilitate near-real-time monitoring of the fabrication process and equipment health conditions [1], [2]. However, ESD usually suffer from a variety of errors such as data inconsistency, noise, missing values, or outliers, which result in varying lengths of the collected ESD among processing steps and cycles of recipe steps. Unreliable data for machine learning could lead to misguided decision-making. Data cleansing (or data cleaning) is thus used in the preprocessing step of data-driven machine learning [3], [4], [5], [6] to help improve the quality of collected data by defining, identifying, and correcting errors to ensure the integrity of the data [7].
Semiconductor manufacturing has widely used the traditional SEMI SECS/GEM/HSMS standards [8] for fab automation in equipment communication, control, and configuration. In addition, semiconductor fabs adopt the Equipment Data Acquisition (EDA, a.k.a. Interface A) standards [9] to collect and then analyze specific data from equipment to increase productivity, improve quality, and reduce costs. For each process tool, hundreds of types of sensory data tagged with their State Variable Identifications (SVIDs) are collected. Their data collection rates are on the order of hundreds or thousands of data points per second, generating tens or hundreds of megabytes of data every day. Correlating such huge amounts of data from multiple sources provides a clearer picture of the health conditions of the tool or process. In particular, semiconductor manufacturing applies advanced machine-learning (or deep-learning) approaches for yield enhancement and intelligent anomaly detection (AD) of fab tools [10], [11], with the help of the vast ESD collected from equipment sensors. The characteristics of ESD are highly diversified by the types of sensors, tools, recipes, and data value ranges. Cleansing of ESD with minimum distortion (CMD) to fit the fixed-length input requirement of ML-based AD is critical to AD effectiveness and is challenging.
For an ML approach, data are as important as the ML algorithm itself. The overall effectiveness of ML heavily depends on the quality of both the training and testing datasets. Although ML largely depends on its data, its model training mechanism is resilient to data errors. In practice, data errors in ESD may simply decrease the accuracy of the ML model rather than crashing the whole solution. In the ML pipeline, data cleansing is a critical step that creates quality datasets at the very first stage to prevent data errors from propagating to the model training and testing steps [12], [13].
This research deals with the development of an effective data cleansing approach for collected ESD that are stored in the fab FDC (Fault Detection and Classification) database and are used for ML-based anomaly detection of the fabrication process and equipment conditions. The difficulties arise from (i) variable lengths in both step and cycle data that cannot be fed into ML algorithms directly, (ii) the demand for effective data cleansing while preserving the important features of the data, and (iii) the definition and identification of the points of feature importance among the data points.
This paper presents a novel CMD method developed from our prior research efforts in ML-based anomaly detection in fab tools [11], [13], [14], [15]. As an extension of [15], the design innovation in our proposed CMD method is fourfold: i) statistical mode-based equalization of step data lengths among cycles of a recipe for the least number of steps to change a length; ii) an importance indicator value (IIV) of a data sample based on its relative difference with the subsequent sample; iii) step data segmentation into data groups based on samples of significant IIV changes and the calculated information entropy of each group; and iv) deletion or imputation of the least-IIV sample(s), i.e., those with the least information entropy, in the group to cleanse. We design experiments of Monte-Carlo simulations on an integration of data cleansing into the STALAD system [11], a framework designed for inline anomaly detection in the huge-volume, time-sequenced ESD tagged with hundreds of SVIDs. Comparisons with two frequently used data cleansing methods of cycle length equalization over AD scenarios of small-scale drifts and shifts in the ESD show that CMD not only is superior in facilitating accurate detection by STALAD but also helps detect anomalies much earlier than the two benchmarks.
The remainder of this paper is organized as follows. Section II describes the characteristics of ESD, reviews the STALAD framework, and points out the problems and challenges for ML-based AD applications. Section III presents the use of statistical modes to equalize step data lengths and defines the importance indicator value (IIV) of ESD. Section III then proposes an IIV-based deletion and imputation approach that first uses the IIV to determine the groups of no importance, and then deletes and/or imputes the data in the least-entropy group to achieve minimum distortion of the important ESD characteristics. Section IV describes the experiment designs. Section V explores the CMD property and the impacts of data cleansing on effective ML-based AD. Finally, Section VI concludes this paper.

II. PROBLEMS AND CHALLENGES OF ESD CLEANSING FOR ML-BASED ANOMALY DETECTION

In automated semiconductor fabrication, there is abundant equipment sensory data (ESD) available for in-line anomaly detection. Although data collection is automated and in standard formats, the number of data samples collected from a tool repetitively processing a recipe actually varies among repetitions and ESD data items. However, most applications of machine learning to ESD processing for anomaly detection (AD) require a fixed input data length. In this section, we characterize ESD, introduce STALAD, an unsupervised learning-based framework for detecting in-line equipment anomalies, and define the ESD cleansing problems and challenges.

A. ESD Characteristics
Each tool repeats a series of processing steps of a product according to its process recipe.The SVID sample-data collected by a sensor of the same tool during repetitive processing of a specific recipe is largely cyclic.

1) SVID Data Cycles and Step Data:
A cycle is the time period of an SVID collected from processing a specific recipe over a wafer by a tool. A sample sequence is a time-stamped series of sampled sensory data values from the SVID of a cycle, and its length is the number of sampling points in the cycle. Table I illustrates an ESD schema. Fig. 1 shows a cycle of one SVID collected from a tool processing a specific recipe, which consists of a few process steps. Different process steps result in segments in an SVID data cycle. The sample sequence of a process step in cycle data is referred to as step data. Fig. 2 depicts two cyclic SVIDs from concatenating the ESD of five wafer processings, where both show the same apparent periodicity because of the repetitive processing of one same recipe by the tool [11].

2) Variations of Step and Cycle Data Lengths:
Step data of one same step among different cycles may have different lengths, i.e., different numbers of sampling points, because of engineers' adjustments of process parameters and/or some uncertainties in the data acquisition and transmission processes. Variations of step lengths then lead to different lengths among cycles of a process recipe. For example, such variations appeared in an SVID set of 97 cycles obtained from field data. Fig. 3 gives the distribution of the number of step 1 samples of the 97 cycles, and Fig. 4 shows the length distribution of the 97 cycles.

B. STALAD for Anomaly Detection
Many fabs have, as a part of advanced process control, a Fault Detection and Classification (FDC) database [16]. The FDC database constantly collects ESD from individual tools and stores them in time series form for further access. In [11], the authors proposed the STALAD framework for detecting in-line equipment anomalies, which accesses SVIDs from the FDC database. There are two phases in STALAD that require cleansed SVID data as input: (1) the unsupervised normal feature learning phase, requiring historical SVIDs of a batch of wafers, and (2) the real-time feature testing phase, requiring SVIDs of a newly processed wafer.
Fig. 5 depicts the STALAD framework and its interactions with the FDC database and data cleansing modules. STALAD periodically posts a query for SVID data in the learning phase and sends a query triggered upon completing a recipe processing cycle of a wafer in the testing phase. In response, the FDC database returns the requested SVIDs to the respective data cleansing modules, which then send the cleansed SVID data as input to the corresponding phases of STALAD. In either phase, the input SVID data length needs to be the same among various cycles of processing one specific recipe.

C. Problem Definition and Challenges
Effective data cleansing should provide fixed-length SVID data inputs to ML-based AD without distorting SVID data characteristics. This requirement holds not only for the STALAD framework but also for ML-based process control methods in general.

Fig. 5. The STALAD Framework [11].

To design an effective data cleansing method, one needs to resolve the following three issues, I1∼I3, and their corresponding challenges, C1∼C3:

I1) What should be the fixed data lengths of a recipe cycle and of each step in the cycle?

C1) The relationship between the number of sampling points in the data and the value changes of individual sampling points is unclear.

I2) What is the characteristic importance of an individual group of SVID data samples in a recipe cycle? One may then know where to delete or impute sample data to achieve the fixed length with the least distortion.

C2) Combinations of sensor and tool types, recipes, and data value ranges make SVID cycle/step data characteristics highly diversified. Identifying common characteristics among individual SVIDs is very challenging. Designing an indicator that quantifies the importance of a sampling point to cycle/step data characteristics poses further difficulty.

I3) With an importance indicator of sampling points, how does one delete or impute data samples in cycle data to achieve the desirable input length(s) for effective ML-based AD?

C3) AD identifies abnormal patterns of cycle data. To facilitate effective AD by unsupervised ML, the challenges lie in realizing cleansing with minimal distortion and in demonstrating the resultant performance improvements in AD.

III. CMD METHOD DESIGN
This section first addresses issue I1 in Section III-A, where we propose using the statistical mode to equalize step data lengths. To address issue I2, we define in Section III-B the IIV of a data sample based on the absolute value of its relative difference with its subsequent sample. We then propose an IIV- and information entropy-based two-stage cleansing approach in Sections III-C and III-D for solving issue I3.
Table II defines the notation used in this section.

A. Mode as Length for Minimum Number of Changes
One may equalize the SVID data cycle length among cycles of a recipe, i.e., set n_w^r = n^r for all the W^r data cycles, w = 1, ..., W^r, of recipe r [11], [17]. The data length of step j in data cycle w, i.e., n_{w,j}^r, need not equal n_{w',j}^r, the data length of step j in data cycle w'.

TABLE II: DEFINITIONS OF THE VARIABLES USED IN THE FOLLOWING
As the SVID data of a step in a cycle correspond to the SVID samples from a recipe step, ideally the SVID data length of a step, say step j, should be equal among all the data cycles of the recipe, namely, n_{w,j}^r = l_j^r for all the W^r data cycles. As such, to better preserve data characteristics among data cycles from processing one same recipe r, one may also consider equalizing the data length of each step among all cycles. An equal length for data cycles of a recipe is then determined.
In statistics, the mode is the most frequently occurring value among experiment outcomes. Let n_j^{r*} be the mode of {n_{w,j}^r, w = 1, ..., W^r}. We propose to equalize the lengths for all steps of a recipe r by setting

n_{w,j}^r = n_j^{r*}, for w = 1, ..., W^r and each step j of recipe r.    (1)

It can be easily shown that the equalization in (1) minimizes the number of sample length adjustments of each step j among the W^r data cycles of recipe r. We skip the detailed proof for conciseness.
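The mode-based target length above can be sketched as follows. This is a minimal illustration, not the authors' code; the tie-breaking rule toward the smaller length is our own assumption, since the paper does not specify one.

```python
from collections import Counter

def step_length_mode(step_lengths):
    """Return the mode of step-data lengths across cycles.

    Equalizing every cycle's step to the mode minimizes the number of
    cycles whose step length must change, because the mode is by
    definition the most frequent length.
    """
    counts = Counter(step_lengths)
    # Break ties deterministically by preferring the smaller length
    # (an assumption of this sketch).
    mode, _ = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return mode

# Example: lengths of step 1 over 7 cycles of one recipe
lengths = [230, 231, 230, 229, 230, 231, 230]
print(step_length_mode(lengths))  # 230 occurs most often
```

With these lengths, only 3 of the 7 cycles need adjustment; any other target length would require changing at least 4.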

B. Sample Data Importance Calculation
Data cycles {X_{s,w}^r, w = 1, ..., W^r} of SVID s from processing recipe r should have similar time series patterns when the equipment is normal. The time series patterns are characterized by value changes and inter-change time durations, as illustrated by the cycles of two SVIDs in Fig. 2. A significant value change between two consecutive samples can be characteristic of a data cycle, and the two samples are considered important to the cycle. Neither of the two should be deleted, nor should an imputation be made in between, unless there is no other option. Motivated by such reasoning, we define the importance indicator value (IIV) d_{s,w,j,k} of each data sample X_{s,w,j,k}^r as the absolute value of its relative difference with the subsequent sample X_{s,w,j,k+1}^r, i.e.,

d_{s,w,j,k} ≡ |(X_{s,w,j,k+1}^r − X_{s,w,j,k}^r) / X_{s,w,j,k}^r|.    (2)

Note that the IIV d_{s,w,j,k} may also be defined between X_{s,w,j,k−1} and X_{s,w,j,k} based on the same idea.
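A minimal sketch of the IIV computation in (2) for one step's samples. Assigning the last sample, which has no successor, an IIV of 0 and guarding a zero-valued denominator with `eps` are assumptions of this sketch.

```python
def importance_indicator_values(x, eps=1e-12):
    """IIV of sample k: absolute relative difference with the next sample.

    d_k = |x[k+1] - x[k]| / max(|x[k]|, eps).  The last sample has no
    successor and is assigned IIV 0 here (an assumption of this sketch).
    """
    d = [abs(x[k + 1] - x[k]) / max(abs(x[k]), eps)
         for k in range(len(x) - 1)]
    d.append(0.0)
    return d

# A flat segment with one jump: only the sample before the jump is important.
step = [100.0, 100.0, 150.0, 150.0]
print(importance_indicator_values(step))  # [0.0, 0.5, 0.0, 0.0]
```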
Consider the data cycle depicted in Fig. 1 for example. Fig. 6 illustrates its IIVs {d_{s,w,j,k}, k = 1, ..., K_w^r}. It is clear that important samples are few and unimportant samples, those with IIVs at or near 0, are the majority. To preserve the characteristics of a data cycle in the process of data cleansing, one should start with the data samples with IIV = 0.

C. Determination of Samples to Cleanse in a Data Cycle
Analysis of the IIVs of various cycles indicates that there are segments of low-IIV data samples that are potential samples to cleanse for step length equalization. Which ones should be chosen? We first exploit the IIV to define data groups in step data, indicating the ranges of sample sequence numbers that contain no important characteristics, whose samples are candidates to cleanse. We then adopt the notion of information entropy to evaluate the entropy of these groups and pick the least-entropy group to cleanse for step data length equalization. Further details are as follows.
1) Delineating Candidate Groups in Step Data to Cleanse: A group in step data is a range of sample sequence numbers where all samples have IIVs smaller than a threshold IIV. The threshold IIV of a step is the mean plus three times the standard deviation of all IIVs in the step data. Practically, we delineate groups by identifying boundary points, which are the samples with IIVs larger than the threshold IIV. In the exemplary step data in Fig. 7, the red dots are boundary points. The sequence numbers of boundary points are arranged in ascending order, and the samples between two boundary points adjacent in this order form a group.
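The mean-plus-three-sigma thresholding and group delineation above can be sketched as follows. The handling of the ranges before the first and after the last boundary point is an assumption of this sketch, as is returning half-open (start, end) index ranges.

```python
def delineate_groups(iiv):
    """Split a step's sample indices into candidate groups to cleanse.

    Boundary points are samples whose IIV exceeds mean + 3*std of the
    step's IIVs; stretches of samples between consecutive boundary
    points form groups.  Returns a list of (start, end) index ranges
    with `end` exclusive.
    """
    n = len(iiv)
    mean = sum(iiv) / n
    std = (sum((v - mean) ** 2 for v in iiv) / n) ** 0.5
    threshold = mean + 3 * std
    boundaries = [k for k, v in enumerate(iiv) if v > threshold]
    cuts = [-1] + boundaries + [n]          # sentinels at both ends
    groups = []
    for a, b in zip(cuts[:-1], cuts[1:]):
        if b - a > 1:                       # at least one sample strictly inside
            groups.append((a + 1, b))
    return groups

# One large IIV in the middle splits the step into two candidate groups.
iiv = [0.0] * 10 + [0.9] + [0.0] * 10
print(delineate_groups(iiv))  # [(0, 10), (11, 21)]
```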
2) Least-Entropy Group to Cleanse First: Entropy is a measure of uncertainty, and we consider uncertainty of data sample values as probably carrying some characteristics of a data cycle. We thus adopt the notion of entropy [19] and compute the information entropies of individual groups of samples. The group with the least entropy value corresponds to unimportant samples, which carry the least characteristic information. We choose the least-entropy group of samples to cleanse first because these samples are least likely to contain important characteristics, and cleansing them minimizes possible characteristic distortions.
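The least-entropy-group rule can be sketched as follows. The paper does not fix how sample values are discretized before computing Shannon entropy, so the equal-width binning here is an assumption of this sketch.

```python
import math
from collections import Counter

def group_entropy(samples, bins=10):
    """Shannon entropy of a group's sample values.

    Values are discretized into equal-width bins (an assumption of this
    sketch).  A flat group falls into a single bin and has entropy 0,
    marking it as carrying the least characteristic information.
    """
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return 0.0
    width = (hi - lo) / bins
    labels = [min(int((v - lo) / width), bins - 1) for v in samples]
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def least_entropy_group(step, groups):
    """Pick the (start, end) group with the smallest entropy to cleanse first."""
    return min(groups, key=lambda g: group_entropy(step[g[0]:g[1]]))

# A varying group vs. a flat group: the flat group is cleansed first.
step = [1.0, 2.0, 3.0, 4.0, 7.0, 7.0, 7.0, 7.0]
print(least_entropy_group(step, [(0, 4), (4, 8)]))  # (4, 8)
```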

D. Sample Cleansing for Step Data Length Equalization
Once the group of samples to cleanse is identified for a step, we design the following deletion or imputation procedure to equalize the step length with minimum distortions to the step.
1) Deletion: If the step j data of a cycle has more samples than the step's mode n_j^{r*}, some samples in the least-entropy group need to be deleted. To achieve minimum distortion, we repeatedly delete the sample with the lowest IIV in the group until the resulting step data has length equal to the mode.
2) Imputation: If the step j data of a cycle has fewer samples than n_j^{r*}, we need to impute samples into the least-entropy group. We first compute the required number of samples to be imputed as n_j^{r*} − n_{w,j}^r. We then place the filling positions of these imputed samples evenly within the least-entropy group, preventing the distortion caused by imputation from accumulating in one place. We set each filling value to that of the sample preceding the filling position so that the imputations create only zero IIVs.
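The deletion and imputation procedures above can be combined into one equalization sketch. This is an illustration under our own assumptions: the group is assumed large enough to absorb all deletions, and the even spacing of filling positions is one plausible realization of "evenly" that the paper does not pin down.

```python
def equalize_step(step, iiv, group, target_len):
    """Delete or impute samples inside the least-entropy group so the
    step reaches the mode length (a sketch of Sec. III-D).

    `step` and `iiv` are parallel lists; `group` is a (start, end)
    index range with `end` exclusive.
    """
    step, iiv = list(step), list(iiv)
    start, end = group
    # Deletion: repeatedly drop the lowest-IIV sample in the group.
    while len(step) > target_len:
        k = min(range(start, end), key=lambda i: iiv[i])
        del step[k], iiv[k]
        end -= 1
    # Imputation: spread fills evenly; copy the preceding sample so
    # every imputed sample adds only a zero IIV.
    if len(step) < target_len:
        missing = target_len - len(step)
        span = end - start
        for i in range(missing):
            pos = start + (i + 1) * span // (missing + 1) + i
            val = step[pos - 1] if pos > 0 else step[pos]
            step.insert(pos, val)
    return step

# Over-length step: one sample of the flat group (1, 4) is deleted.
print(equalize_step([1.0, 5.0, 5.0, 5.0, 9.0],
                    [0.8, 0.0, 0.0, 0.0, 0.0], (1, 4), 4))
```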

IV. EXPERIMENT DESIGN FOR CLEANSING IMPACT EVALUATION
Will the algorithm of cleansing with minimum distortion, called CMD hereafter, contribute to the design goal of effective AD when integrated with an ML-based approach? In this section, STALAD serves as the unsupervised ML-based AD function for evaluating the resultant effectiveness when its SVID data inputs are cleansed by the CMD designed in Section III. Our evaluation experiment design consists of a realistic data set, three representative anomaly scenarios, and two commonly adopted cleansing heuristics for comparison.

A. Experiment Settings
The basic settings for experiment design consist of an ESD set collected from fab operations, a synthesized abnormal data set, and two commonly adopted cleansing heuristics in practice.
1) ESD Data Set: This dataset is collected from operations of a high density plasma chemical vapor deposition (HDP-CVD) tool in the same schema as that in Table I. Specifically, the set contains 97 cycles of 8 SVIDs, which correspond to the repetitive processing of 97 wafers of a single recipe. There are 8×21398 samples, with no missing sample values. Denote X_{s,w} as the sample sequence of the w-th cycle of SVID s, w = 1, ..., 97, with X_{s,w} ≡ {X_{s,w,k}, k = 1, ..., K_w}, where k denotes the k-th sample and K_w is the total number of samples in cycle w. Only the 97th cycle is labeled abnormal in this data set.

2) Anomaly Models for Synthesizing Abnormal Data:
To evaluate how data cleansing impacts the effectiveness of STALAD by Monte-Carlo simulations, multiple abnormal data cycles are required. The actual number of cycles needed by the experiment depends on the accuracy and confidence specifications [18].
The evaluations adopt models of three prominent types of anomaly, namely drift, shift, and spike, to generate abnormal ESD for data cleansing and AD [14]. Table III gives the mathematical models of drift, shift, and spike anomalies in terms of D_k, S_k, and P_k, respectively, where u(·) denotes a unit step function, δ(·) denotes a unit impulse function, and α and β are parameters modeling the scale and the occurrence time of the anomalies.
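Since Table III itself is not reproduced here, the following sketch assumes the standard ramp/step/impulse forms consistent with the description of u(·), δ(·), α, and β; the exact expressions in Table III may differ.

```python
def drift(k, alpha, beta):
    # D_k = alpha * (k - beta) * u(k - beta): a ramp starting at sample beta
    return alpha * (k - beta) if k >= beta else 0.0

def shift(k, alpha, beta):
    # S_k = alpha * u(k - beta): a constant offset from sample beta onward
    return alpha if k >= beta else 0.0

def spike(k, alpha, beta):
    # P_k = alpha * delta(k - beta): a single-sample impulse at beta
    return alpha if k == beta else 0.0

def synthesize_abnormal(cycle, model, alpha, beta):
    """Superpose a normal cycle with an anomaly-model sequence."""
    return [x + model(k, alpha, beta) for k, x in enumerate(cycle)]

# A shift of scale 2.0 injected at sample 3 of a flat normal cycle.
print(synthesize_abnormal([0.0] * 5, shift, 2.0, 3))  # [0.0, 0.0, 0.0, 2.0, 2.0]
```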
Let A_{s,w} be synthesized abnormal cycle data. Superposing normal cycle data with a sequence generated by an anomaly model forms abnormal cycle data, i.e.,

A_{s,w,k} ≡ X_{s,w,k} + A_k,    (4)

where X_{s,w} is labeled normal and A_k is the k-th value of the anomaly model sequence (D_k, S_k, or P_k).

3) Benchmark Cleansing Methods: The benchmark cleansing methods for our CMD design include a trajectory alignment method [17], CM1, and another heuristic method, CM2. CM1 equalizes data length based on the least number of sampling points among all input data vectors; that is, CM1 removes over-length data samples from individual cycles. CM2 equalizes data length by removing data samples from unimportant ESD suggested by equipment engineers. In our experiment, the equipment engineers' suggestions are: 1) removing the step 1 data, during which the wafer is being transported, not processed, and 2) removing enough samples starting from the back end until every cycle has the same length as the shortest cycle.

B. Experiment Designs
Fig. 8 provides a sketch of the experiment system that integrates the three different data cleansing algorithms with STALAD [14].The experiment runs Monte-Carlo simulations of drift, shift, and spike anomaly scenarios for evaluating AD by STALAD in conjunction with CM1, CM2 or CMD.

1) Experiment System and Two Abnormal Scenarios:
Normality model learning: The system first uses the normal cycles of the ESD data set and cleanses them as inputs to the normal feature learning phase of STALAD. As shown in Fig. 5, the learned SAE weights form a normality model of the normal cycles, and the derived test thresholds serve as AD test parameters.
Abnormal data cycle generation: The experiment system then generates abnormal data cycles by superposing, according to (4), the normal cycles of the ESD data set with the two anomaly models in Table III. The abnormal data cycles are inputs to the cleansing module. Cleansing outputs of fixed-length step data and data cycles finally go to the real-time feature testing phase of STALAD for AD.
The dependent variables of this experiment are the effectiveness metrics of AD, which are defined next. Analyses of the AD effectiveness metrics obtained from the Monte-Carlo simulations demonstrate the properties of, and provide comparisons among, CM1, CM2, and CMD.
2) Effectiveness Metrics: The effectiveness metrics of data cleansing and STALAD are sensitivity, false alarm rate, and anomaly detection time. Sensitivity, p_sens, is the proportion of correctly detected cycles among all abnormal testing data cycles, and false alarm rate, p_FA, is the proportion of detected cycles among all normal testing data cycles. Mathematically, they are defined as follows:

p_sens ≡ (# of detected cycles among all abnormal testing cycles) / (# of abnormal cycles among all testing cycles),    (5)

p_FA ≡ (# of detected cycles among all normal testing cycles) / (# of normal cycles among all testing cycles).    (6)

Lastly, anomaly detection time is defined as the first cycle at which an anomaly is detected among all the sequential testing data cycles.

3) Control Variables: Table IV lists the notations and definitions of the control variables in all scenarios. As the testing data are the superposition of normal SVID cycles from the ESD data set and the anomaly model-generated data, we set both the testing and training data to 50 cycles. These cycles all have a pattern similar to that depicted in Fig. 1. The parameters α1 and α2 are used to analyze the impacts of the cleansing algorithms on different scales of drift or shift anomalies. Note that the largest deviation caused by a drift is 3 (max α1) × 230 (cleansed cycle length) = 690, and that caused by a shift is 300 (max α2). Therefore, both the α1 and α2 settings are small-scale, as their largest deviation is no more than 17% of the scale of the ESD values, which is at least 4200.
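The metric definitions in (5) and (6) can be sketched directly from per-cycle detection and ground-truth labels; the boolean-flag representation here is an assumption of this sketch.

```python
def sensitivity(detected, abnormal):
    """p_sens per (5): detected abnormal cycles / all abnormal testing cycles."""
    hits = sum(1 for d, a in zip(detected, abnormal) if d and a)
    return hits / sum(abnormal)

def false_alarm_rate(detected, abnormal):
    """p_FA per (6): detected normal cycles / all normal testing cycles."""
    fa = sum(1 for d, a in zip(detected, abnormal) if d and not a)
    return fa / sum(1 for a in abnormal if not a)

# 4 testing cycles: the first two are truly abnormal; the last is a false alarm.
detected = [1, 1, 0, 1]
abnormal = [1, 1, 0, 0]
print(sensitivity(detected, abnormal))      # 1.0
print(false_alarm_rate(detected, abnormal)) # 0.5
```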

V. CMD PROPERTY AND IMPACTS ON AD EFFECTIVENESS
Experiments run Monte-Carlo simulations of the two abnormal scenarios over the experiment system in Fig. 8. Comprehensive analyses of the results demonstrate that CMD indeed has a characteristics-preserving property and show how CMD impacts STALAD effectiveness as compared to the two benchmark cleansing methods.

When the drift slope is 3, all cleansing methods again yield close sensitivities, which are higher than 0.9. Drift at this scale is large enough that STALAD can detect it with any of the cleansing methods. Fig. 11 demonstrates that when the drift slope is in [1, 2], the sensitivity of STALAD with CMD input is 0.98, which substantially outperforms those using CM1 or CM2 input. The reasons that CMD can help STALAD detect small shifts are similar to those for raising the sensitivity of detecting small drifts. Moreover, the ESD value changes resulting from the changes between two consecutive steps in a data cycle are usually significant rises or drops, as depicted in Fig. 9. These steep changes have characteristics similar to a shift. Step data length equalization helps make the changes occur at the same sequence numbers in a data cycle.

D. Detection Speedup
CMD can also speed up small drift detection in the time domain by STALAD. We express the detection time as the number of undetected cycles from the time the drift anomaly is injected into the data until the time it is detected.
In this experiment, the input ESD consists of 100 drift cycles. Each cycle is a real cycle plus a drift with the given slope and an intercept accumulated up to the previous cycle, which simulates the real case that the deviation caused by a slow drift accumulates. That is, the k-th sample of the w-th drift cycle of SVID s can be represented as A_{s,w,k}:

A_{s,w,k} = X_{s,w,k} + αk + (w − 1)αK,    (7)

where X_{s,w,k} is the k-th sample of the w-th real cycle of SVID s, K is the equalized length of a cycle, and α is the slope of the drift. For each α, we replicate the experiment, both the learning and testing phases of STALAD, 30 times with randomly generated initial weights for the learning phase. Fig. 13 shows the comparisons among CMD, CM1, and CM2, where the horizontal axis is the drift slope and the vertical axis is the detection time. All methods are similar, with detection times of about 2 cycles, when the drift slope is larger than 1.0. These drifts are so significant that STALAD can detect them regardless of the cleansing method. When the drift slope is smaller than 0.5, and especially for slope 0.05, CMD significantly speeds up the detection of a drift by STALAD, by about 50 cycles compared with CM1 and CM2.
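The accumulated-drift construction of (7) can be sketched as follows; the 0-based sample and cycle indices are an implementation convenience of this sketch, whereas (7) indexes cycles from w = 1.

```python
def drift_cycles(real_cycles, alpha):
    """Build drift cycles per (7): A_{s,w,k} = X_{s,w,k} + alpha*k + (w-1)*alpha*K.

    K is the equalized cycle length; the intercept (w-1)*alpha*K carries
    the deviation accumulated over all previous cycles, so a slow drift
    keeps growing across cycles instead of resetting each cycle.
    """
    K = len(real_cycles[0])
    return [[x + alpha * k + w * alpha * K          # w is 0-based here
             for k, x in enumerate(cycle)]
            for w, cycle in enumerate(real_cycles)]

# Two flat real cycles of length 2 with slope 1.0: the second cycle
# starts where the first left off.
print(drift_cycles([[0.0, 0.0], [0.0, 0.0]], 1.0))  # [[0.0, 1.0], [2.0, 3.0]]
```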
The reason is again that CMD's alignment and equalization of step and cycle data lengths with minimum distortion yield tighter anomaly test threshold settings for STALAD, as described in Section V-B for small drift detection.

VI. CONCLUSION
This paper has presented a novel cleansing with minimum distortion (CMD) approach to cleanse equipment sensory data (ESD) and facilitate accurate and early detection by ML-based AD for semiconductor manufacturing. CMD has four innovations to achieve such effectiveness: i) statistical mode-based equalization of step lengths among cycles of a recipe to achieve the least changes, ii) an importance indicator value (IIV) of a data sample measuring the absolute value of its relative difference with the subsequent sample, iii) step data segmentation into data groups based on samples of significant IIV changes, and the least-entropy-group-to-cleanse-first rule, and iv) deletion or imputation of the least-IIV sample(s) in the group to cleanse. Inspection of original and cleansed ESD demonstrates that CMD aligns and equalizes step lengths over all cycles and preserves data characteristics with minimum changes to IIVs. Experiments integrating CMD with an unsupervised learning-based AD framework, STALAD, show that CMD increases the sensitivity of STALAD by up to 80% for detecting small-scale drifts and shifts as compared to two frequently used benchmark cleansing methods. CMD also speeds up STALAD's detection of very small-scale drifts by up to 50 cycles compared with the benchmarks.

Fig. 9 contrasts an SVID data cycle before and after cleansing by CMD. The blue line is before cleansing and the orange line after. The two almost overlap, with the orange line barely shifting left because of three data sample deletions and one imputation, as indicated in Fig. 10. Note that all four changes, whether deletion or imputation, are in flat segments and keep the segments flat at their respective values. The IIVs of the four changes are all zero. So, in Table V, the data cycle cleansed by CMD has the same sum of IIVs as the uncleansed data cycle, while CM1 and CM2 change the sums of IIVs.

Fig. 12 demonstrates how cleansing affects the detection sensitivity of small shifts by time-domain STALAD. In the experiment, the shift scale falls in [0, 300]. A shift occurs randomly at a specific sample and remains unchanged afterwards. Simulation results show that when the shift scale is less than 100, STALAD has about the same AD sensitivity with cleansed input from any of the methods. STALAD with CMD input achieves significant superiority over CM1 and CM2 for shifts in [100, 300].
This in turn facilitates STALAD's learning of the data cycle characteristic of "steep changes as step changes." As such, STALAD can differentiate a shift anomaly from the "shifts" at step changes more easily than with CM1 and CM2.

C. False Alarm Rate Reduction

Will the three cleansing methods affect the false alarm rate of time-domain AD by STALAD? We also experiment on 50 testing cycles to estimate the false alarm rate. Table VI shows the comparisons, where CMD achieves the lowest false alarm rate among the three cleansing algorithms. The reason is that CMD aligns steps and increases the similarity across SVID training data cycles for normality learning, as described in Section V-B.

TABLE IV: CONTROL VARIABLES OF THE EXPERIMENT

TABLE V: IMPACT ON SUM OF IIVS BY CLEANSING

TABLE VI: CLEANSING IMPACT ON AD FALSE ALARM RATE