
Data Cleansing with Minimum Distortion for ML-Based Equipment Anomaly Detection
  • Yun-Cheng Hsieh,
  • Chieh-Yu Chen,
  • Da-Yin Liao,
  • Chung-Kuang Lin,
  • Shi-Chung Chang
Shi-Chung Chang
National Taiwan University

Corresponding Author: [email protected]



Semiconductor manufacturing makes extensive use of machine learning (ML) to process equipment sensory data (ESD) for near-real-time anomaly detection (AD). ESD characteristics are highly diverse, and data lengths vary among processing steps and cycles. Cleansing ESD with minimum distortion (CMD) to meet the fixed-length input requirement of ML-based AD is critical to AD effectiveness and is challenging. This paper presents a novel CMD method with four innovations: i) statistical mode-based equalization of step data lengths, which minimizes the number of steps whose data length must change; ii) an importance indicator value (IIV) for each data sample, based on its relative difference from the subsequent sample; iii) segmentation of step data into groups at samples with significant IIVs, combined with a least-entropy-group-to-cleanse-first rule; and iv) cleansing of the least-IIV sample(s) in the selected group to equalize the step data length. Application of CMD to ESD demonstrates that it preserves data characteristics. Simulation experiments integrate data cleansing with STALAD, an unsupervised ML-based AD system. Comparisons with two benchmark methods over AD scenarios of small-scale drifts and shifts show that CMD not only yields more accurate detection by STALAD but also helps detect anomalies much earlier than either benchmark.
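The four steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact IIV formula, the significance threshold, the entropy definition, or how steps shorter than the mode length are handled. The sketch therefore assumes an IIV of |x[i+1] - x[i]| / (|x[i]| + eps), a fixed hypothetical `threshold`, Shannon entropy over rounded sample values, and it only shortens over-length step data. All function names and parameters are illustrative.

```python
import math
from collections import Counter
from statistics import mode


def target_length(step_runs):
    """Step i: use the statistical mode of the step data lengths as the target."""
    return mode([len(run) for run in step_runs])


def iiv(samples, eps=1e-12):
    """Step ii (assumed formula): relative difference with the subsequent sample.

    The last sample has no successor; it is given an infinite score here so
    that it is removed only as a last resort (an assumption of this sketch).
    """
    vals = [abs(b - a) / (abs(a) + eps) for a, b in zip(samples, samples[1:])]
    return vals + [math.inf]


def segment(scores, threshold):
    """Step iii: close a group after every sample whose IIV is significant."""
    groups, start = [], 0
    for i, v in enumerate(scores[:-1]):
        if v > threshold:
            groups.append(range(start, i + 1))
            start = i + 1
    groups.append(range(start, len(scores)))
    return groups


def entropy(values, decimals=2):
    """Shannon entropy of the rounded values (assumed entropy definition)."""
    counts = Counter(round(v, decimals) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def cleanse(samples, target, threshold=0.5):
    """Steps iii and iv: drop least-IIV samples from the least-entropy group
    until the step data reaches the target length."""
    data = list(samples)
    while len(data) > target:
        scores = iiv(data)
        groups = segment(scores, threshold)
        # Prefer groups that still have more than one sample to give up.
        candidates = [g for g in groups if len(g) > 1] or groups
        group = min(candidates, key=lambda g: entropy([data[i] for i in g]))
        drop = min(group, key=lambda i: scores[i])
        del data[drop]
    return data
```

For example, given three cycles of one step with lengths 6, 7, and 6, `target_length` returns 6, and calling `cleanse` on the 7-sample cycle `[1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0]` removes one sample from the flat leading segment while keeping the jump from 1.0 to 5.0 intact. Recomputing the IIVs and groups after each removal is a design choice of this sketch, made so that each deletion sees the current neighbors.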
Published 2023 in IEEE Transactions on Semiconductor Manufacturing, pages 1-1. DOI: 10.1109/TSM.2023.3262957