A Novel Active Anomaly Discovery Method and its Applications in Additive Manufacturing
Anomaly detection aims to identify the true anomalies in a given set of data instances. Unsupervised anomaly detection algorithms, applied to an unlabeled dataset, produce a list of instances ranked by anomaly score. Unfortunately, owing to their inherent limitations, many of the instances that unsupervised algorithms rank highest are not anomalies or are not interesting from an application perspective, which leads to high false-positive rates. Active anomaly discovery (AAD) has been proposed to overcome this deficiency: it sequentially selects instances for labeling and incorporates the resulting labels into the anomaly detection algorithm to improve detection accuracy iteratively. However, labeling is often costly, so balancing detection accuracy against labeling cost is essential. Along this line, this paper proposes a novel AAD method to achieve this balance. Our approach uses Isolation Forest, a state-of-the-art unsupervised anomaly detection algorithm, to extract features. The sparsity of the extracted features is then exploited to improve anomaly detection performance. To enforce this sparsity and thereby improve detection, we propose a new algorithm based on online gradient descent, namely Sparse Approximated Linear Anomaly Discovery (SALAD), together with a theoretical regret analysis. Extensive experiments on both open-source and additive manufacturing datasets demonstrate that the proposed algorithm significantly outperforms state-of-the-art anomaly detection algorithms.
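To make the active loop described above concrete, the following is a minimal, hypothetical sketch of generic active anomaly discovery with a sparse linear scorer. It is NOT the authors' SALAD algorithm. It assumes each instance has already been mapped to a nonnegative feature vector derived from Isolation Forest (for example, leaf-membership indicators); that the score is linear, s(x) = w·φ(x); and that weights are updated by proximal online gradient descent on a hinge loss with L1 soft-thresholding to enforce sparsity. The names `active_discovery` and `soft_threshold`, the uniform initialization, and the values of `eta`, `lam`, and `margin` are all illustrative assumptions.

```python
def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrink each weight toward zero by t."""
    return [wi - t if wi > t else wi + t if wi < -t else 0.0 for wi in w]


def score(w, phi):
    """Linear anomaly score w . phi (higher means more anomalous)."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def active_discovery(features, oracle, budget, eta=0.5, lam=0.05, margin=1.0):
    """Hypothetical AAD loop (not SALAD): query the top-scored unlabeled
    instance, then take one proximal OGD step on the hinge loss
    max(0, margin - y * w.phi) with an L1 penalty lam * ||w||_1.
    oracle(i) returns +1 for an anomaly and -1 for a nominal instance."""
    d = len(features[0])
    # Assumed uniform start, mimicking an unweighted Isolation Forest score.
    w = [1.0 / d ** 0.5] * d
    unlabeled = set(range(len(features)))
    queried = []
    for _ in range(min(budget, len(features))):
        # Highest score first; ties broken by the smallest index.
        i = max(unlabeled, key=lambda j: (score(w, features[j]), -j))
        unlabeled.remove(i)
        y = oracle(i)
        queried.append((i, y))
        phi = features[i]
        if y * score(w, phi) < margin:
            # Hinge subgradient is -y * phi, so the descent step adds eta * y * phi.
            w = [wi + eta * y * xi for wi, xi in zip(w, phi)]
        # L1 proximal step drives weights of uninformative features to exactly 0.
        w = soft_threshold(w, eta * lam)
    return queried, w


# Toy data: four nominal instances share features 0 and 1; index 4 is the anomaly.
labels = [-1, -1, -1, -1, 1, -1]
feats = [[1, 1, 0, 0]] * 4 + [[0, 0, 1, 0], [0, 0, 0, 1]]
queried, w = active_discovery(feats, lambda i: labels[i], budget=3)
print(queried)  # [(0, -1), (4, 1), (5, -1)]
```

In this toy example the fixed uniform weights would rank the four tied nominal instances above the anomaly, so it would be queried fifth; after one nominal label the weights on features 0 and 1 are shrunk to exactly zero and the anomaly is found on the second query. The soft-thresholding step is one standard way to obtain sparse weights in online learning; the paper's own sparsity mechanism and regret bound may differ.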