Statistical Outlier Curation Kernel Software (SOCKS): A Modern,
Efficient Outlier Detection and Curation Suite
Abstract
Real-world signal acquisition through sensors is at the heart of the modern
digital revolution. However, almost every signal acquisition system is
contaminated with noise and outliers. Precise detection and curation of
these outliers is an essential step toward revealing the true nature of the
uncorrupted observations. With the exploding volume of digital data, there
is a critical need for an outlier detection and curation tool that is
robust yet easy to operate, low-latency, generic yet highly customizable,
and readily adaptable to diverse types of data sources. Existing methods
often reduce to data smoothing, which inherently causes loss of valuable
information. We have developed a C++-based software tool that
decontaminates time-series and matrix-like data sources, with the goal of
recovering the ground truth. The SOCKS tool will be made available as
open-source software for broader adoption by the scientific community.
Our work calls for a philosophical shift in the design of real-world
data-processing pipelines. We propose that raw data be decontaminated
before any traditional data-processing task is performed: outliers are
first conditionally flagged, the flagged points are then curated, and the
process is repeated with parametrically tuned settings so that the data
converge asymptotically, as accurately as possible, to the ground truth.
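To make the proposed flag, curate, and iterate pipeline concrete, a minimal C++ sketch follows. It is not the SOCKS implementation. It assumes a one-dimensional time series and swaps in the classical Hampel identifier (a rolling median with a threshold based on the median absolute deviation, MAD) as the flagging condition. It also assumes that curation replaces each flagged point with its window median. The function names (median, hampel_pass, flag_and_curate) and parameters (half_window, k, max_iter) are hypothetical and chosen for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Median of the values (taken by value, so the caller's data is not reordered).
double median(std::vector<double> v) {
    assert(!v.empty());
    const std::size_t n = v.size();
    std::nth_element(v.begin(), v.begin() + n / 2, v.end());
    const double upper = v[n / 2];
    if (n % 2 == 1) return upper;
    // For even n, the lower middle value is the largest element left of n/2.
    const double lower = *std::max_element(v.begin(), v.begin() + n / 2);
    return 0.5 * (lower + upper);
}

// One flag-and-curate pass (hypothetical helper, Hampel identifier):
// a point is flagged when it deviates from its window median by more than
// k scaled MADs, and only flagged points are replaced (no global smoothing).
// Returns the number of points flagged in this pass.
std::size_t hampel_pass(std::vector<double>& x, std::size_t half_window, double k) {
    const std::size_t n = x.size();
    std::vector<double> curated = x;  // decisions use the uncurated pass input
    std::size_t flagged = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t lo = i >= half_window ? i - half_window : 0;
        const std::size_t hi = std::min(n, i + half_window + 1);
        std::vector<double> window(x.begin() + lo, x.begin() + hi);
        const double med = median(window);
        std::vector<double> dev;
        dev.reserve(window.size());
        for (double v : window) dev.push_back(std::fabs(v - med));
        // 1.4826 makes the MAD a consistent estimate of the Gaussian std. dev.
        const double scaled_mad = 1.4826 * median(dev);
        if (std::fabs(x[i] - med) > k * scaled_mad) {
            curated[i] = med;  // curate the flagged point only
            ++flagged;
        }
    }
    x = std::move(curated);
    return flagged;
}

// Iterate passes until nothing is flagged or max_iter passes have run.
// Returns the total number of flags raised across all passes.
std::size_t flag_and_curate(std::vector<double>& x, std::size_t half_window,
                            double k, int max_iter) {
    std::size_t total = 0;
    for (int it = 0; it < max_iter; ++it) {
        const std::size_t flagged = hampel_pass(x, half_window, k);
        total += flagged;
        if (flagged == 0) break;  // converged: no remaining outliers flagged
    }
    return total;
}
```

For example, calling flag_and_curate on the series {1, 1, 1, 1, 50, 1, 1, 1, 1} with half_window = 3, k = 3.0 and max_iter = 5 flags only the spike at index 4 and replaces it with 1. Every unflagged sample is left exactly as acquired, which is the contrast with smoothing drawn above.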