Prasanta Pal et al.

Knowingly or unknowingly, digital data is an integral part of our day-to-day lives; realistically, there is probably not a single day when we do not encounter some form of it. Data originates from diverse sources in various formats, and time series are a special kind of data that captures information about the time evolution of a system under observation. However, capturing this temporal information in the context of data analysis is a highly non-trivial challenge. The Discrete Fourier Transform (DFT) is one of the most widely used methods for capturing the essence of time-series data. While this nearly 200-year-old mathematical transform has survived the test of time, real-world data sources violate some of the intrinsic properties the DFT presumes of its input. Ad hoc noise and outliers fundamentally alter the true signature of the signal of interest, and as a result its frequency-domain representation gets corrupted as well. We demonstrate that applying traditional digital filters as-is often fails to reveal an accurate description of the pristine time-series characteristics of the system under study. In this work, we analyze the issues the DFT faces with real-world data and propose a method to address them by taking advantage of insights from modern data-science techniques, in particular our previous work SOCKS. Our results reveal that a dramatic improvement is possible by re-imagining the DFT in the context of real-world data with appropriate curation protocols. We argue that our proposed transformation, DFT21, would revolutionize the digital world in terms of accuracy, reliability, and information retrievability from raw data.
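The corruption mechanism described above is easy to reproduce. The sketch below is a hypothetical, self-contained C++ example, not the DFT21 or SOCKS protocol itself: it injects a single outlier into a pure tone, applies a crude local-median replacement as a stand-in for a curation step, and compares the DFT magnitudes before and after, showing how one bad sample leaks energy across the entire spectrum.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdio>
#include <vector>

const double PI = std::acos(-1.0);

// Naive O(N^2) DFT, adequate for a short illustrative signal.
std::vector<std::complex<double>> dft(const std::vector<double>& x) {
    const std::size_t N = x.size();
    std::vector<std::complex<double>> X(N);
    for (std::size_t k = 0; k < N; ++k)
        for (std::size_t n = 0; n < N; ++n)
            X[k] += x[n] * std::polar(1.0, -2.0 * PI * static_cast<double>(k * n) / N);
    return X;
}

// Illustrative curation step: replace any sample that deviates strongly from
// its local median. This is NOT the DFT21/SOCKS protocol, only a stand-in to
// show why curating the series before the transform matters.
std::vector<double> curate(const std::vector<double>& x, int w = 2, double thresh = 3.0) {
    std::vector<double> y = x;
    for (std::size_t i = 0; i < x.size(); ++i) {
        std::vector<double> win;
        for (int j = -w; j <= w; ++j) {
            long idx = static_cast<long>(i) + j;
            if (idx >= 0 && idx < static_cast<long>(x.size())) win.push_back(x[idx]);
        }
        std::nth_element(win.begin(), win.begin() + win.size() / 2, win.end());
        double med = win[win.size() / 2];
        if (std::fabs(x[i] - med) > thresh) y[i] = med;  // outlier: replace with local median
    }
    return y;
}

int main() {
    const std::size_t N = 64;
    std::vector<double> x(N);
    for (std::size_t n = 0; n < N; ++n)
        x[n] = std::sin(2.0 * PI * 5.0 * n / N);  // clean 5-cycle tone
    x[20] += 25.0;                                // one ad hoc outlier

    auto Xraw = dft(x);          // spectrum of the corrupted series
    auto Xcur = dft(curate(x));  // spectrum after the crude curation step

    // The single outlier leaks energy into every bin; after curation the
    // spectrum again shows one dominant peak at k = 5.
    for (std::size_t k = 0; k <= N / 2; ++k)
        std::printf("k=%2zu  |X_raw|=%8.3f  |X_curated|=%8.3f\n",
                    k, std::abs(Xraw[k]), std::abs(Xcur[k]));
    return 0;
}
```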

Prasanta Pal et al.

In the modern world, it is hard to imagine a day without some form of interaction with digital data. Real-world data originating from signal-generating transducers or communication channels is often recorded as streams of data samples separated by time stamps, sample counters, or simply a record delimiter, e.g., newline (\n) or comma (,). Sampling is the basis of statistical estimation from any data source containing signal records, and random sampling has been in practice since time immemorial. However, with data-generation processes working in tandem with rapidly scaling computing infrastructures, the volume of data is becoming unmanageably large in nearly every discipline of science. On the other hand, the mere volume of data is of no consequence if we cannot extract effective intelligence from it on demand. Of particular interest is the case where data is stored in a file as records separated by a newline (or any other delimiter) character. When the number of records in the file exceeds a certain threshold, random sampling becomes a formidable task: it is often impractical to load the entire file into memory, and even when theoretically possible, the time it takes to load the data in its entirety from natural data sources can be overwhelmingly long and often unnecessary. We can strategically bypass these problems by carefully designing a data-interface tool such that any part of a given file can be accessed instantly for random sampling or other processing tasks, loading only the necessary parts of the data. With this goal, we created a novel, portable, and highly efficient rapid data-access tool named GSFRS (Giant Signal File Random Sampler), written in modern C++, which enables near real-time access to any part of an arbitrarily large data file, almost independently of the file size for all practical scenarios. Once the indices are made available through its indexing protocol, big-data processing becomes relatively commonplace and cost-effective even on commodity hardware. This capability could revolutionize the way we gather intelligence from files containing large numbers of records. By adopting GSFRS at the source level of various data generators, the processing times and energy footprints of many computations can be dramatically reduced.
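To make the indexing idea concrete, the following sketch is a minimal C++ illustration under assumed names (the input file "giant_signal.csv" is hypothetical), not the actual GSFRS implementation: it records the byte offset of every record boundary in one pass, then serves uniformly random records by seeking directly to their offsets, so the per-record access cost does not grow with the file size once the index exists.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <random>
#include <string>
#include <vector>

// One pass over the file records the byte offset at which every record starts.
// In a tool like GSFRS this index would be persisted so it is built only once
// per file; here it is kept in memory for brevity.
std::vector<std::uint64_t> build_index(const std::string& path, char delim = '\n') {
    std::ifstream in(path, std::ios::binary);
    std::vector<std::uint64_t> offsets{0};       // first record starts at byte 0
    char c;
    std::uint64_t pos = 0;
    while (in.get(c)) {
        ++pos;
        if (c == delim) offsets.push_back(pos);  // next record starts after the delimiter
    }
    if (offsets.size() > 1 && offsets.back() == pos)
        offsets.pop_back();                      // no record after a trailing delimiter
    return offsets;
}

// Fetch a single record by seeking directly to its stored offset, so the cost
// is independent of where the record sits in the file.
std::string read_record(const std::string& path,
                        const std::vector<std::uint64_t>& offsets,
                        std::size_t i, char delim = '\n') {
    std::ifstream in(path, std::ios::binary);
    in.seekg(static_cast<std::streamoff>(offsets[i]));
    std::string rec;
    std::getline(in, rec, delim);
    return rec;
}

int main() {
    const std::string path = "giant_signal.csv";  // hypothetical input file
    auto offsets = build_index(path);
    if (offsets.empty()) return 1;

    // Draw 10 records uniformly at random without touching the rest of the file.
    std::mt19937_64 rng(42);
    std::uniform_int_distribution<std::size_t> pick(0, offsets.size() - 1);
    for (int k = 0; k < 10; ++k) {
        std::string rec = read_record(path, offsets, pick(rng));
        // ... feed `rec` to the downstream estimator ...
    }
    return 0;
}
```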