Sensitive data detection in structured datasets using large language models
  • Atul Anand

Corresponding Author: [email protected]

Abstract

Detecting personally identifiable information (PII), personal healthcare information (PHI), and similar sensitive data within structured datasets is imperative for organizations. With the rise of data protection laws, the growing importance of user privacy, and high volumes of data collection, automating this detection has become an important problem to solve. Sensitive data detection methods exist for both structured and unstructured datasets. Structured data poses the greater challenge because it lacks external context and a general understanding of what the data represents. We describe why the problem matters, list existing solutions in the space along with their limitations, and point toward a solution based on generative artificial intelligence. We focus on how the latest advances in large language models (LLMs) can provide a truly global solution to a global problem such as sensitive data detection.
04 Apr 2024: Submitted to TechRxiv
08 Apr 2024: Published in TechRxiv