A new global measure to simultaneously evaluate data utility and privacy
Abstract
Measuring data utility and privacy risk embedded in synthetic or other
de-identified datasets is an increasingly important research area.
Existing measures in the data privacy literature, however, are one-sided
in that they measure either utility or privacy risk, but not both. In this
paper, we propose a new measure that evaluates data utility and privacy
risk together, two quantities known to trade off in data synthesis. The
proposed measure employs the notion of relative distance between the
synthetic and original datasets at the dataset level, and can identify
the optimally balanced position of the synthetic data in terms of both
utility and privacy. In addition, we devise a graphical tool that
visually reveals the current utility-privacy trade-off position of the
synthetic data. Numerical studies show that our new measure consistently
outperforms existing global data utility measures and offers richer
interpretations, on both simulated and real datasets, confirming its
distinctive advantages.