Abstract
Published in IEEE Transactions on Emerging Topics in Computing
Copyright transferred
Cross-matching data stored in separate files is an everyday activity in
the scientific domain. However, the relation between attributes is
sometimes not obvious. The discovery of foreign keys in relational
databases is a similar problem, so techniques devised for it
can be adapted. Nonetheless, given the different nature of the data,
which can be subject to uncertainty, this adaptation is not trivial.
This paper first introduces the concept of Equally-Distributed
Dependencies, which is similar to the Inclusion Dependencies from the
relational domain. We describe a correspondence between the two concepts
in order to bridge existing ideas. We then propose PresQ, a new algorithm
based on the search for maximal quasi-cliques on hyper-graphs, designed
to be more robust to the nature of uncertain numerical data. The
algorithm has been tested on three public datasets, showing promising
results both in its capacity to find multidimensional
equally-distributed sets of attributes and in run-time.
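
To make the notion of two attributes being "equally distributed" concrete, the following is a minimal one-dimensional sketch. It assumes a two-sample Kolmogorov-Smirnov test as the comparison, which the abstract does not specify. The function names, the `alpha` threshold and the synthetic columns are hypothetical, and the sketch is not the PresQ algorithm itself.

```python
import bisect
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of the two samples."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The largest gap is always reached at one of the observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def looks_equally_distributed(a, b, alpha=0.05):
    """Hypothetical helper: keep the 'same distribution' hypothesis unless
    the statistic exceeds the asymptotic critical value at level alpha."""
    n, m = len(a), len(b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) <= critical


# Synthetic stand-ins for numeric columns read from two separate files.
random.seed(0)
column_a = [random.gauss(0.0, 1.0) for _ in range(500)]
column_b = [random.gauss(0.0, 1.0) for _ in range(500)]
column_shifted = [random.gauss(3.0, 1.0) for _ in range(500)]

print(looks_equally_distributed(column_a, column_b))
print(looks_equally_distributed(column_a, column_shifted))
```

A pairwise check of this kind only covers single attributes. Finding multidimensional sets of attributes, as the abstract describes, additionally requires combining such pairwise results, which is where the quasi-clique search comes in.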