TechRxiv

Noise Resistant Multidimensional Data Fusion via Quasi-Cliques on Hypergraphs

preprint
posted on 17.12.2021, 17:05, authored by Alejandro Alvarez-Ayllon, Manuel Palomo-Duarte, Juan Manuel Dodero

Published in IEEE Transactions on Emerging Topics in Computing 


Copyright transferred


Cross-matching data stored in separate files is an everyday activity in the scientific domain. However, the relation between attributes is sometimes not obvious. The discovery of foreign keys in relational databases is a similar problem, so techniques devised for it can be adapted. Nonetheless, given the different nature of the data, which can be subject to uncertainty, this adaptation is not trivial.
This paper first introduces the concept of Equally-Distributed Dependencies, which is similar to Inclusion Dependencies from the relational domain. We describe a correspondence between the two in order to bridge existing ideas. We then propose PresQ, a new algorithm based on the search for maximal quasi-cliques on hypergraphs, a choice that makes it more robust to the nature of uncertain numerical data. The algorithm has been tested on three public datasets, showing promising results both in its capacity to find multidimensional equally-distributed sets of attributes and in run time.
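
To illustrate why a quasi-clique tolerates missing relations where a strict clique does not, the following is a minimal sketch that assumes an ordinary graph and a common degree-based definition: a vertex set is a gamma-quasi-clique if every vertex is adjacent to at least ceil(gamma * (n - 1)) of the other n - 1 vertices. It is not the PresQ algorithm, which works on hypergraphs and whose exact definition is not given in this abstract. The function name, the attribute labels, and the list of accepted pairs are hypothetical.

```python
from math import ceil


def is_quasi_clique(vertices, edges, gamma):
    """Degree-based check (an assumed definition, not taken from the paper):
    every vertex must be adjacent to at least ceil(gamma * (n - 1))
    of the other vertices in the candidate set."""
    vertices = set(vertices)
    if len(vertices) < 2:
        return True
    # Keep only proper edges with both endpoints inside the candidate set.
    inside = {frozenset(e) for e in edges
              if len(set(e)) == 2 and set(e) <= vertices}
    required = ceil(gamma * (len(vertices) - 1))
    for v in vertices:
        degree = sum(1 for e in inside if v in e)
        if degree < required:
            return False
    return True


# Hypothetical data: four attributes, with an edge for every pair that some
# pairwise check accepted. The pair (c, d) is missing, e.g. because of noise.
attributes = ["a", "b", "c", "d"]
accepted_pairs = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]

print(is_quasi_clique(attributes, accepted_pairs, 1.0))  # False: not a clique
print(is_quasi_clique(attributes, accepted_pairs, 0.6))  # True: quasi-clique
```

With gamma = 1.0 the check reduces to an ordinary clique test, so a single missing pair rejects the whole set; lowering gamma lets the set survive a limited number of missing pairs.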

Funding

Spanish AEI through the project CRÊPES

Email Address of Submitting Author

alejandro.alvarezayllon@unige.ch

ORCID of Submitting Author

0000-0002-1353-7929

Submitting Author's Institution

Université de Genève

Submitting Author's Country

Switzerland
