TechRxiv

Deep Multi-Representation Learning for Data Clustering

preprint
posted on 19.03.2022, 10:01, authored by Mohammadreza Sadeghi, Narges Armanfard
Deep clustering incorporates embedding into clustering in order to find a lower-dimensional space suitable for the clustering task. Conventional deep clustering methods aim to obtain a single global embedding subspace (also known as the latent space) for all the data clusters. In contrast, in this paper we propose a deep multi-representation learning (DML) framework for data clustering in which each difficult-to-cluster data group is associated with its own distinct optimized latent space, and all the easy-to-cluster data groups are associated with a general common latent space. Autoencoders are employed to generate the cluster-specific and general latent spaces. To specialize each autoencoder in its associated data cluster(s), we propose a novel and effective loss function that consists of weighted reconstruction and clustering losses of the data points, where higher weights are assigned to the samples more likely to belong to the corresponding cluster(s). Experimental results on benchmark datasets demonstrate that the proposed DML framework and loss function outperform state-of-the-art clustering approaches. In addition, the results show that the DML method significantly outperforms the state of the art on imbalanced datasets as a result of assigning an individual latent space to the difficult clusters.
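
The abstract describes the loss only in words, so the following is a minimal sketch of one plausible reading, not the authors' exact formulation. It assumes squared-error reconstruction, a squared distance to a single latent cluster center as the clustering term, and soft membership probabilities as the per-sample weights. The function name `weighted_autoencoder_loss` and the parameter `clustering_weight` are hypothetical and do not come from the paper.

```python
def weighted_autoencoder_loss(x, x_hat, z, center, membership, clustering_weight=1.0):
    """Weighted reconstruction + clustering loss for one autoencoder (illustrative).

    x, x_hat   : lists of input vectors and their reconstructions
    z          : list of latent codes produced by this autoencoder
    center     : latent-space center of the cluster this autoencoder serves
    membership : per-sample probability of belonging to that cluster
                 (assumed to be given; used directly as the sample weight)
    """
    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    total_weight = sum(membership)
    if total_weight == 0:
        raise ValueError("at least one sample must have non-zero membership")

    weighted_sum = 0.0
    for xi, xhi, zi, wi in zip(x, x_hat, z, membership):
        reconstruction = sq_dist(xi, xhi)   # how well the sample is rebuilt
        clustering = sq_dist(zi, center)    # how close its code is to the center
        # Samples more likely to belong to the cluster contribute more.
        weighted_sum += wi * (reconstruction + clustering_weight * clustering)

    # Normalize by the total weight so the loss is a weighted average.
    return weighted_sum / total_weight
```

Under these assumptions, a sample with membership close to zero barely influences the autoencoder, which is one way each autoencoder could specialize in its own cluster(s). Normalizing by the total weight is a design choice of this sketch, made to keep the loss scale comparable across clusters of different sizes.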

Funding

Natural Sciences and Engineering Research Council of Canada (NSERC)

Email Address of Submitting Author

Mohammadreza.sadeghi@mcgill.ca

Submitting Author's Institution

McGill University

Submitting Author's Country

Canada
