TechRxiv

VDPC: Variational Density Peak Clustering Algorithm

preprint
posted on 29.12.2021, 07:10 by Yizhang Wang, Di Wang, You Zhou, Chai Quek, Xiaofeng Zhang
Clustering is an important unsupervised knowledge acquisition method that divides unlabeled data into groups \cite{atilgan2021efficient,d2021automatic}. Different clustering algorithms make different assumptions about how clusters form. Consequently, most algorithms handle at least one particular type of data distribution well but may handle other types poorly. For example, K-means identifies convex clusters well \cite{bai2017fast}, and DBSCAN is able to find clusters with similar densities \cite{DBSCAN}.
Therefore, most clustering methods may not work well on data whose distribution patterns differ from the assumptions being made, or on a mixture of different distribution patterns. Taking DBSCAN as an example, it is sensitive to loosely connected points between dense natural clusters, as illustrated in Figure~\ref{figconnect}. The density of the connecting points shown in Figure~\ref{figconnect} differs from that of the natural clusters at both ends. However, DBSCAN with fixed global parameter values may wrongly assign these connecting points and treat all the data points in Figure~\ref{figconnect} as one big cluster.
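The bridging effect described above can be reproduced on a toy dataset. The following is a minimal sketch and not the method proposed in the paper: it is a plain-Python DBSCAN written only for illustration. The helper names (`make_points`, `dbscan`, `summarize`), the synthetic data (two 5x5 grids joined by a sparser chain of points), and the parameter values are all assumptions chosen for the example.

```python
from math import dist

NOISE = -1


def make_points():
    """Two dense 5x5 grids (spacing 1) joined by a sparse chain (spacing 2)."""
    left = [(x, y) for x in range(5) for y in range(5)]
    right = [(x + 20, y) for x, y in left]
    bridge = [(x, 2) for x in range(6, 20, 2)]  # loosely connected points
    return left + bridge + right


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Basic DBSCAN with one global (eps, min_pts) pair; returns a label per point."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def summarize(labels):
    """Return (number of clusters, number of noise points)."""
    n_clusters = len({label for label in labels if label != NOISE})
    n_noise = sum(1 for label in labels if label == NOISE)
    return n_clusters, n_noise


points = make_points()
# A radius large enough to reach along the chain merges everything.
merged = summarize(dbscan(points, eps=2.5, min_pts=3))
# A smaller radius separates the grids and marks the chain as noise.
separated = summarize(dbscan(points, eps=1.5, min_pts=3))
```

In this sketch, `eps=2.5` makes every chain point a core point, so the two grids and the chain are reported as a single cluster, whereas `eps=1.5` recovers the two grids and labels the seven chain points as noise. The toy data can be separated by retuning `eps` because both grids have the same density. When natural clusters differ in density, a single global setting that separates all of them may be hard or impossible to find.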

Email Address of Submitting Author

wyzhang_new@sina.com

ORCID of Submitting Author

0000-0002-0687-7802

Submitting Author's Institution

Yangzhou University

Submitting Author's Country

China
