AC: A data generator for evaluation of clustering
preprint
posted on 2022-02-10, 05:31 authored by Wenke Li, Zhou Zhou
Clustering has important applications in many fields. However, there are not enough benchmark datasets with rich characteristics for developing and evaluating clustering algorithms, so clustering performance cannot be reliably evaluated. Neither real data nor manually synthesized data can solve this problem. We propose a new data generator, Artificial Cluster (AC, http://ac.fwgenetics.org), that can thoroughly customize the cluster characteristics that affect clustering, such as sample size, density, overlap and shape. Because its default parameters are randomized, AC can efficiently generate benchmark datasets with different combinations of characteristics, which can be used to evaluate the robustness of clustering algorithms. We evaluated nine popular clustering algorithms using an example benchmark dataset generated by AC. The results clearly show the advantages and disadvantages of these algorithms. AC is expected to provide sufficient data support for clustering research.
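To make the idea of randomized cluster characteristics concrete, here is a minimal Python sketch. It is not AC's implementation or interface: the function names (make_cluster, make_benchmark), the parameter ranges, and the use of isotropic Gaussian clusters are all assumptions chosen for illustration. It randomizes only sample size, density (through the spread) and overlap (through center placement); it does not vary cluster shape.

```python
import random


def make_cluster(n, center, spread, rng):
    """Draw n 2-D points from an isotropic Gaussian around center (illustrative only)."""
    return [(rng.gauss(center[0], spread), rng.gauss(center[1], spread))
            for _ in range(n)]


def make_benchmark(k, seed=0):
    """Build k labeled clusters with randomized size, spread and position.

    All ranges below are hypothetical defaults, not values taken from AC.
    """
    rng = random.Random(seed)  # seeded so a benchmark can be regenerated
    points, labels = [], []
    for label in range(k):
        n = rng.randint(50, 500)        # sample size
        spread = rng.uniform(0.2, 1.5)  # larger spread means lower density
        # Centers share a small box, so some clusters may overlap by chance.
        center = (rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0))
        points.extend(make_cluster(n, center, spread, rng))
        labels.extend([label] * n)
    return points, labels


points, labels = make_benchmark(k=5, seed=42)
```

The returned labels act as ground truth, so the output of any clustering algorithm run on points can be scored against them. Changing the seed gives a different combination of characteristics.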
Funding
Fundamental Research Funds for the Central Universities (3332020019)
Email Address of Submitting Author
wk1lian@126.com
ORCID of Submitting Author
0000-0003-0984-9468
Submitting Author's Institution
Fuwai Hospital
Submitting Author's Country
China