AC: A data generator for evaluation of clustering
Preprint posted on 2022-02-10, 05:31, authored by Wenke Li, Zhou Zhou
Clustering has important applications in many fields. However, there are not enough benchmark datasets with rich characteristics for developing and evaluating clustering algorithms, so clustering performance cannot be properly evaluated. Neither real data nor manually synthesized data solves this problem. We propose a new data generator, Artificial Cluster (AC, http://ac.fwgenetics.org), that can thoroughly customize the cluster characteristics that affect clustering, such as sample size, density, overlap, and shape. Randomization of the default parameters enables AC to efficiently generate benchmark datasets with different combinations of characteristics, which can be used to evaluate the robustness of clustering algorithms. We evaluated nine popular clustering algorithms on an example benchmark dataset generated by AC; the results clearly show the advantages and disadvantages of each algorithm. AC is expected to provide sufficient data support for clustering research.
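As a rough illustration of what customizing cluster characteristics with randomized defaults can look like, the following is a minimal sketch in Python using NumPy. It is not AC's implementation or interface: the function name `make_clusters`, its parameters (`sizes`, `spreads`, `elongations`, `separation`), and the Gaussian model are all assumptions chosen for illustration. Sample size is set per cluster, density is controlled by the spread, shape by an elongation and random rotation, and overlap by how far apart the centers are drawn.

```python
import numpy as np


def make_clusters(n_clusters=3, sizes=None, spreads=None,
                  elongations=None, separation=5.0, seed=0):
    """Generate 2-D Gaussian clusters (hypothetical sketch, not AC's API).

    Any parameter left as None is drawn at random, so repeated calls with
    different seeds give datasets with different characteristic combinations.
    """
    rng = np.random.default_rng(seed)

    # Randomized defaults (ranges are arbitrary assumptions).
    if sizes is None:
        sizes = rng.integers(50, 500, size=n_clusters)      # sample size
    if spreads is None:
        spreads = rng.uniform(0.5, 2.0, size=n_clusters)    # density
    if elongations is None:
        elongations = rng.uniform(1.0, 4.0, size=n_clusters)  # shape

    points, labels = [], []
    for k in range(n_clusters):
        # Smaller `separation` pulls centers together, increasing overlap.
        center = rng.uniform(-1.0, 1.0, size=2) * separation

        # Stretch along one axis, then rotate by a random angle.
        angle = rng.uniform(0.0, np.pi)
        rot = np.array([[np.cos(angle), -np.sin(angle)],
                        [np.sin(angle),  np.cos(angle)]])
        scale = np.diag([spreads[k] * elongations[k], spreads[k]])

        pts = rng.standard_normal((int(sizes[k]), 2)) @ scale @ rot.T + center
        points.append(pts)
        labels.append(np.full(int(sizes[k]), k))

    return np.vstack(points), np.concatenate(labels)


# Example: four clusters with fixed sizes; other characteristics randomized.
X, y = make_clusters(n_clusters=4, sizes=[10, 20, 30, 40], seed=1)
```

The returned labels serve as ground truth, so the output of any clustering algorithm on `X` can be compared against `y` with an external index of the reader's choice.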
Fundamental Research Funds for the Central Universities (3332020019)
Email Address of Submitting Author: wk1lian@126.com
ORCID of Submitting Author: 0000-0003-0984-9468
Submitting Author's Institution: Fuwai Hospital
Submitting Author's Country