AC: A data generator for evaluation of clustering

posted on 2022-02-10, 05:31 authored by Wenke Li, Zhou Zhou
Clustering has important applications in many fields. However, there are not enough benchmark datasets with rich characteristics for the development and evaluation of clustering algorithms, so clustering performance cannot be reliably evaluated. Neither real data nor manually synthesized data can solve this problem. We propose a new data generator, Artificial Cluster (AC), that can thoroughly customize the cluster characteristics that affect clustering, such as sample size, density, overlap, and shape. Randomizing the default parameters lets AC efficiently generate benchmark datasets with different combinations of characteristics, which can be used to evaluate the robustness of clustering algorithms. We evaluated nine popular clustering algorithms on an example benchmark dataset generated by AC; the results clearly show the advantages and disadvantages of each algorithm. AC is expected to provide sufficient data support for clustering research.
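
To make the idea of randomized cluster characteristics concrete, the following is a minimal sketch and not the AC implementation or its API. The function name `make_clusters` and all of its parameters are hypothetical, chosen only for illustration. It assumes two-dimensional Gaussian clusters: the per-cluster sample size and spread are drawn at random, unequal spreads along the two axes give elongated shapes, and randomly placed centers may produce overlapping clusters.

```python
import random


def make_clusters(n_clusters=3, seed=None, size_range=(50, 200),
                  spread_range=(0.3, 1.5), box=10.0):
    """Toy 2-D cluster generator (hypothetical; not the AC tool itself).

    Returns a list of (x, y) points and a parallel list of cluster labels.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for k in range(n_clusters):
        # Sample size: drawn uniformly from size_range (inclusive).
        n = rng.randint(*size_range)
        # Spread along each axis: smaller values give denser clusters,
        # and sx != sy gives an elongated (anisotropic) shape.
        sx = rng.uniform(*spread_range)
        sy = rng.uniform(*spread_range)
        # Center: random position in a box; nearby centers overlap.
        cx, cy = rng.uniform(0, box), rng.uniform(0, box)
        for _ in range(n):
            points.append((rng.gauss(cx, sx), rng.gauss(cy, sy)))
            labels.append(k)
    return points, labels


# Example: one randomized dataset with four clusters.
points, labels = make_clusters(n_clusters=4, seed=0)
```

Calling the function with different seeds yields datasets with different combinations of size, density, and overlap, which is the general pattern the abstract describes for robustness testing; the actual generator supports far richer control than this sketch.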


Fundamental Research Funds for the Central Universities (3332020019)


Submitting Author's Institution

Fuwai Hospital

Submitting Author's Country

  • China
