TechRxiv

AC: A data generator for evaluation of clustering

preprint
posted on 2022-02-10, 05:31 authored by Wenke Li, Zhou Zhou
Clustering has important applications in many fields. However, benchmark datasets with sufficiently rich characteristics are scarce, which hinders the development of clustering algorithms and prevents their performance from being reliably evaluated. Neither real data nor manually synthesized data solves this problem. We propose a new data generator, Artificial Cluster (AC, http://ac.fwgenetics.org), that allows thorough customization of the cluster characteristics that affect clustering, such as sample size, density, overlap, and shape. By randomizing its default parameters, AC can efficiently generate benchmark datasets with different combinations of characteristics, which can be used to evaluate the robustness of clustering algorithms. We evaluated nine popular clustering algorithms on an example benchmark dataset generated by AC, and the results clearly show the advantages and disadvantages of each algorithm. AC is expected to provide sufficient data support for clustering research.
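
To make the idea of randomized cluster characteristics concrete, the following is a minimal sketch, not AC's implementation or interface. It assumes 2-D Gaussian clusters and uses only the Python standard library; the function name `make_clusters` and all of its parameters are hypothetical and chosen purely for illustration.

```python
import random


def make_clusters(n_clusters=3, seed=None, size_range=(50, 200),
                  spread_range=(0.3, 1.5), spacing=5.0):
    """Generate 2-D Gaussian clusters with randomized characteristics.

    Hypothetical helper for illustration only; it does not reflect how AC
    is implemented. Returns (points, labels).
    """
    rng = random.Random(seed)
    extent = spacing * n_clusters  # side length of the square holding the centers
    points, labels = [], []
    for k in range(n_clusters):
        n = rng.randint(*size_range)        # sample size of this cluster
        sigma = rng.uniform(*spread_range)  # larger sigma means lower density
        # A smaller `spacing` tends to place centers closer together,
        # which increases overlap between clusters.
        cx, cy = rng.uniform(0, extent), rng.uniform(0, extent)
        # Shrinking one axis gives a simple elongated (elliptical) shape.
        sx, sy = sigma, sigma * rng.uniform(0.3, 1.0)
        for _ in range(n):
            points.append((rng.gauss(cx, sx), rng.gauss(cy, sy)))
            labels.append(k)
    return points, labels


points, labels = make_clusters(n_clusters=4, seed=0)
```

Drawing a fresh set of parameters per seed yields datasets with different characteristic combinations, which is the kind of variation a robustness benchmark needs; fixing the seed keeps any single dataset reproducible.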

Funding

Fundamental Research Funds for the Central Universities (3332020019)

Email Address of Submitting Author

wk1lian@126.com

ORCID of Submitting Author

0000-0003-0984-9468

Submitting Author's Institution

Fuwai Hospital

Submitting Author's Country

  • China
