loading page

Prayatul Matrix for Evaluating Clustering Algorithms: A Direct Comparison Approach
  • Anupam Biswas
Anupam Biswas
National Institute of Technology Silchar, National Institute of Technology Silchar, National Institute of Technology

Corresponding Author:[email protected]

Author Profile

Abstract

Performance comparison of clustering algorithms are often done in terms of different confusion matrix based scores obtained on test datasets when ground truth is available. However, a dataset comprises several instances having different difficulty levels. Therefore, it is more logical to compare effectiveness of clustering algorithms on individual instances instead of comparing scores obtained for the entire dataset. In this paper, an alternative approach is proposed for direct comparison of clustering algorithms in terms of individual instances within the dataset. A direct comparison matrix called \emph{Prayatul Matrix} is prepared, which accounts for comparative outcome of two clustering algorithms on different instances of a dataset. Five different performance measures are designed based on prayatul matrix. Theoretical analysis shows proposed measures satisfy five important properties such as scale invariance, data invariance, permutation invariance,  monotonicity and continuity.  Efficacy of the proposed approach as well as  designed measures is analyzed empirically with four clustering algorithms on widely used standard datasets. Indications of proposed measures are compared with confusion matrix-based measures as well as other three permutation invariant measures. Results are evident that the newly designed measures are capable of giving some important insight about the clustering algorithms, which were impossible with the existing measures.