Prayatul Matrix for Evaluating Clustering Algorithms: A Direct
Comparison Approach
Abstract
Performance comparison of clustering algorithms is often done in terms
of different confusion-matrix-based scores obtained on test datasets
when ground truth is available. However, a dataset comprises several
instances having different difficulty levels. Therefore, it is more
logical to compare the effectiveness of clustering algorithms on individual
instances instead of comparing scores obtained for the entire dataset.
In this paper, an alternative approach is proposed for direct comparison
of clustering algorithms in terms of individual instances within the
dataset. A direct comparison matrix called
\emph{Prayatul Matrix} is prepared, which accounts for the
comparative outcomes of two clustering algorithms on different instances
of a dataset. Five different performance measures are designed based on
the Prayatul Matrix. Theoretical analysis shows that the proposed measures
satisfy five important properties, namely scale invariance, data invariance,
permutation invariance, monotonicity and continuity. The efficacy of the
proposed approach and of the designed measures is analyzed empirically
with four clustering algorithms on widely used standard datasets.
The indications of the proposed measures are compared with those of
confusion-matrix-based measures as well as three other permutation-invariant
measures. The results show that the newly designed measures are
capable of giving important insights about the clustering
algorithms that could not be obtained with the existing measures.