
β Equation: Toward clustering the overlap data
  • Kadhim Mustafa Raad Kadhim,
  • Tian Ling,
  • Zheng Xu,
  • Zhao Kang,
  • Shi Yinong,
  • Wang Jianbo
Kadhim Mustafa Raad Kadhim

Corresponding Author:

Tian Ling
School of Computer Science and Engineering (School of Cyber Security), University of Electronic Science and Technology of China, Chengdu

Corresponding Author:

Zheng Xu
School of Computer Science and Engineering (School of Cyber Security), University of Electronic Science and Technology of China, Chengdu
Zhao Kang
School of Computer Science and Engineering (School of Cyber Security), University of Electronic Science and Technology of China, Chengdu
Shi Yinong
School of Computer Science and Engineering (School of Cyber Security), University of Electronic Science and Technology of China, Chengdu
Wang Jianbo
School of Computer Science and Engineering (School of Cyber Security), University of Electronic Science and Technology of China, Chengdu

Abstract

Robust unsupervised clustering is essential in many machine learning tasks because it identifies hidden relations between samples and thereby supports understanding and interpretation of the data. Many datasets contain samples that belong to different groups and therefore should not be linked to each other, yet are so similar that clustering models can hardly tell them apart; this is called the overlapping issue. Integrating support vector machines, feature selection, and dimensionality reduction techniques with clustering models may still fail to provide an optimal solution. As a result, overlap degrades performance and leads to inconsistent partitioning, unreasonable interpretation, and false confidence. This study addresses these issues by proposing a novel unsupervised data separation equation based on the concepts of tension and separation, which are obtained by finding cannot-link relations from cluster centroids. The equation was validated in diverse scenarios to demonstrate its ability to improve outcomes. The experimental results show that the proposed equation reduces the reliance on parameter tuning and constraints, thereby enhancing performance and effectively addressing the challenge of outliers.
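The overlapping issue described in the abstract can be illustrated with a small experiment. The sketch below is not the proposed β equation, whose form is not given on this page. It only assumes two synthetic 2-D Gaussian groups and a plain k-means baseline, and the names `kmeans` and `two_group_accuracy` are hypothetical helpers introduced for illustration. It measures how often the baseline recovers the true groups as the distance between the group centers shrinks.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means (baseline only); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct samples.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every sample to every centroid, shape (n, k).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def two_group_accuracy(gap, n=500, seed=0):
    """Fraction of samples k-means assigns to the correct synthetic group.

    Assumption: two unit-variance Gaussian groups whose centers are
    `gap` apart along the x axis. Small gaps mean heavy overlap.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))
    b = rng.normal(loc=[gap, 0.0], scale=1.0, size=(n, 2))
    X = np.vstack([a, b])
    truth = np.repeat([0, 1], n)
    labels, _ = kmeans(X, k=2, seed=seed)
    acc = np.mean(labels == truth)
    # Cluster ids are arbitrary, so score the better of the two matchings.
    return max(acc, 1.0 - acc)

if __name__ == "__main__":
    for gap in (8.0, 3.0, 1.0):
        print(f"gap={gap}: accuracy={two_group_accuracy(gap):.3f}")
```

Under these assumptions, accuracy should stay near perfect when the groups are far apart and fall toward chance as they overlap, which is the regime the abstract targets.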
30 May 2024: Submitted to TechRxiv
07 Jun 2024: Published in TechRxiv