Decision Boundary Computation-based Over-sampling for Imbalance Learning
Preprint posted on 08.05.2021, 05:27 by Yi Sun, Lijun Cai, JunLin Xu
The imbalanced problem, a significant challenge in data mining, occurs when the number of samples in one class (the minority) is markedly smaller than in the other (the majority). Over-sampling methods, which generate new synthetic samples for the minority class, have proven effective. However, few over-sampling methods focus on the decision boundary between classes, and none has been proposed to directly compute the specific region of the decision boundary for the imbalanced problem. To fill this gap, a novel method named Decision Boundary Computation-based Over-sampling is proposed. The method builds on the intuitive observation that boundary samples and their surrounding areas jointly constitute the decision boundary. It computes the partition belonging to the minority class by subtracting the partition of the majority class from the joint partition. This makes fuller use of the boundary information carried by both the boundary samples and their nearby areas, and at the same time implicitly compensates for the inherent lack of information in the minority class. Finally, new synthetic samples are generated in the minority-class partition of the decision boundary. Extensive experiments indicate that the proposed method performs well compared with other state-of-the-art methods.
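The abstract does not specify how boundary samples, their surrounding areas, or the subtraction step are defined, so the following is only a rough sketch of the general idea and not the authors' algorithm. It assumes that a boundary sample is one with at least one opposite-class point among its k nearest neighbours, that each surrounding area is a ball of fixed radius, and that "subtraction" means rejecting candidate points that fall inside any majority boundary ball. The names `find_boundary`, `sample_in_ball`, `boundary_oversample`, and the parameters `k` and `radius` are all hypothetical.

```python
import math
import random


def k_nearest(i, points, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def find_boundary(X_min, X_maj, k=5):
    """Assumed definition: a sample is a boundary sample if any of its
    k nearest neighbours belongs to the other class."""
    points = list(X_min) + list(X_maj)
    labels = [1] * len(X_min) + [0] * len(X_maj)
    b_min, b_maj = [], []
    for i, p in enumerate(points):
        if any(labels[j] != labels[i] for j in k_nearest(i, points, k)):
            (b_min if labels[i] == 1 else b_maj).append(p)
    return b_min, b_maj


def sample_in_ball(centre, radius, rng):
    """Draw a point uniformly from the ball of the given radius around centre."""
    dim = len(centre)
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in direction))
    r = radius * rng.random() ** (1.0 / dim)
    return tuple(c + r * v / norm for c, v in zip(centre, direction))


def boundary_oversample(X_min, X_maj, n_new, k=5, radius=1.0, seed=0,
                        max_tries=10000):
    """Generate up to n_new synthetic minority points near the boundary."""
    rng = random.Random(seed)
    b_min, b_maj = find_boundary(X_min, X_maj, k)
    if not b_min:
        return []
    synthetic = []
    for _ in range(max_tries):
        if len(synthetic) == n_new:
            break
        # Candidate drawn from the assumed area around a minority boundary sample.
        cand = sample_in_ball(rng.choice(b_min), radius, rng)
        # Assumed "subtraction": discard candidates inside any majority ball.
        if all(math.dist(cand, m) > radius for m in b_maj):
            synthetic.append(cand)
    return synthetic
```

Calling `boundary_oversample(X_min, X_maj, n_new=10, k=3, radius=0.8)` on two lists of coordinate tuples returns synthetic points that lie within `radius` of a minority boundary sample and outside every majority boundary ball. Rejection sampling is used here only because it makes the set difference easy to express; the paper may compute the region quite differently.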