TechRxiv

Searchlight-scanned Over-sampling for Class Imbalance Problem

preprint
posted on 04.05.2021, 18:12 by Yi Sun, Lijun Cai, Bo Liao, Wen Zhu, JunLin Xu

In data mining and machine learning, the class imbalance problem occurs when one class (the minority) has substantially fewer samples than the other (the majority), so that learners tend to neglect the minority class. To address this problem, many over-sampling methods, which fill the minority area with synthetic samples, have been proposed. However, no pure over-sampling method has been specifically designed to fit complex distributions (such as noise, class overlap, and disjuncts). To fill this gap, this study proposes a searchlight-scanned over-sampling method (SCOS), which treats filling the minority area with data as analogous to a searchlight scanning a target area in real life. The minority area is regarded as the target area and the majority area as a barrier of buildings; a series of searchlight structures is then computed so that each one first passes through its corresponding minority area and is then stopped by the majority area. Finally, synthetic samples are generated within those structures. Experiments on real-world datasets demonstrate the method's ability to handle complex distributions and its superior performance compared with current state-of-the-art over-sampling methods.
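For readers unfamiliar with over-sampling, the general idea of filling the minority area with synthetic samples can be sketched as follows. This is not SCOS: the abstract does not give enough detail to reproduce the searchlight structures, so the sketch uses a generic interpolation scheme (in the style of SMOTE) purely as an illustration. The function name `interpolate_oversample`, its parameters, and the toy data are assumptions introduced here.

```python
import random


def interpolate_oversample(minority, n_new, k=3, seed=0):
    """Generic interpolation-based over-sampling (illustrative, not SCOS).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + t * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (hypothetical): 4 minority samples versus 12 majority samples.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority = [(5.0 + i, 5.0) for i in range(12)]

# Generate enough synthetic samples to balance the two classes.
new_points = interpolate_oversample(minority, len(majority) - len(minority))
balanced_minority = minority + new_points
```

A scheme like this ignores where the majority samples lie, which is one reason it can struggle with noise and class overlap; the abstract describes SCOS as using the majority area as a barrier that stops sample generation.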

Funding

National Natural Science Foundation of China

Email Address of Submitting Author

id.yisun@gmail.com

ORCID of Submitting Author

https://orcid.org/0000-0002-5977-855X

Submitting Author's Institution

Hunan University, Changsha

Submitting Author's Country

China