Searchlight-scanned Over-sampling for Class Imbalance Problem
In data mining and machine learning, the class imbalance problem occurs when one class (the minority) has markedly fewer samples than the other (the majority), so that learners tend to neglect the minority class. To address this problem, many over-sampling methods have been proposed that fill the minority area with synthetic samples. However, no pure over-sampling method has been designed specifically for complex distributions, such as those involving noise, class overlapping, and disjuncts. To fill this gap, this study proposes a searchlight-scanned over-sampling method (SCOS), which treats filling the minority area with data as analogous to a searchlight scanning a target area in real life. Regarding the minority area as the target area and the majority area as the buildings that act as barriers, SCOS computes a series of searchlight structures that first pass through the corresponding minority area and are then stopped by the majority area. Synthetic samples are then generated within those structures. Experiments on real-world datasets demonstrate that our method handles complex distributions and outperforms current state-of-the-art over-sampling methods.
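The searchlight mechanism is described above only at a high level. As a rough illustration under our own assumptions, and not the authors' SCOS algorithm, a searchlight can be modeled as a cone cast from a minority seed toward another randomly chosen minority sample, truncated at the nearest majority sample that falls inside the cone, with synthetic samples drawn inside the truncated cone. The function names `beam_length` and `searchlight_oversample`, the cone parameter `half_angle`, and the seed/target selection rule are all hypothetical.

```python
import numpy as np

# HYPOTHETICAL sketch of a "searchlight" over-sampler; not the published SCOS algorithm.


def beam_length(seed, target, X_maj, half_angle):
    """Length of a cone-shaped beam cast from `seed` toward `target`.

    The beam starts with length ||target - seed|| and is cut short at the
    axial distance of the nearest majority sample lying inside the cone
    (in front of the seed, within `half_angle` radians of the beam axis).
    """
    axis = target - seed
    length = np.linalg.norm(axis)
    if length == 0.0:
        return 0.0
    u = axis / length
    V = X_maj - seed                      # vectors from the seed to each majority sample
    proj = V @ u                          # axial distance of each majority sample along the beam
    dist = np.linalg.norm(V, axis=1)
    inside = (proj > 0) & (proj >= dist * np.cos(half_angle))
    if np.any(inside):
        length = min(length, proj[inside].min())  # the majority "building" stops the beam
    return float(length)


def searchlight_oversample(X_min, X_maj, n_new, half_angle=0.2, seed=None):
    """Generate `n_new` synthetic minority samples inside truncated beams (hypothetical)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n_min, d = X_min.shape
    X_maj = np.asarray(X_maj, dtype=float).reshape(-1, d)  # also accepts an empty majority set
    if n_min < 2:
        raise ValueError("need at least two minority samples")
    out = np.empty((n_new, d))
    for k in range(n_new):
        i = rng.integers(n_min)
        j = rng.integers(n_min - 1)
        j = j + 1 if j >= i else j        # pick a different minority sample as the target
        s, t = X_min[i], X_min[j]
        L = beam_length(s, t, X_maj, half_angle)
        if L == 0.0:
            out[k] = s                    # seed and target coincide: no beam to scan
            continue
        u = (t - s) / np.linalg.norm(t - s)
        r = rng.uniform(0.0, L)           # axial position, before the blocking point
        g = rng.standard_normal(d)
        g -= (g @ u) * u                  # keep only the component perpendicular to the axis
        g_norm = np.linalg.norm(g)
        radius = rng.uniform(0.0, r * np.tan(half_angle))  # stay within the cone at depth r
        offset = np.zeros(d) if g_norm == 0.0 else g / g_norm * radius
        out[k] = s + r * u + offset
    return out
```

With two minority points at (0, 0) and (10, 0) and one majority point at (5, 0), every beam in this sketch is cut at x = 5, so no synthetic sample crosses the majority point, whereas unconstrained interpolation between the two minority points could place samples anywhere on the segment, including across the majority point.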