Computer Science ›› 2019, Vol. 46 ›› Issue (4): 22-27. doi: 10.11896/j.issn.1002-137X.2019.04.004

Weighted Oversampling Method Based on Hierarchical Clustering for Unbalanced Data

XIA Ying, LI Liu-jie, ZHANG Xu, BAE Hae-young

  1. School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2018-09-23 Online: 2019-04-15 Published: 2019-04-23

Abstract: Imbalanced data degrade the performance of traditional classification algorithms to some extent, leading to a lower recognition rate for minority classes. Oversampling is a common method for handling imbalanced datasets. Its main idea is to increase the number of minority-class samples so that the minority and majority classes are balanced to a certain extent. Existing oversampling methods suffer from two problems: they synthesize overlapping samples and they cause overfitting. This paper proposes a weighted oversampling method based on hierarchical clustering for imbalanced data, named WOHC. It first uses a hierarchical clustering algorithm to divide the minority-class samples into several clusters, then calculates each cluster’s density factor to determine the sampling rate of that cluster, and finally determines the sampling weights according to the distance between the minority-class samples and the boundary of the majority class. In the experiments, WOHC is used for oversampling and combined with the C4.5 algorithm to perform classification on several datasets. The results show that the proposed method improves F-measure and G-mean by 7.6% and 5.8% respectively, which indicates its effectiveness.
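
The three stages named in the abstract (cluster the minority class, set a per-cluster sampling rate from a density factor, weight samples by their distance to the majority class) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the average-linkage criterion, the density factor (cluster size divided by mean intra-cluster distance), the choice to give sparser clusters a larger share, the inverse-distance weights, and the SMOTE-style interpolation are all assumptions, and the names `agglomerative` and `wohc_sketch` are hypothetical.

```python
import math
import random


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering (assumed linkage).

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters; a < b, so deleting b is safe.
        a, b = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def wohc_sketch(minority, majority, n_new, n_clusters=2, seed=0):
    """Generate roughly `n_new` synthetic minority samples (illustrative only)."""
    rng = random.Random(seed)
    clusters = agglomerative(minority, n_clusters)

    def density(c):
        # Assumed density factor: members per unit of mean pairwise distance.
        if len(c) < 2:
            return float(len(c))
        pairs = [(i, j) for k, i in enumerate(c) for j in c[k + 1:]]
        mean_d = sum(math.dist(minority[i], minority[j])
                     for i, j in pairs) / len(pairs)
        return len(c) / (mean_d + 1e-12)

    # Assumption: sparser clusters receive a larger share of new samples.
    inverse = [1.0 / density(c) for c in clusters]
    total = sum(inverse)

    synthetic = []
    for c, share in zip(clusters, inverse):
        # Rounding means the overall count can differ slightly from n_new.
        quota = round(n_new * share / total)
        # Assumption: samples nearer the majority class are picked more often.
        weights = [
            1.0 / (min(math.dist(minority[i], m) for m in majority) + 1e-12)
            for i in c
        ]
        for _ in range(quota):
            s = rng.choices(c, weights=weights, k=1)[0]
            t = rng.choice(c)  # partner from the same cluster (may equal s)
            gap = rng.random()
            synthetic.append(tuple(
                a + gap * (b - a) for a, b in zip(minority[s], minority[t])
            ))
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
                (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
    majority = [(3.0, 3.0), (3.0, 4.0), (4.0, 3.0), (10.0, 10.0)]
    for point in wohc_sketch(minority, majority, n_new=10):
        print(point)
```

Because each synthetic point is interpolated between two members of the same cluster, it stays inside that cluster's bounding region, which is one plausible way to limit the overlapping samples the abstract mentions. The resampled data would then be passed to a classifier such as C4.5, as in the experiments described above.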

Key words: Hierarchical clustering, Imbalanced data, Overlapping sample, Oversampling

