Computer Science ›› 2016, Vol. 43 ›› Issue (7): 245-250. doi: 10.11896/j.issn.1002-137X.2016.07.044

Multi-clusters IB Algorithm for Imbalanced Data Set

JIANG Peng, YE Yang-dong and LOU Zheng-zheng   

  • Online: 2018-12-01 Published: 2018-12-01

Abstract: When dealing with imbalanced data sets, the original information bottleneck (IB) method tends to produce clusters of relatively uniform size, which leads to unsatisfactory clustering results. To solve this problem, this paper proposes a multi-clusters information bottleneck (McIB) algorithm. The McIB algorithm reduces the skewness of the data distribution by using an under-sampling method to divide an imbalanced data set into multiple clusters of relatively uniform size. The algorithm consists of three steps. First, a dividing measurement standard is proposed to determine the sampling ratio parameter. Second, the McIB algorithm performs a preliminary analysis of the data to generate reliable multi-clusters. Finally, the McIB algorithm merges clusters into larger ones according to the similarity between clusters, organizing the multiple clusters that represent each actual cluster to obtain the final clustering result. Experimental results show that the McIB algorithm can effectively mine the patterns residing in imbalanced data sets and that it outperforms other common clustering algorithms.

Key words: Clustering, Information bottleneck method, Imbalanced data, Multi-clusters, Cluster merging
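
The abstract does not specify the similarity measure used in the cluster-merging step. As a rough illustration only, the sketch below assumes the merge cost of the standard agglomerative IB method: the combined cluster weight multiplied by the weighted Jensen-Shannon divergence between the clusters' conditional distributions p(y|t). The function names (kl, merge_cost, merge_clusters) and the stopping rule (a fixed target number of clusters k) are hypothetical and are not taken from the paper.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def merge_cost(w1, p1, w2, p2):
    """Assumed similarity: (w1 + w2) times the weighted Jensen-Shannon divergence.

    w1, w2 are positive cluster weights p(t); p1, p2 are distributions p(y|t).
    """
    w = w1 + w2
    a, b = w1 / w, w2 / w
    mix = [a * x + b * y for x, y in zip(p1, p2)]
    return w * (a * kl(p1, mix) + b * kl(p2, mix))


def merge_clusters(weights, dists, k):
    """Greedily merge the cheapest pair of clusters until k clusters remain.

    Returns a list of groups, each a sorted list of original cluster indices.
    """
    groups = [[i] for i in range(len(weights))]
    weights = list(weights)
    dists = [list(d) for d in dists]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                cost = merge_cost(weights[i], dists[i], weights[j], dists[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        w = weights[i] + weights[j]
        # The merged cluster's distribution is the weight-averaged mixture.
        dists[i] = [(weights[i] * x + weights[j] * y) / w
                    for x, y in zip(dists[i], dists[j])]
        weights[i] = w
        groups[i] = sorted(groups[i] + groups[j])
        # j > i, so removing entry j leaves entry i in place.
        del groups[j], weights[j], dists[j]
    return groups


# Toy input: sub-clusters 0 and 1 have similar distributions, cluster 2 differs.
weights = [0.4, 0.4, 0.2]
dists = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
print(merge_clusters(weights, dists, 2))
```

On this toy input, the two sub-clusters with similar distributions are merged first and the dissimilar small cluster is left on its own, which is the behaviour the merging step aims for on imbalanced data under the assumptions stated above.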

