Computer Science ›› 2019, Vol. 46 ›› Issue (8): 16-22. doi: 10.11896/j.issn.1002-137X.2019.08.003

Study on Clustering Mining of Imbalanced Data Fusion Towards Urban Hotspots

CAI Li1,2, LI Ying-zi2, JIANG Fang2, LIANG Yu2

  1. School of Computer Science, Fudan University, Shanghai 200433, China
  2. School of Software, Yunnan University, Kunming 650091, China
  • Received: 2018-11-27  Online: 2019-08-15  Published: 2019-08-15

Abstract: In the era of big data, multi-source data fusion is a trending topic in data mining. Previous studies have mostly focused on fusion models and algorithms for balanced data sets, and seldom on clustering imbalanced data sets. DBSCAN is a classical algorithm for mining urban hotspots. However, it cannot handle imbalanced location data, and the clusters formed by the minority class are difficult to discover. To address imbalanced data fusion, this paper proposes a novel fusion model based on spatio-temporal features, together with a novel approach that tackles the mining of imbalanced data at both the data level and the algorithm level. Because existing clustering evaluation indices are not suitable for evaluating clustering results on imbalanced data, a new comprehensive evaluation index is proposed to reflect clustering quality. GPS trajectory data (the majority class) from the traffic domain and microblog check-in data (the minority class) from the social domain are fused, and the proposed method is then used to mine hotspots. The hotspots mined from fused multi-source data are better than those mined from a single data source. The location, distribution and number of hotspots are consistent with the actual situation. The proposed fusion model, algorithm and evaluation index are effective and feasible, and can also be used for the fusion and analysis of location data from other sources.

Key words: Imbalanced data, Data fusion, Urban hotspots, Clustering criteria, Location data

  • CLC Number: TP301
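
The abstract's observation that DBSCAN tends to lose the minority class when location data are imbalanced can be illustrated with a small sketch. The code below is a plain textbook DBSCAN run on synthetic points, not the fusion model or algorithm proposed in the paper. The two point sets, the names `majority` and `minority`, and the values of `eps` and `min_pts` are assumptions chosen only for illustration.

```python
import math
import random


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force range query; a point counts as its own neighbor.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # only core points keep expanding
                queue.extend(nb)
    return labels


rng = random.Random(0)
# Hypothetical majority source: 400 dense "GPS" points around (0, 0).
majority = [(rng.gauss(0, 0.05), rng.gauss(0, 0.05)) for _ in range(400)]
# Hypothetical minority source: 9 "check-in" points on a tight grid near (1, 1).
minority = [(1 + 0.01 * dx, 1 + 0.01 * dy)
            for dx in range(3) for dy in range(3)]
points = majority + minority

# A density threshold tuned to the majority source ...
strict = dbscan(points, eps=0.05, min_pts=20)
# ... versus one low enough for the sparse minority source.
loose = dbscan(points, eps=0.05, min_pts=5)

minority_strict = strict[len(majority):]
minority_loose = loose[len(majority):]
print("minority labels, min_pts=20:", minority_strict)
print("minority labels, min_pts=5: ", minority_loose)
```

With the threshold tuned to the dense source, the nine minority points can never reach `min_pts` and are all reported as noise. Lowering the threshold recovers them as a cluster, but it also changes how the majority data are clustered, which is one reason a single global parameter setting is a poor fit for imbalanced sources.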
