Computer Science ›› 2019, Vol. 46 ›› Issue (8): 16-22. doi: 10.11896/j.issn.1002-137X.2019.08.003

• Big Data & Data Science •

Study on Clustering Mining of Imbalanced Data Fusion Towards Urban Hotspots

CAI Li1,2, LI Ying-zi2, JIANG Fang2, LIANG Yu2

  1. School of Computer Science, Fudan University, Shanghai 200433, China
  2. School of Software, Yunnan University, Kunming 650091, China
  • Received:2018-11-27 Online:2019-08-15 Published:2019-08-15

Abstract: In the era of big data, multi-source data fusion is an active topic in data mining. Previous studies have mostly focused on fusion models and algorithms for balanced data sets and seldom on clustering mining for imbalanced data sets. DBSCAN is a classical algorithm for mining urban hotspots. However, it cannot deal with imbalanced location data, and the clusters formed by the minority class are difficult to discover. To address imbalanced data fusion, this paper proposes a novel fusion model based on spatio-temporal features, together with a novel approach that tackles the mining of imbalanced data at both the data level and the algorithm level. Because existing evaluation indices for clustering algorithms are not suitable for evaluating clustering results on imbalanced data, a new comprehensive evaluation index is proposed to reflect clustering quality. GPS trajectory data (the majority class) from the traffic domain and microblog check-in data (the minority class) from the social domain are fused, and the proposed method is then used to mine hotspots. The hotspot mining results based on multi-source data fusion are better than those based on a single data source. The location, distribution and number of hotspots are consistent with the actual situation. The proposed fusion model, algorithm and evaluation index are effective and feasible, and can also be used for the fusion and analysis of location data from other sources.
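
The problem the abstract describes, a sparse minority source being lost when DBSCAN is run with one global density threshold, can be illustrated with a small sketch. This is not the paper's fusion model or algorithm, which the abstract does not specify. It is a generic weighted DBSCAN in plain Python, and the coordinates, the per-source weights, and the names `weighted_dbscan`, `gps`, and `checkins` are all assumptions made for illustration.

```python
from math import hypot

NOISE = -1

def weighted_dbscan(points, weights, eps, min_weight):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    A point is a core point when the total weight of all points within
    distance eps of it (itself included) is at least min_weight.
    With every weight equal to 1 this is ordinary DBSCAN.
    """
    n = len(points)

    def neighbours(i):
        xi, yi = points[i]
        return [j for j in range(n)
                if hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    def is_core(nbrs):
        return sum(weights[j] for j in nbrs) >= min_weight

    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if not is_core(nbrs):
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if is_core(j_nbrs):
                queue.extend(j_nbrs)   # keep expanding through core points
        cluster += 1
    return labels

# Synthetic, hypothetical data: a dense 5 x 5 grid of GPS points (majority)
# and three check-in points (minority) around a separate location.
gps = [(0.1 * a, 0.1 * b) for a in range(5) for b in range(5)]
checkins = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
points = gps + checkins

# Unweighted: the three check-ins never reach the density threshold.
plain = weighted_dbscan(points, [1] * len(points), eps=0.15, min_weight=4)

# Assumed algorithm-level adjustment: minority points count double.
weights = [1] * len(gps) + [2] * len(checkins)
weighted = weighted_dbscan(points, weights, eps=0.15, min_weight=4)

print(plain[-3:], weighted[-3:])
```

In this toy setting the unweighted run marks the check-in points as noise, while the weighted run groups them into a second cluster. Raising the weight of minority points is only one possible algorithm-level adjustment; oversampling the minority source would be a data-level counterpart.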

Key words: Clustering criteria, Data fusion, Imbalanced data, Location data, Urban hotspots

CLC Number: 

  • TP301