Computer Science, 2019, Vol. 46, Issue (11A): 204-207.

• Data Science •

Cell Clustering Algorithm Based on MapReduce and Strongly Connected Fusion

HU Ying-shuang, LU Yi-hong   

  1. (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310014, China)
  • Online: 2019-11-10 Published: 2019-11-20

Abstract: With the explosive growth of location big data, most traditional serial clustering algorithms cannot process the data efficiently. To address this problem, parallel clustering algorithms are receiving increasing attention. However, it is difficult to guarantee the clustering quality of a parallel clustering algorithm, so it is important to study how the partial results of parallel clustering are reduced into a final result. Therefore, a grid clustering algorithm based on strongly connected fusion is proposed. First, the clustering results of the data subsets are obtained with an improved DBSCAN algorithm based on MapReduce. Next, the relationship between grids and clusters is analyzed, and the concepts of grid-cluster, connectivity, and strong connectivity of grid-clusters are defined. Then, the connectivity weight matrix between grid-clusters is calculated. Finally, whether two grid-clusters are reduced into one is decided according to their connectivity weight. The experimental results show that the proposed algorithm achieves high efficiency and high clustering quality when processing large location data.
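
The last two steps of the abstract (computing connectivity weights between grid-clusters, then deciding whether to reduce two of them into one) can be illustrated with a minimal sketch. The abstract does not give the paper's definitions, so everything here is an assumption made for illustration: the weight between two local clusters is taken to be the number of grid cells they both occupy, two clusters are merged when that weight reaches a threshold, and the names `grid_cell`, `connectivity_weights`, `merge_clusters`, `cell_size`, and `min_weight` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def grid_cell(point, cell_size):
    """Map an (x, y) point to the integer index of its grid cell."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))


def connectivity_weights(local_clusters, cell_size):
    """Count shared grid cells for every pair of local clusters.

    ASSUMPTION: "connectivity weight" is modeled as the number of grid
    cells occupied by both clusters; the paper may define it differently.
    local_clusters maps a cluster id to a list of (x, y) points.
    """
    cells = {cid: {grid_cell(p, cell_size) for p in pts}
             for cid, pts in local_clusters.items()}
    weights = {}
    for a, b in combinations(sorted(cells), 2):
        shared = len(cells[a] & cells[b])
        if shared:
            weights[(a, b)] = shared
    return weights


def merge_clusters(local_clusters, cell_size, min_weight):
    """Merge clusters whose weight reaches min_weight, using union-find."""
    parent = {cid: cid for cid in local_clusters}

    def find(c):
        # Follow parent links to the root, halving the path on the way.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), w in connectivity_weights(local_clusters, cell_size).items():
        if w >= min_weight:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Keep the smaller id as the representative of the merged set.
                parent[max(ra, rb)] = min(ra, rb)

    merged = defaultdict(list)
    for cid, pts in local_clusters.items():
        merged[find(cid)].extend(pts)
    return dict(merged)
```

For example, with `cell_size=1.0`, a cluster containing `(1.2, 0.3)` and another containing `(1.4, 0.5)` both occupy cell `(1, 0)`, so their assumed weight is 1 and they are merged when `min_weight` is 1 but kept apart when it is 2. Union-find is used so that chains of pairwise merges collapse into a single final cluster.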

Key words: Location big data, DBSCAN, Grid, MapReduce, Strongly connected fusion

CLC Number: 

  • TP274