Computer Science ›› 2018, Vol. 45 ›› Issue (1): 97-102.doi: 10.11896/j.issn.1002-137X.2018.01.015

Previous Articles     Next Articles

Algorithm for Ordering Points to Identify Clustering Structure Based on Spark

QU Yuan, DENG Wei-bin, HU Feng, ZHNG Qi-long and WANG Hong   

  • Online:2018-01-15 Published:2018-11-13

Abstract: Ordering points to identify the clustering Structure (OPTICS) is a hierarchical density-based data clustering algorithm,which can derive the intrinsic clustering structure of the dataset in a visual way,and can extract the basic clustering information by cluster sorting.However,due to its high temporal and spatial complexity,it can not adapt well to the large datasets in modern society.With the development of cloud computing and parallel computing,a method to solve the complexity of OPTICS algorithm was provided.This paper proposed a parallel OPTICS algorithm based on the Spark memory computing platform.The experimental results show that it can greatly reduce the time and space consumption of OPTICS algorithm.

Key words: Bigdata,Spark,OPTICS algorithm,Density based clustering

[1] 韩家炜,坎伯,裴建.数据挖掘:概念与技术(第3版)[M].北京:机械工业出版社,2012:288-291.
[2] KRIEGEL H P,PFEIFLE M.Hierarchical Density-Based Clustering of Uncertain Data[C]∥Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Chicago,Illinois:ACM,2005:689-692.
[3] VISWANATH P,PINKESH R.l-DBSCAN:A Fast HybridDensity Based Clustering Method[C]∥International Conference on Pattern Recognition.HongKong:IEEE,2006:912-915.
[4] HOU J,GAO H,LI X.DSets-DBSCAN:A Parameter-FreeClustering Algorithm[J].IEEE Transactions on Image Proces-sing,2016,25(7):3182-3193.
[5] AMINI A,SABOOHI H,WAH T Y.A Multi Density-BasedClustering Algorithm for Data Stream with Noise[C]∥IEEE,International Conference on Data Mining Workshops.Dallas,Texas:IEEE,2013:1105-1112.
[6] BIRANT D,KUT A.ST-DBSCAN:An algorithm for clustering spatial-temporal data[J].Data & Knowledge Engineering,2007,60(1):208-221.
[7] ANKERST M,BREUNIG M M,KRIEGEL H P,et al.OP-TICS:ordering points to identify the clustering structure[J].Acm Sigmod Record,1999,28(2):49-60.
[8] PATWARY M A,PALSETIA D,AGRAWAL A,et al.Scalable parallel OPTICS data clustering using graph algorithmic techniques[C]∥International Conference for High Performance Computing,Networking,Storage and Analysis.New York:ACM,2013.
[9] GOYAL P,KUMARI S,KUMAR D,et al.Parallelizing OP-TICS for multicore systems[C]∥ACM COMPUTE.New York:ACM,2014:1-6.
[10] DENG Z,HU Y,ZHU M,et al.A scalable and fast OPTICS for clustering trajectory big data[J].Cluster Computing,2015,18(2):549-562.
[11] PITALSKY V,GRENDR M.OPTICS-Based Clustering ofEmails Represented by Quantitative Profiles[M]∥Distributed Computing and Artificial Intelligence.Berlin:Springer,2013:53-60.
[12] KALITA H K,BHATTACHARYA D K,KAR A.A New Algorithm for Ordering of Points to Identify Clustering Structure Based on Perimeter of Triangle:OPTICS(BOPT)[C]∥International Conference on Advanced Computing and Communications.Guwahati,Assam:IEEE Computer Society,2007:523-528.
[13] ANKERST M.OPTICS:ordering points to identify the clustering structure[J].Acm Sigmod Record,1999,28(2):49-60.
[14] ZAHARIA M,CHOWDHURY M,FRANKLIN M J,et al.Spark:cluster computing with working sets[C]∥Usenix Conference on Hot Topics in Cloud Computing.Boston:USENIX Association,2010:1765-1773.
[15] BHARILL N,TIWARI A,MALVIYA A.Fuzzy Based Scalable Clustering Algorithms for Handling Big Data using Apache Spark[J].IEEE Transactions on Big Datam,2016,4(2):339-352.
[16] LI J,LI D,ZHANG Y.Efficient Distributed Data Clustering on Spark[C]∥IEEE International Conference on CLUSTER Computing.Chicago:IEEE Computer Society,2015:504-505.
[17] SINHA A,JANA P K.A novel K-means based clustering algorithm for big data[C]∥International Conference on Advances in Computing,Communications and Informatics.Jaipur:IEEE,2016:1875-1879.
[18] JIN C,LIU R,HENDRIX W,et al.A Scalable HierarchicalClustering Algorithm Using Spark[C]∥IEEE First Internatio-nal Conference on Big Data Computing Service and Applications.San Francisco Bay:IEEE,2015:418-426.
[19] SARAZIN T,AZZAG H,LEBBAH M.SOM Clustering Using Spark-MapReduce[C]∥IEEE International Parallel & Distributed Processing Symposium Workshops.Phoenix:IEEE Compu-ter Society,2014:1727-1734.
[20] CONRAD J G,AL-KOFAHI K,ZHAO Y,et al.Effective document clustering for large heterogeneous law firm collections[C]∥Proceedings of the 10th international conference on Artificial intelligence and law.Bologna:ACM,2005:177-187.
[21] UCI Machine Learning Repository [DB/OL].http://archive.ics.uci.edu/ml.
[22] Wikipedia Weka (machine learning) [CP/OL].http://en.wikipedia.org/wiki/Weka,2010.
[23] xj1986.MR-DBSCAN [EB/OL].[2013-5-15].https://github.com/xj1986/MR-DBSCAN.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!