Computer Science ›› 2020, Vol. 47 ›› Issue (3): 79-86. doi: 10.11896/jsjkx.190400123

• Database & Big Data & Data Science •

Clustering Algorithm by Fast Search and Find of Density Peaks for Complex High-dimensional Data

CHEN Jun-fen, ZHANG Ming, ZHAO Jia-cheng

  1. (Hebei Key Laboratory of Machine Learning and Computational Intelligence, College of Mathematics and Information Sciences, Hebei University, Baoding, Hebei 071002, China)
  • Received: 2019-04-22 Online: 2020-03-15 Published: 2020-03-30
  • About author: CHEN Jun-fen, born in 1976, Ph.D., associate professor, master's supervisor, is a member of the China Computer Federation. Her main research interests include data mining, machine learning and image processing.
  • Supported by:
    This work was supported by the Natural Science Foundation of Hebei Province, China (F2016201161) and Research Foundation for Advanced Scholars Program of Hebei University, China.

Abstract: Unsupervised clustering is widely applied in object recognition tasks. The density peaks clustering algorithm (DPC) can quickly identify the cluster centers, and hence the number of clusters, from a decision graph. However, on data with complex distribution shapes and on high-dimensional image data, DPC still has problems: the cluster centers are difficult to determine and too few clusters are found. To improve its robustness on complex high-dimensional data, an improved DPC clustering algorithm (AE-MDPC) is presented. It employs an autoencoder, an unsupervised learning method, to obtain the optimal feature representation of the input data, and uses the manifold similarity of pairwise data to describe global consistency. The autoencoder reduces feature noise by reducing the dimension of the high-dimensional image data, while the manifold distance makes the densities of the potential cluster centers become global peaks. AE-MDPC was compared with K-means, DBSCAN, DPC, and DPC combined with PCA on four artificial datasets and four real face image datasets. The experimental results demonstrate that AE-MDPC outperforms the other clustering algorithms in clustering accuracy, adjusted mutual information, and adjusted Rand index, and that it also provides better clustering visualization. Overall, the proposed AE-MDPC algorithm can effectively handle complex manifold data and high-dimensional image data.

Key words: Clustering, Density peaks, DPC algorithm, Features representation, Manifold distance

CLC Number: 

  • TP181
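
The decision-graph idea mentioned in the abstract can be illustrated with a minimal sketch of baseline DPC. It is not the AE-MDPC method itself: the autoencoder step is omitted, and the function takes a precomputed distance matrix, so a manifold distance could be substituted for the Euclidean one used here. The function name `density_peaks`, the Gaussian density kernel, the cutoff `dc`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def density_peaks(D, dc, n_clusters):
    """Baseline density-peaks clustering on a precomputed distance matrix D.

    Hypothetical helper for illustration only; not the paper's AE-MDPC.
    """
    n = D.shape[0]
    # Local density rho with a Gaussian kernel (an assumed choice);
    # subtract 1 to remove each point's own contribution exp(0).
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0

    order = np.argsort(-rho, kind="stable")  # indices by decreasing density
    delta = np.zeros(n)                      # distance to nearest denser point
    parent = np.full(n, -1)                  # index of that denser point

    top = order[0]
    delta[top] = D[top].max()                # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(D[i, denser])]
        delta[i] = D[i, j]
        parent[i] = j

    # Decision-graph score: centers have both high rho and high delta.
    gamma = rho * delta
    gamma[top] = np.inf                      # guard: densest point is a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]

    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Assign remaining points in decreasing density order, so each point's
    # parent has already been labeled when the point is reached.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels


# Toy data (assumed): two well-separated 2-D blobs of 30 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(30, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(30, 2)),
])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
rho, delta, labels = density_peaks(D, dc=1.0, n_clusters=2)
```

Plotting `delta` against `rho` gives the decision graph; points in its upper-right region are the candidate cluster centers. Because the function only sees `D`, swapping in a different pairwise distance changes which points stand out as peaks without changing the rest of the procedure.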