Computer Science ›› 2021, Vol. 48 ›› Issue (3): 151-157.doi: 10.11896/jsjkx.200100112

• Database & Big Data & Data Science • Previous Articles     Next Articles

Density Peaks Clustering Algorithm Based on Natural Nearest Neighbor

TANG Xin-yao, ZHANG Zheng-jun, CHU Jie, YAN Tao   

  1. School of Science,Nanjing University of Science and Technology,Nanjing 210094,China
  • Received:2020-01-07 Revised:2020-06-09 Online:2021-03-15 Published:2021-03-05
  • About author:TANG Xin-yao,born in 1996,postgra-duate.His main research interests include data mining and mechine lear-ning.
    ZHANG Zheng-jun,born in 1965,Ph.D,associate professor.His main research interests include data mining and gra-phic image.
  • Supported by:
    National Natural Science Foundation of China(11671205).

Abstract: Aiming at the problem that the density peak clustering (DPC) algorithm requires manually selected parameters (cutoff distance dc),as well as the problem of a poor performance on complex data sets caused by the simple definition of local density and the one-step assignment strategy,a new density peak clustering algorithm based on natural nearest neighbors (NNN-DPC) is proposed.The algorithm does not need to specify any parameters and is a non-parametric clustering method.Based on the definition of natural nearest neighbors,this algorithm firstly gives a new local density calculation formula to describe the distribution of data,and reveals the internal connection.A two-step assignment strategy is designed to divide the sample points.Finally,the similarity between clusters is defined,and a new cluster merging rule is proposed to merge the clusters to obtain the final clustering result.The experimental results show that without parameters,the NNN-DPC algorithm has excellent generalization ability on various types of data sets,and can more accurately identify the number and distribution of clusters on manifold data or data with large differences of density between clusters,and assign sample points to the corresponding clusters.Compared with the perfor-mance indicators of DPC,FKNN-DPC,and three other classic clustering algorithms,the NNN-DPC algorithm has a great advantage.

Key words: Clustering algorithm, Density peaks, Local density, Natural nearest neighbor

CLC Number: 

  • TP301.6
[1]MACQUEEN J.Some Methods for Classificati-on and Analysis of MultiVariate Observations[C]//Proc of Berkeley Sympo-sium on Mathematical Statistics & Probability.1967,1(14):281-297.
[2]KAUFMAN L,ROUSSEEUW P.Clustering by means of medoids[C]//Statistical Data Analysis Based on the L1-norm & Related Methods.North-Holland,1987:405-416.
[3]FREY B J,DUECK D.Clustering by Passing Messages Between Data Points [J].Science,2007,315(5814):972-976.
[4]ESTER M,KRIEGEL H P,SANDER J,et al.A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise[C]//Internation-al Conference on Knowledge Discovery & Data Mining.1996:226-231.
[5]RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks [J].Science,2014,344(6191):1492-1496.
[6]YANG Z,WANG H J,ZHOU Y.A clusteri-ng algorithm based on cutoff distance and adaptive cluster center [J].Data Analysis and Knowledge Discovery,2018,2(3):39-48.
[7]WANG Y,ZHANG G Z.Density peak algorithm by automatically determining cluster centers [J].Computer Engineering and Applications,2018,54(8):137-142.
[8]LIU R,WANG H,YU X.Shared-nearest-neighbor-based clustering by fast search and find of density peaks[J].Information Sciences,2018,450:200-226.
[9]DU M J,DING S F,JIA H J.Study on density peaks clustering based on k-nearest neighbors and principal component analysis[J].Knowledge-Based Systems,2016,99:135-145.
[10]XIE J Y,GAO H C,XIE W X,et al.Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors[J].Information Sciences,2016,345:19-40.
[11]YANG L J,ZHU Q S,HUANG J L,et al.Adap-tive edited na-tural neighbor algorithm[J].Neurocomputing,2017,230:427-433.
[12]HUANG J L,ZHU Q S,YANG L J,et al.QCC:a novel clustering algorithm based on quasicluster centers[J].Machine Lear-ning,2017,106(3):337-357.
[13]ZHU Q S,FENG J,HUANG J L.Natural neighbor:a self adaptive neighborhood method without parameter K[J].Pattern Recognition Letters,2016,80:30-36.
[14]STEVENS S S.Mathematics,measurement,and psychophysics[C]//Handbook of Experimental Psychology.London:Wiley,1951:1-49.
[15]ZOU X L.Application of Natural Nearest Neighbor in High-dimensional Data Structure Learning[D].Chongqing:Chongqing University,2011.
[16]HUANG J L.Research on Parameterless Clustering Algorithm Based on Natural Nearest Neighbor[D].Chongqing:Chongqing University,2014.
[17]VINH N X,EPPS J,BAILEY J.Information theoretic measures for clusterings comparison:Variants,properties,normalization and correction for chance[J].Journal of Machine Learning Research,2010,11(Oct):2837-2854.
[18]FOWLKES E B,MALLOWS C L.A method for comparing two hierarchical clusterings[J].Journal of the American Statistical Association,1983,78(383):553-569.
[19]PEDREGOSA F,VAROQUAUX G,GRAMFORT A,et al.Scikit-learn:Machine learning in Python[J].Journal of Machine Learning Research,2011,12(Oct):2825-2830.
[20]CHANG H,YEUNG D Y.Robust path-based spectral cluste-ring[J].Pattern Recognition,2008,41(1):191-203.
[21]GIONIS A,MANNILA H,TSAPARAS P.Clusteri-ng aggregation[J].ACM Transactions on Knowledge Discovery from Data (TKDD),2007,1(1):4.
[22]VEENMAN C J,REINDERS M J T,BACKER E.A maximum variance cluster algorithm[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2002,24(9):1273-1280.
[23]FRANTI P,VIRMAJOKI O,HAUTAMAKI V.Fast agglomera-tive clustering using a k-nearest neighbor graph[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2006,28(11):1875-1881.
[24]BACHE K,LICHMAN M.UCI machine learning repository[EB/OL].[2019-12-20].http://archive.ics.uci.edu/ml.
[1] CHAI Hui-min, ZHANG Yong, FANG Min. Aerial Target Grouping Method Based on Feature Similarity Clustering [J]. Computer Science, 2022, 49(9): 70-75.
[2] ZHANG Ya-di, SUN Yue, LIU Feng, ZHU Er-zhou. Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index [J]. Computer Science, 2022, 49(1): 121-132.
[3] LI Shan, XU Xin-zheng. Parallel Pruning from Two Aspects for VGG16 Optimization [J]. Computer Science, 2021, 48(6): 227-233.
[4] WANG Mao-guang, YANG Hang. Risk Control Model and Algorithm Based on AP-Entropy Selection Ensemble [J]. Computer Science, 2021, 48(11A): 71-76.
[5] WANG Wei-dong, XU Jin-hui, ZHANG Zhi-feng, YANG Xi-bei. Gaussian Mixture Models Algorithm Based on Density Peaks Clustering [J]. Computer Science, 2021, 48(10): 191-196.
[6] ZHANG Yu, LU Yi-hong, HUANG De-cai. Weighted Hesitant Fuzzy Clustering Based on Density Peaks [J]. Computer Science, 2021, 48(1): 145-151.
[7] XU Shou-kun, NI Chu-han, JI Chen-chen, LI Ning. Image Caption of Safety Helmets Wearing in Construction Scene Based on YOLOv3 [J]. Computer Science, 2020, 47(8): 233-240.
[8] ZHANG Qin, CHEN Hong-mei, FENG Yun-fei. Overlapping Community Detection Method Based on Rough Sets and Density Peaks [J]. Computer Science, 2020, 47(5): 72-78.
[9] CHEN Jun-fen,ZHANG Ming,ZHAO Jia-cheng. Clustering Algorithm by Fast Search and Find of Density Peaks for Complex High-dimensional Data [J]. Computer Science, 2020, 47(3): 79-86.
[10] DENG Ding-sheng. Application of Improved DBSCAN Algorithm on Spark Platform [J]. Computer Science, 2020, 47(11A): 425-429.
[11] ZHANG Jian-xin, LIU Hong, LI Yan. Efficient Grouping Method for Crowd Evacuation [J]. Computer Science, 2019, 46(6): 231-238.
[12] LI Chang-jing,ZHAO Shu-liang,CHI Yun-xian. Outlier Detection Algorithm Based on Spectral Embedding and Local Density [J]. Computer Science, 2019, 46(3): 260-266.
[13] HU Chuang, YANG Geng, BAI Yun-lu. Clustering Algorithm in Differential Privacy Preserving [J]. Computer Science, 2019, 46(2): 120-126.
[14] ZHANG Tian-zhu, ZOU Cheng-ming. Study on Image Classification of Capsule Network Using Fuzzy Clustering [J]. Computer Science, 2019, 46(12): 279-285.
[15] CHEN Zi-hao, LI Qiang. Improved PBFT Consensus Mechanism Based on K-medoids [J]. Computer Science, 2019, 46(12): 101-107.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!