Computer Science ›› 2021, Vol. 48 ›› Issue (10): 191-196.doi: 10.11896/jsjkx.200800191

• Database & Big Data & Data Science •

Gaussian Mixture Models Algorithm Based on Density Peaks Clustering

WANG Wei-dong, XU Jin-hui, ZHANG Zhi-feng, YANG Xi-bei   

  1. College of Computer Science, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212100, China
  • Received: 2020-08-28 Revised: 2020-11-30 Online: 2021-10-15 Published: 2021-10-18
  • About author: WANG Wei-dong, Ph.D, associate professor. His main research interests include pattern recognition and intelligent information processing.
  • Supported by:
    National Natural Science Foundation of China (61572242).

Abstract: Because a large amount of sample data follows a Gaussian distribution, Gaussian mixture models (GMM) are used to cluster such data and obtain more accurate clustering results. The expectation-maximization (EM) algorithm is generally used to estimate the parameters of a GMM iteratively. However, the traditional EM algorithm has two shortcomings: it is sensitive to the initial cluster centers, and its iteration terminates when the distance between two successive parameter estimates falls below a given threshold, which does not guarantee that the algorithm converges to the optimal parameter values. To overcome these shortcomings, density peaks clustering (DPC) is used to initialize the EM algorithm, improving its robustness, and relative entropy is adopted as the termination condition of the EM iteration to optimize the parameters of the GMM. Comparative experiments on artificial datasets and UCI datasets show that the new algorithm not only improves the robustness of the EM algorithm but also outperforms traditional clustering algorithms; on datasets that follow a Gaussian distribution, it greatly improves clustering accuracy.

Key words: Clustering algorithm, Density peaks clustering, Expectation maximization algorithm, Gaussian mixture models, Relative entropy
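The pipeline the abstract describes can be sketched in a few dozen lines: pick initial centers as density peaks (points with high local density ρ and large distance δ to any denser point), then run EM on a GMM and stop when the relative entropy between successive responsibility distributions falls below a tolerance. This is a minimal illustrative sketch, not the paper's implementation: the function names (`dpc_centers`, `gmm_dpc_em`), the 2nd-percentile cutoff heuristic for dc, and the per-sample KL formulation of the stopping rule are all assumptions, since the paper's exact choices are not given on this page.

```python
import numpy as np

def dpc_centers(X, k, dc=None):
    """Pick k initial centers by density peaks (Rodriguez & Laio, 2014):
    score each point by gamma = rho * delta, where rho is a Gaussian-kernel
    local density and delta is the distance to the nearest denser point."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    if dc is None:
        dc = np.percentile(d[d > 0], 2)           # assumed cutoff heuristic
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1  # subtract self-contribution
    order = np.argsort(-rho)                      # indices by descending density
    delta = np.empty(n)
    delta[order[0]] = d[order[0]].max()           # densest point: farthest distance
    for i, p in enumerate(order[1:], 1):
        delta[p] = d[p, order[:i]].min()          # nearest denser point
    return X[np.argsort(-(rho * delta))[:k]]

def gmm_dpc_em(X, k, tol=1e-6, max_iter=200):
    """EM for a GMM, initialized with DPC centers; iteration stops when the
    mean KL divergence between successive responsibilities drops below tol."""
    n, dim = X.shape
    mu = dpc_centers(X, k).copy()
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(dim)] * k)
    pi = np.full(k, 1.0 / k)
    prev_r = np.full((n, k), 1.0 / k)
    eps = 1e-12
    for _ in range(max_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        pdf = np.empty((n, k))
        for j in range(k):
            diff = X - mu[j]
            mah = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(cov[j]), diff)
            pdf[:, j] = np.exp(-0.5 * mah) / np.sqrt(
                (2 * np.pi) ** dim * np.linalg.det(cov[j]))
        r = pi * pdf
        r /= r.sum(axis=1, keepdims=True)
        # Relative-entropy stopping rule instead of a parameter-distance test
        kl = np.sum(r * np.log((r + eps) / (prev_r + eps))) / n
        if kl < tol:
            break
        prev_r = r
        # M-step: re-estimate weights, means, covariances
        Nk = r.sum(axis=0)
        pi = Nk / n
        mu = (r.T @ X) / Nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            cov[j] = (r[:, j, None] * diff).T @ diff / Nk[j] + 1e-6 * np.eye(dim)
    return pi, mu, cov, r
```

On two well-separated Gaussian blobs, the DPC scores single out one density peak per blob, so EM starts near the true means and the KL stopping rule fires once the soft assignments stabilize.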

CLC Number: TP391.4