Computer Science ›› 2021, Vol. 48 ›› Issue (7): 137-144.doi: 10.11896/jsjkx.200800190

• Database & Big Data & Data Science • Previous Articles     Next Articles

Semi-supervised Clustering Based on Gaussian Fields and Adaptive Graph Regularization

ZHAO Min, LIU Jing-lei   

  1. School of Computer and Control Engineering,Yantai University,Yantai,Shandong 264005,China
  • Received:2020-08-28 Revised:2020-10-19 Online:2021-07-15 Published:2021-07-02
  • About author:ZHAO Min,born in 1995,postgraduate.Her main research interests include semi-supervised clustering and so on.(ytdxzhaomin@163.com)
    LIU Jing-lei,born in 1970,Ph.D,professor,master supervisor.His main research interests include artificial intelligent and theoretical computer science.
  • Supported by:
    National Natural Science Foundation of China(61572419,61773331,61801414,62072391).

Abstract: Clustering is to divide a given sample into several different clusters,which is a widely used tool,has been applied in machine learning,data mining and so on,and has received extensive concern by researchers.However,there are still three main limitations.Firstly,usually there are noises and outliers in the data,which will bring about significant errors in the clustering results.Secondly,traditional clustering methods do not use supervision information to guide the construction of similarity matrices.Finally,in the graph-based clustering method,when constructing graphs,the neighbor relationship is determined.Once the calculation is wrong, it will result in poor quality of the constructed graph,which will affect the clustering performance.Therefore,a semi-supervised clustering model based on Gaussian field and adaptive graph regularization (SCGFAG) is proposed in this paper. In this model,supervised information is introduced by gaussian field and harmonic function to guide the construction of similarity matrix to realize semi-supervised learning.Sparse error matrix is introduced to represent sparse noise,such as impulse noise,dead line,stripes,and l1 norm is introduced to alleviate the sparse noise.In addition,the l2,1 norm is also introduced by the proposed model to mitigate the effects of outliers.Therefore,our SCGFAG is insensitive to data noise and outliers.More importantly,the regularization of adaptive graph is introduced into SCGFAG to improve the clustering performance.In order to realize the goal of optimization clustering,an iterative updating algorithm-Augmented Lagrangian Method (ALM) is proposed to update the optimization variables respectively.Experimental results on four datasets show that the proposed method is superior to the eight classical clustering methods,and has better clustering performance.

Key words: Adaptive graph regularization, Augmented Lagrangian method, Noise and outliers, Rotation invariance property of l2,1, Semi-supervised clustering

CLC Number: 

  • TP311
[1]RAO S R,TRON R,VIDAL R,et al.Motion segmentation in the presence of outlying,incomplete,or corrupted trajectories[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2010,32(10):1832-1845.
[2]SHI J,MALIK J.Normalized cuts and image segmentation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2000,22(8):888-905.
[3]LIU G,LIN Z,YAN S,et al.Robust Recovery of SubspaceStructures by Low-Rank Representation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2013,35(1):171-184.
[4]HARTIGAN J A,WONG M A.A K-means Clustering Algorithm:Algorithm AS 136[J].Applied Statistics,1979,28(1):100-108.
[5]VON LUXBURG U.A tutorial on spectral clustering[J].Statistics and Computing,2007,17(4):395-416.
[6]ZHOU S H,ZHU E,LIU X W,et al.Subspace segmentation-based robust multiple kernel clustering[J].Information Fusion,2020,53:145-154
[7]ZHOU S,LIU X,ZHU C,et al.Spectral clustering-based local and global structure preservation for feature selection[C]//International Joint Conference on Neural Networks.IEEE,2014.
[8]DING C,LI T,JORDAN M.Convex and semi-nonnegative matrix factorizations[J].IEEE Transactions on Software Enginee-ring,2010,32(1):45-55.
[9]ZHANG Y,KONG X W,WANG Z F,et al.Cluster analysis based on multi-view matrix Decomposition[J].Acta Automata Sinica,2018,44(12):2160-2169.
[10]DING Y,LI Y Z.Intrusion detection Algorithm based on PCA and Semi-supervised clustering[J].Journal of Shandong University (Engineering Science),2012,42(5):41-46.
[11]BASU S,BILENKO M,MOONEY R J,et al.A probabilisticframework for semi-supervised clustering[C]//Knowledge Discovery and Data Mining.2004:59-68.
[12]LIU H.Constrained nonnegative matrix factorization for image representation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2012,34(7):1299-1311.
[13]CAI D,HE X,HAN J,et al.Graph regularized non-negative matrix factorization for data representation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2011,33(8):1548-1560.
[14]HUANG J, NIE F P,HUANG H,et al.Robust Manifold Nonnegative Matrix Factorization[J].ACM Transactions on Know-ledge Discovery from Data,2014,8(3):1-21.
[15]ZENG K,YU J,LI C,et al.Image clustering by hyper-graphregularized non-negative matrix factorization[J].Neurocompu-ting,2014,138:209-217.
[16]ZHANG X.Non-negative low rank and sparse graph for semi-supervised learning[C]//Computer Vision & Pattern Recognition.IEEE,2012.
[17]NIE F,XU D,TSANG W H,et al.Flexible manifold embedding:a framework for semi-Supervised and unsupervised dimension reduction[J].IEEE Transactions on Image Processing,2010,19(7):1921-1932.
[18]ZHU X,GHAHRAMANI Z,LAFFERTY J D.Semi-supervised learning using Gaussian fields and Harmonic functions[C]//Machine Learning,Proceedings of the Twentieth International Conference (ICML 2003).Washington DC,USA,2003.
[19]LU G,WANG Y,ZOU J.Low-rank matrix factorization withadaptive graph regularizer [J].IEEE Transactions on Image Processing,2016,25(5):2196-2205.
[20]ZHANG L,ZHANG Q,DU B,et al.Adaptive manifold regulari-zed matrix factorization for data clustering[C]//Twenty-sixth International Joint Conference on Artificial Intelligence.2017.
[21]HE F,NIE F,WANG R,et al.Fast Semi-supervised learning with bipartite graph for large-scale data[J].IEEE Transactionson Neural Networks,2020,31(2):626-638.
[22]LI Z,LIU J,TANG J.Robust Structured Subspace Learning for Data Representation [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2015,37(10):2085-2098.
[23]BHARDWAJ A,RAMAN S.Robust PCA-based solution to ima-ge composition using augmented Lagrange multiplier (ALM)[J].Visual Computer,2016,32(5):591-600.
[24]TOH K C,YUN S.An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems[J].Pacific Journal of Optimization,2010,6(3):615-640.
[25]YANG J,YUAN X.Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization[J].Mathematics of Computation,2012,82(281):301-329.
[26]LEE D D,SEUNG H S.Algorithms for Non-negative MatrixFactorization[C]//Neural Information Processing Systems.2000:556-562.
[27]PENG C,KANG Z,HU Y,et al.Robust graph regularized nonnegative matrix factorization for clustering[J].ACM Transactions on Knowledge Discovery from Data,2017,11(3):33.
[28]HE R,ZHENG W S,HU B G,et al.Nonnegative sparse coding for discriminative semi-supervised learning[C]//IEEE Confe-rence on Computer Vision & Pattern Recognition.IEEE,2011.
[29]JIA Y H,KWONG S,HOU J H,et al.Semi-Supervised Non-Negative Matrix Factorization with Dissimilarity and Similarity Regularization [J].IEEE Transaction on Neural Networks and Learning Systems,2020,31(7):2510-2521.
[1] YANG Fan, WANG Jun-bin, BAI Liang. Extended Algorithm of Pairwise Constraints Based on Security [J]. Computer Science, 2020, 47(9): 324-329.
[2] QIN Yue, DING Shi-fei. Survey of Semi-supervised Clustering [J]. Computer Science, 2019, 46(9): 15-21.
[3] WANG Nan, SUN Shan-wu. UAV Fault Recognition Based on Semi-supervised Clustering [J]. Computer Science, 2019, 46(6A): 192-195.
[4] JIA Hong-jie, WANG Liang-jun, SONG He-ping. HMRF Semi-supervised Approximate Kernel k-means Algorithm [J]. Computer Science, 2019, 46(12): 31-37.
[5] CHENG Xue-mei, YANG Qiu-hui, ZHAI Yu-peng and CHEN Wei. Test Case Selection Technique Based on Semi-supervised Clustering Method [J]. Computer Science, 2018, 45(1): 249-254.
[6] LIANG Chen and LI Cheng-hai. Novel Intrusion Detection Method Based on Semi-supervised Clustering [J]. Computer Science, 2016, 43(5): 87-90.
[7] FENG Chen-fei, YANG Yan, WANG Hong-jun, XU Ying-ge and WANG Tao. Semi-supervised Fuzzy Clustering Ensemble Approach with Data Correlation [J]. Computer Science, 2015, 42(6): 41-45.
[8] SU Ying-bin, DU Xue-hui, XIA Chun-tao, CAO Li-feng and CHEN Hua-cheng. Sensitive Information Inference Method Based on Semi-supervised Document Clustering [J]. Computer Science, 2015, 42(10): 132-137.
[9] CUI Peng,ZHANG Ru-bo. Novel Semi-supervised Clustering for High Dimensional Data [J]. Computer Science, 2010, 37(7): 205-207.
[10] LI Lin-na,CHEN Hai-rui,WANG Ying-long. Semi-supervised Clustering of Complex Structured Data Based on Higher-order Logic [J]. Computer Science, 2009, 36(9): 196-200.
[11] LU Wei-zhou ,YU Shun-zheng (Department of Electronic and Communication Engineering,Sun Yat-Sen University,Guangzhou 510275,China). [J]. Computer Science, 2009, 36(2): 90-94.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!