Computer Science ›› 2019, Vol. 46 ›› Issue (12): 31-37.doi: 10.11896/jsjkx.190600159

• Big Data & Data Science • Previous Articles     Next Articles

HMRF Semi-supervised Approximate Kernel k-means Algorithm

JIA Hong-jie, WANG Liang-jun, SONG He-ping   

  1. (School of Computer Science and Communication Engineering,Jiangsu University,Zhenjiang,Jiangsu 212013,China)
  • Received:2019-04-15 Online:2019-12-15 Published:2019-12-17

Abstract: Massive data are produced with the development of information technology.Clustering can help to discover the intrinsic links of data and extract valuable information from them.In data analyzing,it is easy to get some background knowledge about data.Using these limited prior information to guide clustering can significantly improve the clustering results.The semi-supervised clustering based on Hidden Markov Random Fields (HMRF) uses pairwise constraints as the supervision information.Although it has good clustering results in many applications,its time and space complexity are very high,which cannot meet the needs of large-scale data processing.To solve this problem,this paper first analyzed the mathematical relationship between HMRF semi-supervised clustering and kernel k-means,and used matrix trace to unify the objective functions of the two clustering methods.In order to reduce the complexity of HMRF semi-supervised clustering,this paper proposed a HMRF semi-supervised approximate kernel k-means algorithm (HMRF-AKKM),which constructs an approximate kernel matrix by sampling,and used the approximate kernel k-means to optimize the clustering objective function.Finally,the HMRF-AKKM algorithm was compared with the related clustering algorithms on several benchmark datasets and the clustering performances of different algorithms were analyzed in the experiments.The experimental results show that the HMRF-AKKM algorithm has similar clustering quality to the original HMRF semi-supervised clustering on the same clustering task,but the HMRF-AKKM algorithm has shorter clustering time.This indicates that the HMRF-AKKM algorithm inherits the advantages of HMRF semi-supervised clustering and approximate kernel k-means.On the one hand,HMRF-AKKM can make full use of pairwise constraints to achieve high clustering quality.On the other hand,it improves the clustering efficiency by sampling and matrix approximation.Moreover,the clustering quality and clustering efficiency can be balanced by adjusting the sampling ratio and the number of pairwise constraints.Therefore,the proposed HMRF-AKKM algorithm has good scalability and it is suitable for the clustering problems of large-scale nonlinear data.

Key words: Approximate kernel k-means, HMRF model, Matrix trace, Pairwise constraints, Semi-supervised clustering

CLC Number: 

  • TP391
[1]ZHANG X T,ZHANG X C,LIU H.Weighed Multi-Task Clustering by Feature and Instance Transfer [J].Chinese Journal of Computers,2019,42(36):1-17.(in Chinese)
张晓彤,张宪超,刘晗.基于特征和实例迁移的加权多任务聚类[J].计算机学报,2019,42(36):1-17.
[2]MARIN D,TANG M,AYED I B,et al.Kernel clustering:density biases and solutions [J].IEEE Transactions on Pattern Ana-lysis and Machine Intelligence,2019,41(1):136-147.
[3]LIU X,ZHU X,LI M,et al.Multiple kernel k-means with incomplete kernels [OL].https://doi.org/10.1109/TPAMI.2019.2892416.
[4]CHITTA R,JIN R,HAVENS T C,et al.Scalable kernel clustering:Approximate kernel k-means [J].arXiv:1402.3849.
[5]MEHRKANOON S,ALZATE C,MALL R,et al.Multiclass Semisupervised Learning Based Upon Kernel Spectral Clustering [J].IEEE Transactions on Neural Networks & Learning Systems,2015,26(4):720-733.
[6]GAN H,FAN Y,LUO Z,et al.Local homogeneous consistent safe semi-supervised clustering [J].Expert Systems with Applications,2018,97:384-393.
[7]WANG W,YANG C,CHEN H,et al.Unified Discriminative and Coherent Semi-Supervised Subspace Clustering [J].IEEE Transactions on Image Processing,2018,27(5):2461-2470.
[8]YU Z,LUO P,LIU J,et al.Semi-supervised ensemble clustering based on selected constraint projection [J].IEEE Transactions on Knowledge and Data Engineering,2018,30(12):2394-2407.
[9]REN Y,HU K,DAI X,et al.Semi-supervised deep embedded clustering [J].Neurocomputing,2019,325:121-130.
[10]MEI J P.Semi-supervised fuzzy clustering with partition information of subsets[OL].https://doi.org/10.1109/TFUZZ.2018.2889010.
[11]NGUYEN B,DE BAETS B.Kernel-Based Distance Metric Learning for Supervised k-Means Clustering [OL].https://doi.org/10.1109/TNNLS.2018.2890021.
[12]WU B,ZHANG Y,HU B G,et al.Constrained clustering and its application to face clustering in videos[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2013:3507-3514.
[13]WANG S,GITTENS A,MAHONEY M W.Scalable kernel K-means clustering with Nyström approximation:relative-error bounds [J].The Journal of Machine Learning Research,2019,20(1):431-479.
[14]KULIS B,BASU S,DHILLON I,et al.Semi-supervised graph clustering:a kernel approach [J].Machine Learning,2009,74(1):1-22.
[15]BÜHLER T,HEIN M.Spectral clustering based on the graph p-Laplacian[C]//Proceedings of the 26th International Confe-rence on Machine Learning.ACM,2009:81-88.
[16]CAI D,HE X,HAN J,et al.Graph regularized nonnegative matrix factorization for data representation [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2011,33(8):1548-1560.
[17]BAGHSHAH M S,AFSARI F,SHOURAKI S B,et al.Scalable semi-supervised clustering by spectral kernel learning [J].Pattern Recognition Letters,2014,45:161-171.
[18]BHATT R B,DHALL A.Skin Segmentation Dataset.UCI Machine Learning Repository[OL].https://archive.ics.uci.edu/ml/index.html.
[19]KLEIN D,KAMVAR S D,MANNING C D.From Instance-le- vel Constraints to Space-Level Constraints:Making the Most of Prior Knowledge in Data Clustering[C]//Proceedings of the 19th International Conference on Machine Learning.ACM,2002:307-314.
[20]JIA H J,DING S F,SHI Z Z.Approximate weighted kernel k-means for large-scale spectral clustering [J].Journal of Software,2015,26(11):2836-2846.(in Chinese)
贾洪杰,丁世飞,史忠植.求解大规模谱聚类的近似加权核k-means算法[J].软件学报,2015,26(11):2836-2846.
[21]YANG Y,TAN W,LI T,et al.Consensus clustering based on constrained self-organizing map and improved Cop-Kmeans ensemble in intelligent decision support systems [J].Knowledge-Based Systems,2012,32:101-115.
[22]DING S,JIA H,DU M,et al.A semi-supervised approximate spectral clustering algorithm based on HMRF model [J].Information Sciences,2018,429:215-228. SHI Y,OTTO C,JAIN A K.Face clustering:representation and pairwise constraints .IEEE Transactions on Information Forensics and Security,2018,13(7):1626-1640. HE L,RAY N,GUAN Y,et al.Fast large-scale spectral clustering via explicit feature mapping .IEEE Transactions on Cybernetics,2018,49(3):1058-1071. ZHAO X,LIANG J,DANG C.A stratified sampling based clustering algorithm for large-scale data . Knowledge-Based Systems,2019,163:416-428.
[1] ZHAO Min, LIU Jing-lei. Semi-supervised Clustering Based on Gaussian Fields and Adaptive Graph Regularization [J]. Computer Science, 2021, 48(7): 137-144.
[2] YANG Fan, WANG Jun-bin, BAI Liang. Extended Algorithm of Pairwise Constraints Based on Security [J]. Computer Science, 2020, 47(9): 324-329.
[3] QIN Yue, DING Shi-fei. Survey of Semi-supervised Clustering [J]. Computer Science, 2019, 46(9): 15-21.
[4] WANG Nan, SUN Shan-wu. UAV Fault Recognition Based on Semi-supervised Clustering [J]. Computer Science, 2019, 46(6A): 192-195.
[5] CHENG Xue-mei, YANG Qiu-hui, ZHAI Yu-peng and CHEN Wei. Test Case Selection Technique Based on Semi-supervised Clustering Method [J]. Computer Science, 2018, 45(1): 249-254.
[6] LIANG Chen and LI Cheng-hai. Novel Intrusion Detection Method Based on Semi-supervised Clustering [J]. Computer Science, 2016, 43(5): 87-90.
[7] WANG Zong-hu and LIU Su. Pairwise Constrained Semi-supervised Text Clustering Algorithm [J]. Computer Science, 2016, 43(12): 183-188.
[8] FENG Chen-fei, YANG Yan, WANG Hong-jun, XU Ying-ge and WANG Tao. Semi-supervised Fuzzy Clustering Ensemble Approach with Data Correlation [J]. Computer Science, 2015, 42(6): 41-45.
[9] SU Ying-bin, DU Xue-hui, XIA Chun-tao, CAO Li-feng and CHEN Hua-cheng. Sensitive Information Inference Method Based on Semi-supervised Document Clustering [J]. Computer Science, 2015, 42(10): 132-137.
[10] . Pairwise Constraint Projections Inosculating Sparsity Preserving [J]. Computer Science, 2012, 39(11): 212-215.
[11] CUI Peng,ZHANG Ru-bo. Novel Semi-supervised Clustering for High Dimensional Data [J]. Computer Science, 2010, 37(7): 205-207.
[12] LI Lin-na,CHEN Hai-rui,WANG Ying-long. Semi-supervised Clustering of Complex Structured Data Based on Higher-order Logic [J]. Computer Science, 2009, 36(9): 196-200.
[13] LU Wei-zhou ,YU Shun-zheng (Department of Electronic and Communication Engineering,Sun Yat-Sen University,Guangzhou 510275,China). [J]. Computer Science, 2009, 36(2): 90-94.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!