计算机科学 ›› 2019, Vol. 46 ›› Issue (6A): 452-456.

• 大数据与数据挖掘 • 上一篇    下一篇

一种基于超图Markov链松弛的聚类学习方法

郭鹏1,2, 李仁发1, 胡慧3   

  1. 湖南大学信息科学与工程学院 长沙4100821;
    湖南工程学院计算机与通信学院 湖南 湘潭4111042;
    湖南工程学院电气信息学院 湖南 湘潭4111043
  • 出版日期:2019-06-14 发布日期:2019-07-02
  • 通讯作者: 李仁发(1957-),男,教授,主要研究方向为嵌入式系统、并行计算,E-mail:da_peng219@126.com
  • 作者简介:郭 鹏(1978-),男,博士生,副教授,主要研究方向为机器学习、边缘计算;胡 慧(1979-),女,博士,副教授,主要研究方向为计算机控制、模糊控制、神经网络控制和非线性系统控制等。
  • 基金资助:
    本文受湘潭市科技计划一般项目(GXY-YB 20171004)资助。

Clustering Method Based on Hypergraph Morkov Relaxation

GUO Peng1,2, LI Ren-fa1, HU Hui3   

  1. College of Computer Science and Electronic Engineering,Hunan University,Changsha 410082,China1;
    College of Computer and Communication,Hunan Institute of Engineering,Xiangtan,Hunan 411104,China2;
    School of Electrical and Information,Hunan Institue of Engineering,Xiangtan,Hunan 411104,China3
  • Online:2019-06-14 Published:2019-07-02

摘要: 将车联网中高维的时空特征嵌入到低维的特征语义词袋是一种典型的聚类问题。谱聚类因其计算简单且有全局最优解的特点而备受关注,但是关于其聚类数目的研究工作相对较少。针对传统eigengap启发式方法无法适应于多噪声点和边界模糊数据集,导致聚簇过度分割的问题,提出了一种基于超图Markov链松弛的聚类学习方法(HS-MR算法)。该算法的基本思想是用Markov过程形式化描述超图并开始随机游走。在超图Markov链松弛过程中,通过随机转移矩阵P的t次幂和扩散映射找到数据集有意义的几何分布,然后提出基于互信息的目标函数进行聚类数目的自动收敛。实验结果表明,该算法在准确率上优于简单图谱聚类算法和标准超图谱聚类算法。

关键词: Markov链松弛, 超图Laplacian, 互信息, 扩散映射

Abstract: How to embed high dimention spatial-temporal feature into low dimention semantic feature word bag is a typic clustering problem in the Internet of vehicle .Spectral clustering algorithm is recently focused because of its simple computing and global optimal solution,however,the research about the numbers of clusters is relatively little.Tranditional eigengap heuristic method works well if the clusters in the data are very well pronounced.However,the more noisy or overlapping the clusters are,the less effective this heuristic is.This paper proposed a clustering method based on hypergraph markov relaxation (HS-MR method).The basic idea of this algorithm is using the Markov process to formally describe hypergraph and start random walk.In the relaxation process of hypergraph Markov chain,meaningful geometric distribution of data set is found through tth power of random transfer matrix P and diffusion mapping.Then,the objective function based on mutual information is proposed to automatically converge the clustering number.Finally,the experimental results show that the algorithm is superior to simple graph spectral clustering algorithm and hypergraph spectral clustering algorithm in accuracy rate.

Key words: Diffusion map, Hypergraph laplacian, Markov relaxation, Mutual informaion

中图分类号: 

  • TP391
[1]BLEI D,NG A,JORDAN M.Latent dirichlet allocation[J].Journal of Machine Learning Research,2003,3(1):993-1022.
[2]HOFMANN T.Unsupervised learning by probabilistic latent semantic analysis[J].Machine Learning,2001,42(1):177-196.
[3]SELIM S Z.K-Means-Type Algorithms:A Generalized Convergence Theorem and Characterization of Local Optimality[J].IEEE Transactions on Pattern Analysis & Machine,1984,6(1):81-87.
[4]李永森.空间聚类算法中的K值优化问题研究[J].系统仿真学报,2006,18(3):573-576.
[5]周水庚.基于数据分区的DBSCAN算法[J].计算机研究与发展,2000,37(10):1153-1159.
[6]LUXBURG U V.A tutorial on spectral clustering[J].Statistics and Computing,2007,17(4):395-416.
[7]SHI J,MALIK J.Normalized cuts and image segmentation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2000,22(8):888-905.
[8]LUXBURG U V,BOUSQUET O.Limits of spectral clustering[C]∥Advances in Neural Information Processing Systems (NIPS).2005:857-864.
[9]SCHÖLKOPF B.Learning with Hypergraphs:Clustering,Classification,and Embedding[C]∥Conference on Advances in Neural Information Pr.2006:1601-1608.
[10]ZHAN Y.A semi-supervised incremental learning method based on adaptive probabilistic hypergraph for video semantic detection[J].Multimedia Tools and Applications,2015,74(15):5513-5531.
[11]LIU X.Event-Based Media Enrichment Using an Adaptive Probabilistic Hypergraph Model[J].IEEE Transactions on Cybernetics,2015,45(11):2461-2471.
[12]MICHOEL T.Alignment and integration of complex networks by hypergraph-based spectral clustering[J].Physical Review E Statistical Nonlinear & Soft Matter Physics,2012,86(5 Pt 2):1188-1205.
[13]LIU J.Learning semantic features for action recognition via diffusion maps[J].Computer Vision & Image Understanding,2012,116(3):361-377.
[14]LU L.Recognizing human actions by two-level Beta process hidden Markov model[J].Multimedia Systems,2015:1.
[15]LANGE T.Stability-based validation of clustering solutions[J].Neural Computation,2004,16(6):1299-323.
[16]ZHOU D.Beyond pairwise classification and clustering using hypergraphs[C]∥Max Plank Institute for Biological Cybernetics.Tübingen,Germany,2005.
[17]NADLER B,LAFON S.Diffusion maps,spectral clustering and reaction coordinates of dynamical systems[J].Applied & Computational Harmonic Analysis,2005,21(1):113-127.
[18]COIFMAN R R,LAFON S.Diffusion Maps[J].Submitted to Applied Computational and Harmonic Analysis,2004.
[19]MOUYSSET S,NOAIUES J,RUIZ D,et al.Spectral Clustering:Interpretation and Gaussian Parameter[M].Data Analysis Machine Learning and Knowledge Discovery,2014.
[20]夏利民.基于信息瓶颈算法的图像语义标注[J].模式识别与人工智能,2008,21(6):812-818.
[1] 雷阳, 姜瑛.
云计算环境下关联节点的异常判断
Anomaly Judgment of Directly Associated Nodes Under Cloud Computing Environment
计算机科学, 2021, 48(1): 295-300. https://doi.org/10.11896/jsjkx.191200186
[2] 王毛妮, 彭长根, 何文竹, 丁兴, 丁红发.
基于图论与互信息量的差分隐私度量模型
Privacy Metric Model of Differential Privacy via Graph Theory and Mutual Information
计算机科学, 2020, 47(4): 270-277. https://doi.org/10.11896/jsjkx.190400098
[3] 林朗, 张自力.
基于多头绒泡菌的贝叶斯网络结构学习
Bayesian Structure Learning Based on Physarum Polycephalum
计算机科学, 2019, 46(9): 206-210. https://doi.org/10.11896/j.issn.1002-137X.2019.09.030
[4] 张美玉, 王洋洋, 侯向辉, 秦绪佳.
基于ORB和改进的RANSAC图像拼接算法
Image Stitching Algorithm Based on ORB and Improved RANSAC
计算机科学, 2019, 46(11A): 294-298.
[5] 彭晓冰, 朱玉全.
基于特征内相关和互信息的加权SVM算法
Weighted Support Vector Machine Algorithm Based on Inner-correlations and Mutual Information of Features
计算机科学, 2018, 45(12): 182-186. https://doi.org/10.11896/j.issn.1002-137X.2018.12.029
[6] 宋哲理, 王超, 王振飞.
基于MapReduce的多级特征选择机制
Multi-level Feature Selection Mechanism Based on MapReduce
计算机科学, 2018, 45(11A): 468-473.
[7] 李鹏,李英乐,王凯,何赞园,李星,常振超.
基于交互行为和连接分析的社交网络社团检测
Community Detection Based on User Interaction and Link Analysis in Social Networks
计算机科学, 2017, 44(7): 197-202. https://doi.org/10.11896/j.issn.1002-137X.2017.07.035
[8] 魏霖静,宁璐璐,练智超,代永强,王联国.
采用预编码的GSM网络最大互信息优化方法研究
Research on Maximum Mutual Information Optimization in GSM Networks with Precoding
计算机科学, 2017, 44(5): 71-74. https://doi.org/10.11896/j.issn.1002-137X.2017.05.013
[9] 刘哲,宋余庆,王栋栋.
自适应变异差分算法与Powell算法相结合的医学图像配准
Medical Image Registration Based on Self-adaptive DE Algorithm and Powell Algorithm
计算机科学, 2017, 44(11): 297-300. https://doi.org/10.11896/j.issn.1002-137X.2017.11.045
[10] 钱揖丽,蔡滢滢.
采用无标注语料和词“粘连”剔除策略的韵律短语识别
Recognition of Prosodic Phrases Based on Unlabeled Corpus and “Adhesion” Culling Strategy
计算机科学, 2016, 43(2): 51-56. https://doi.org/10.11896/j.issn.1002-137X.2016.02.011
[11] 阿力甫·阿不都克里木,李晓.
基于TextRank算法和互信息相似度的维吾尔文关键词提取及文本分类
Uyghur Keyword Extraction and Text Classification Based on TextRank Algorithm and Mutual Information Similarity
计算机科学, 2016, 43(12): 36-40. https://doi.org/10.11896/j.issn.1002-137X.2016.12.006
[12] 刘刚,周珩,梁晓庚,王明静.
非下采样轮廓波域红外与可见光图像配准算法
Image Registration Algorithm for Infrared and Visible Light Based on Non-subsampled Contourlet Transform
计算机科学, 2016, 43(11): 313-316. https://doi.org/10.11896/j.issn.1002-137X.2016.11.061
[13] 陈之彦,李晓杰,朱淑华,付丹龙,邢诒海.
基于Hash结构词典的双向最大匹配分词法
Bi-direction Maximum Matching Method Based on Hash Structural Dictionary
计算机科学, 2015, 42(Z11): 49-54.
[14] 黄 源,李 茂,吕建成.
一种基于开方检验的特征选择方法
New Feature Selection Method Based on CHI
计算机科学, 2015, 42(5): 54-56. https://doi.org/10.11896/j.issn.1002-137X.2015.05.011
[15] 魏中强,徐宏喆,李 文,桂小林.
基于条件互信息和概率突跳机制的贝叶斯网络结构学习算法
Bayesian Network Structure Learning Algorithm Based on Conditional Mutual Information and Probabilistic Jumping Mechanism
计算机科学, 2015, 42(3): 214-217. https://doi.org/10.11896/j.issn.1002-137X.2015.03.044
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!