Computer Science ›› 2021, Vol. 48 ›› Issue (11A): 81-87. doi: 10.11896/jsjkx.210300036

• Intelligent Computing •

Deep Clustering Model Based on Fusion Variational Graph Attention Self-encoder

KANG Yan, KOU Yong-qi, XIE Si-yu, WANG Fei, ZHANG Lan, WU Zhi-wei, LI Hao   

  1. School of Software, Yunnan University, Kunming 650504, China
  • Online: 2021-11-10  Published: 2021-11-12
  • Corresponding author: LI Hao (lihao707@ynu.edu.cn)
  • Author contact: 562530855@qq.com
  • About author: KANG Yan, born in 1972, Ph.D, associate professor. Her main research interests include transfer learning, deep learning and ensemble learning.
    LI Hao, born in 1970, Ph.D, professor. His main research interests include distributed computing, grid and cloud computing.
  • Supported by:
    National Natural Science Foundation of China (61762092), Yunnan Key Laboratory of Software Engineering (2020SE303) and Major Scientific Research Plan of Yunnan Province (202002AB080001).

Abstract: As one of the most basic tasks in data mining and machine learning, clustering is widely used in various real-world applications. With the development of deep learning, deep clustering has become a research hotspot. Existing deep clustering algorithms mainly focus on either node representation learning or structural representation learning, and little work considers fusing these two kinds of information at the same time to complete representation learning. This paper proposes a deep clustering model, FVGTAEDC (Deep Clustering Model Based on Fusion Variational Graph Attention Self-encoder), which performs clustering by jointly training an autoencoder and a variational graph attention autoencoder. In the model, the autoencoder integrates the (low-order and high-order) structural representations that the variational graph attention autoencoder learns from the network, and then learns feature representations from the original data. While the two modules are trained, in order to adapt to the clustering task, self-supervised clustering training is applied to the autoencoder module's representations, which fuse node and structure information. By combining the clustering loss, the autoencoder's data reconstruction loss, the variational graph attention autoencoder's adjacency matrix reconstruction loss, and the relative entropy loss between the posterior and the prior probability distribution, the model can effectively aggregate node attributes and network structure while optimizing cluster label assignment and learning representations suitable for clustering. Comprehensive experiments show that the proposed method outperforms state-of-the-art deep clustering methods on five real-world datasets.
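To make the composite training objective described above concrete, the following is a minimal PyTorch-style sketch of how the four loss terms (data reconstruction, adjacency reconstruction, posterior-prior relative entropy, and self-supervised clustering) could be combined. It is not the authors' implementation: the DEC-style soft assignment and target distribution, the VGAE-style KL term, and the weights beta, gamma and delta are assumptions introduced here purely for illustration.

import torch
import torch.nn.functional as F

def soft_assignment(z, centers, alpha=1.0):
    # Student's t similarity between fused embeddings and cluster centers (soft labels Q).
    dist = torch.cdist(z, centers) ** 2
    q = (1.0 + dist / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    # Sharpen Q into the auxiliary target P used for self-supervised clustering training.
    weight = q ** 2 / q.sum(dim=0)
    return (weight.t() / weight.sum(dim=1)).t()

def joint_loss(x, x_rec, adj, adj_logits, mu, logvar, q, p,
               beta=0.1, gamma=0.01, delta=1.0):
    # (1) autoencoder reconstruction of the raw node features
    rec_x = F.mse_loss(x_rec, x)
    # (2) variational graph attention autoencoder reconstruction of the adjacency matrix
    rec_a = F.binary_cross_entropy_with_logits(adj_logits, adj)
    # (3) relative entropy between the approximate posterior N(mu, sigma^2) and a standard normal prior
    kl_prior = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    # (4) self-supervised clustering loss KL(P || Q)
    clu = F.kl_div(q.log(), p, reduction="batchmean")
    return rec_x + beta * rec_a + gamma * kl_prior + delta * clu

In such a setup, Q would be computed from the autoencoder's fused embedding and a set of learnable cluster centers via soft_assignment, P would be refreshed periodically from Q via target_distribution as in the usual self-supervised clustering schedule, and the final cluster labels would be taken as the argmax of Q.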

Key words: Deep clustering, Representation learning, Autoencoder, Self-supervised clustering, Variational graph attention autoencoder

CLC Number: TP181