Computer Science (计算机科学), 2022, Vol. 49, Issue (11): 109-116. doi: 10.11896/jsjkx.210900101

• Database & Big Data & Data Science •

Semantic Information Enhanced Network Embedding with Completely Imbalanced Labels

FU Kun, GUO Yun-peng, ZHUO Jia-ming, LI Jia-ning, LIU Qi

  1. College of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
    Hebei Province Key Laboratory of Big Data Computing, Tianjin 300401, China
  • Received: 2021-09-13 Revised: 2022-02-26 Online: 2022-11-15 Published: 2022-11-03
  • Corresponding author: FU Kun (fukun@hebut.edu.cn)
  • About author: FU Kun, born in 1979, Ph.D., associate professor. Her main research interests include social network analysis and network representation learning.
  • Supported by:
    National Natural Science Foundation of China (62072154).

Abstract: Data incompleteness is an important problem in network representation learning (NRL), and it prevents existing NRL algorithms from achieving the expected results. In recent years many methods have been proposed to address it, but most of them consider only missing label information and pay little attention to data imbalance, especially the completely imbalanced problem, in which the labels of a certain class are entirely missing. Learning algorithms for this problem are still immature: when aggregating neighborhood features they focus mainly on network structure information and do not exploit the relationship between attribute features and semantic features to enhance the representations. To address these problems, this paper proposes SECT (Semantic Information Enhanced Network Embedding with Completely Imbalanced Labels), a method that combines attribute features and structural features. First, taking into account the relationship between the attribute space and the semantic space, SECT introduces an attention mechanism into supervised learning to obtain semantic information vectors. Second, a variational autoencoder extracts structural features in an unsupervised manner to enhance the robustness of the algorithm. Finally, the semantic and structural information are fused in the embedding space. The network representations produced by SECT were tested on datasets such as Cora and Citeseer; on the node classification task they improve Micro-F1 by 0.86% to 1.97% compared with algorithms such as RECT and GCN. Visualization of the representations shows that, compared with other algorithms, SECT yields larger inter-class distances and more compact clusters, so class boundaries are separated more clearly. These experimental results demonstrate the effectiveness of SECT, which benefits mainly from a better fusion of semantic information in the low-dimensional embedding space and thus effectively improves node classification performance under completely imbalanced labels.
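The abstract names the building blocks but not their exact formulas, so the following is only a generic sketch of those blocks, not SECT's implementation. It assumes: a completely imbalanced split is made by dropping every training label of the "unseen" classes; neighborhood attention follows the standard GAT-style scoring (features assumed already linearly projected); the structural branch uses the standard VAE reparameterization trick and KL regularizer; fusion is plain concatenation of L2-normalized vectors; and Micro-F1 is the pooled single-label metric. All function names are hypothetical.

```python
import math
import random


def completely_imbalanced_split(labels, train_ids, unseen_classes):
    """Drop every training label of an unseen class, so those classes
    have no labeled nodes at all (the completely imbalanced setting)."""
    return [i for i in train_ids if labels[i] not in unseen_classes]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_aggregate(h_self, neighbors, a):
    """GAT-style aggregation (features assumed already projected by W):
    e_j = LeakyReLU(a . [h_self || h_j]), softmax over neighbors, weighted sum."""
    scores = [leaky_relu(sum(w * v for w, v in zip(a, h_self + h_j))) for h_j in neighbors]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    agg = [sum(al * h_j[k] for al, h_j in zip(alphas, neighbors)) for k in range(len(h_self))]
    return agg, alphas


def reparameterize(mu, log_var, rng):
    """VAE reparameterization trick: z = mu + exp(0.5 * log_var) * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), the usual VAE regularizer."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # leave a zero vector unchanged
    return [x / norm for x in v]


def fuse(semantic, structural):
    """Assumed fusion: concatenate the L2-normalized semantic and structural vectors."""
    return l2_normalize(semantic) + l2_normalize(structural)


def micro_f1(y_true, y_pred):
    """Micro-F1 pooled over classes; for single-label prediction it equals accuracy."""
    if not y_true:
        return 0.0
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    fp = fn = len(y_true) - tp  # each wrong prediction is one FP and one FN
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall) if tp else 0.0


if __name__ == "__main__":
    labels = {0: "A", 1: "B", 2: "C", 3: "A", 4: "C"}
    print(completely_imbalanced_split(labels, [0, 1, 2, 3, 4], {"C"}))  # [0, 1, 3]
    agg, alphas = attention_aggregate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0, 1.0, 0.0])
    z = reparameterize([0.1, -0.2], [0.0, 0.0], random.Random(0))
    print(fuse(agg, z))
    print(micro_f1([0, 1, 2, 2], [0, 2, 2, 2]))  # 0.75
```

Concatenation is used here only because it keeps the two views separable; a weighted sum or a learned fusion layer would be an equally plausible reading of "fused in the embedding space".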

Key words: Network representation learning, Graph embedding, Graph attention network, Completely imbalanced label, Variational autoencoders

CLC number: TP391