Computer Science, 2020, Vol. 47, Issue (7): 231-235. doi: 10.11896/jsjkx.190600085

• Computer Network •

Graph Convolution of Fusion Meta-path Based Heterogeneous Network Representation Learning

JIANG Zong-li, LI Miao-miao, ZHANG Jin-li

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • Received: 2019-06-17 Online: 2020-07-15 Published: 2020-07-16
  • Corresponding author: LI Miao-miao (867743373@qq.com)
  • About author: JIANG Zong-li (jiangzl@bjut.edu.cn), born in 1956, Ph.D., professor and Ph.D. supervisor, member of the China Computer Federation. His main research interests include network information search and processing.
    LI Miao-miao, born in 1994, postgraduate. Her main research interests include network representation learning.

Abstract: In recent years, network representation learning (NRL) has received increasing attention as an effective way to analyze heterogeneous information networks (HIN) by representing nodes in a low-dimensional space. Random-walk-based methods are currently the most common approach to learning network embeddings; however, most of them rely on shallow neural networks, which makes it difficult to capture the structural information of heterogeneous networks. The graph convolutional network (GCN) is a popular method for deep learning on graphs and exploits network topology more effectively, but current GCN designs target homogeneous networks and ignore the rich semantic information in the network. To mine both the semantic information and the highly nonlinear structural information of heterogeneous information networks, and thereby improve the quality of network representations, this paper proposes a heterogeneous network representation learning algorithm based on graph convolution with fused meta-paths (MG2vec). First, the algorithm obtains rich semantic information from the heterogeneous information network through meta-path-based relevance measures. Then a graph convolutional network performs deep learning to capture the features of nodes and their neighbors, compensating for the limited ability of shallow models to capture network structure, so that both semantic and structural information are better integrated into the low-dimensional node representations. Experiments on the DBLP and IMDB datasets show that, compared with the classical DeepWalk, node2vec and Metapath2vec algorithms, MG2vec achieves higher classification precision and better overall performance on multi-label classification tasks: its precision and Macro-F1 reach 94.49% and 94.16% respectively, up to 26.05% and 28.73% higher than those of DeepWalk. The experimental results show that MG2vec outperforms classical network representation learning algorithms and represents heterogeneous information networks more effectively.
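The abstract names two stages, meta-path-based relevance (semantic information) followed by graph convolution (structural information), without giving formulas. The following is a minimal, dependency-free Python sketch of how such a two-stage pipeline could look; it is not the authors' MG2vec implementation. Assumptions: PathSim (Sun et al.) stands in for the unspecified meta-path relevance measure; the meta-paths A-P-A and A-P-V-P-A, the toy author/paper/venue data, and the equal-weight averaging used to "fuse" meta-paths are all hypothetical; the GCN layer follows the Kipf and Welling propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W), applied to the fused relevance matrix as a weighted adjacency.

```python
import math
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def pathsim(commuting):
    """PathSim on a symmetric meta-path commuting matrix M:
    s(x, y) = 2 * M[x][y] / (M[x][x] + M[y][y])."""
    n = len(commuting)
    sim = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            denom = commuting[x][x] + commuting[y][y]
            sim[x][y] = 2 * commuting[x][y] / denom if denom else 0.0
    return sim


def fuse(matrices):
    """Hypothetical fusion: equal-weight average of per-meta-path relevance matrices."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj, features, weights):
    """One GCN layer, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the
    weighted adjacency with its diagonal replaced by self-loops of weight 1."""
    n = len(adj)
    a_hat = [[1.0 if i == j else adj[i][j] for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical DBLP-style toy network: 4 authors, 4 papers, 2 venues.
author_paper = [[1, 1, 0, 0],   # A0 wrote P0, P1
                [0, 1, 0, 0],   # A1 wrote P1
                [0, 0, 1, 1],   # A2 wrote P2, P3
                [0, 0, 0, 1]]   # A3 wrote P3
paper_venue = [[1, 0], [1, 0], [0, 1], [0, 1]]  # P0, P1 in V0; P2, P3 in V1

# Commuting matrices (path-instance counts) for meta-paths A-P-A and A-P-V-P-A.
m_apa = matmul(author_paper, transpose(author_paper))
author_venue = matmul(author_paper, paper_venue)
m_apvpa = matmul(author_venue, transpose(author_venue))

# Semantic step: meta-path relevance, fused across the two meta-paths.
relevance = fuse([pathsim(m_apa), pathsim(m_apvpa)])

# Structural step: one GCN layer on one-hot features with untrained weights.
random.seed(0)
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
weights = [[random.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(4)]
embeddings = gcn_layer(relevance, features, weights)
for author, vec in enumerate(embeddings):
    print(f"A{author}", [round(v, 3) for v in vec])
```

Only a forward pass with randomly initialised weights is shown; in practice W would be trained (for example against node labels) and several layers stacked. In the toy data, A0 and A1 share a paper and a venue, so their fused relevance (about 0.73) links them in the GCN input, while authors with no shared meta-path instance receive zero relevance.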

Key words: Network representation learning, Heterogeneous information network, Meta-path, Semantic information, Network structure information, Graph convolutional networks

CLC number: TP183