计算机科学 ›› 2020, Vol. 47 ›› Issue (8): 302-312.doi: 10.11896/jsjkx.190700136

• 计算机网络 • 上一篇    下一篇

基于特征分类的链路预测方法综述

王慧1, 2, 乐孜纯3, 龚轩1, 武玉坤1, 左浩1   

  1. 1 浙江工业大学计算机科学与技术学院 杭州 310023
    2 江西理工大学应用科学学院 江西 赣州 341000
    3 浙江工业大学理学院 杭州 310023
  • 出版日期:2020-08-15 发布日期:2020-08-10
  • 通讯作者: 王慧(540168713@qq.com)
  • 基金资助:
    浙江省国际合作“一带一路”专项(2015C04005)

Review of Link Prediction Methods Based on Feature Classification

WANG Hui1, 2, LE Zi-chun3, GONG Xuan1, WU Yu-kun1, ZUO Hao1   

  1. 1 College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
    2 College of Applied Science, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
    3 College of Science, Zhejiang University of Technology, Hangzhou 310023, China
  • Online:2020-08-15 Published:2020-08-10
  • About author:WANG Hui, born in 1983, Ph.D student, lecturer, is a member of China Computer Federation.Her main research interests include link prediction, deep learning, AI and big data.
  • Supported by:
    This work was supported by the Special Funding of “the Belt and Road” International Cooperation of Zhejiang Province (2015C04005).

摘要: 复杂网络链路预测作为网络科学研究中一个重要的研究方向, 受到了越来越多来自各个学科领域专家的关注, 它可以利用现有的网络信息, 如节点和边缘的特征, 来预测未来可能形成的关系、网络中缺失的信息以及新的或正在消失的信息, 识别虚假交互, 评估网络演化机制, 进行网络重构等。当前链路预测的文献主要来自工程学、计算机科学与物理学的专家, 它们各自为政, 缺少合作, 结合多学科进行链路预测的综述论文少之又少。因此, 文中从计算机科学和物理学的视角全面回顾、分析和讨论基于特征分类的链路预测算法的研究进展, 介绍了该领域专家们提出的多种特征提取技术, 首次把分层的思想引入链路预测算法分类中, 将分类模型分为3层, 即元数据层、特征分类层和特征抽取层。该分类模型包括“2个大块7个方面”, 即把常用的链路预测算法分为2个大块(特征提取方法和特征学习方法)和7个方面(基于相似性的方法、基于似然分析的方法、基于概率模型的方法、矩阵分解方法、基于随机游走的方法、基于神经网络的方法和基于自定义损失函数的方法)。该分类方法覆盖了各学科中许多经典的和最新的链路预测技术, 包括当前最流行的图神经网络链路预测技术GNN(Graph Neural Network), GCN(Graph Convolutional Network), RNN(Recurrent Neural Network)和RL(Reinforcement Learning)。文中研究了这些算法的模型复杂性和预测性能的差异, 并对当前链路预测技术未来所面临的挑战进行了讨论。

关键词: 链路预测, 复杂网络, 机器学习, 特征分类, 图神经网络

Abstract: As an important research direction of network science research, link prediction of complex network has been more and more attention from experts from various disciplines, and it can make use of existing network information, such as the features of the nodes and edges, to predict possible future relationships, missing information in the network, and new or disappearing information, identify the false interactions, evaluate network evolution mechanism and conduct network reconfiguration, etc.Current articles on link prediction mainly come from the expert in engineering, computer science and physics.They are independent and lack cooperation, and there are few review articles combining multidisciplinary link prediction.The goal of this article is from the perspective of computer science and physics comprehensive to review, analyze, and discusse the research progress of link prediction algorithm based on feature classification, introduces a variety of feature extraction techniques in this field.This paper firstly introduces the idea of layering into the classification of link prediction algorithm, which divides the classification model into three layers, the metadata layer, features classification layer and feature extraction layer.The classification model includes two parts and seven aspects, that is, the prediction algorithm is divided into two parts, features extraction method and features learning method.The seven aspects are the similarity based methods, probabilistic methods, likelihood based methods, matrix factorization based method, random walk based method, neural network based method and custom loss function based method.This classification method covers many classical and latest link prediction techniques in various disciplines, including GNN, GCN, RNN and RL, which are the most popular link prediction techniques in graph neural networks.The differences of these algorithms in model complexity and prediction performance are studied, and the challenges of current link prediction technology in the future are discussed.

Key words: Link prediction, Complex network, Machine learning, Feature classification, Graph neural network

中图分类号: 

  • TP391
[1] RAMESH R S.Link prediction and path analysis using markov chains[J].Computer Networks, 2000, 60(33):377-386.
[2] LIBEN-NOWEL D, KLEINBERGN, JON K.The link-prediction problem forsocial networks[J].Journal of The American Society For InformationScience and Technology, 2007, 58(7):1019-1031.
[3] LV L L.Link prediction on complex networks[J].Journal ofUniversity of Electronic Science and Technology of China, 2010, 39(5):651-661.
[4] HASAN M A, ZAKI M.A survey of link prediction in socialnetworks[J].Social Network Data Analytics, 2011, 40(5):243-275.
[5] LV L L, REN X L, ZHOU T.Network link prediction:concepts and frontiers [J].Communication of China Computer Society, 2016, 12(4):12-21.
[6] VALVERDE R J, LOPES A A.Exploiting behaviors of communities of twitter users for link prediction[J].Soc. Netw. Anal. Min., 2013, 20(3):1063-1074.
[7] LIU H, HU Z, HADDADI H.Hidden link prediction based on node centrality and weak ties[J].Europhys Lett., 2013, 101(1):65-75.
[8] YANG S H, LONG B, SMOLA A, et al.Like like alike:jointfriendship and interest propagation in social networks[C]∥Proceedings of the 20th International Conference on World Wide Web.ACM, 2011:537-546.
[9] CHO E, MYERS S A, LESKOVEC J.Friendship and mobility:user movement in location-based social networks[C]∥Procee-dings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM.2011:1082-1090.
[10] LEO K.A new status index derived from sociometric analysis[J].Psychometrika, 1953, 50(18):39-43.
[11] ELIZABETH A L, PETTER H, MARK E N.Vertex similarityin networks[J].Physical Review E, 2006, 73(2):121-130.
[12] JEH G, WIDOM J.SimRank:a measure of structural-contextsimilarity[C]∥Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’02).ACM, 2002:538-543.
[13] KLOUMANN I M, UGANDER J, KLEINBERG J.Block mo-dels and personalized PageRank[J].The National Academy of Sciences, 2017, 114(1):33-38.
[14] SHANG M S, L L, ZENG W, et al.Relevance is more significant than correlation:Information filtering on sparse data[J].Europhys Lett., 2009, 88(6):68008p1-68008p4.
[15] NEWMAN M E J.Clustering and preferential attachment ingrowing networks[J].Phys. Rev. E., 2001, 64(10):251-264.
[16] SALTON G, MCGILLM.Introduction to Modern InformationRetrieval[J].McGraw-Hill, 1983, 32(6):528-536.
[17] ADAMIC L A, ADAR E.Friend and neighbors on the web[J].Soc Networks, 2003, 25(6):211-230.
[18] LIU Y Y, ZHAO C L, WANG X J.The degree-related clustering coefficient and its application to link prediction[J].Physica A, 2016, 454(15):24-33.
[19] ZHOU T, L L, ZHANG Y C.Predicting missing links via local information[J].Eur. Phys. J. B., 2009, 80(71):623-630.
[20] RAVASZ E, SOMERA A L, MONGRU D, et al.Hierarchicalorganization of modularity in metabolic networks[J].Science, 2002, 297(12):1551-1558.
[21] LV L, JIN C H, ZHOU T.Similarity index based on local paths for link prediction of complex network[J].Phys. Reve., 2009, 80(4):211-223.
[22] WANG P, XU B W, WU Y R.Link predictionin social net-works:the state-of-the-art[J].Science China Information Scien-ces, 2015, 58(1):1-38.
[23] HU R J, TANG J X, YUAN Y N.Link prediction in complex networks based on the interactions among paths[J].Physical A, 2019, 510(4):52-67.
[24] ALEXIS P, PANAGIOTIS S, YANNIS M.Fast and accuratelink prediction in social networking systems[J].Systems and Software, 2012, 60(9):2119-2132.
[25] ZHU X, TIAN H, CAI S, et al.Predicting missing links via significant paths[J].EPL, 2014, 108(4):18008-18014.
[26] AARON C, CRISTOPHER M, MARK E J.Hierarchicalstructure and theprediction of missing links in networks[J].Nature, 2008, 453(71):98-101.
[27] GUIMER R, SALES P M.Missing and spurious interactionsand the reconstruction of complex networks[J].PNAS, 2009, 106(52):22073-22078.
[28] BEN T, PIETER A, WONG M F.Relationalmarkov networks[J].Introduction to Statistical Relational Learning, 2007, 40(13):175-200.
[29] LV L Y.ZHOU T.Link prediction in complex networks:A survey[J].Physica A:Statistical Mechanics and Its Applications, 2011, 390(6):1150-1170.
[30] MENON A K, ELKAN C.Link prediction via matrix factorization in Machine Learning and Knowledge Discovery in Databases[C]∥Springer.2011:437-452.
[31] ADITYA K M, CHARLES E.Link Prediction via Matrix Factorization[C]∥Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases.ACM, 2011:437-452.
[32] KUNEGIS J, LOMMATZSCH A.Learning spectral graphtransformations for link prediction[C]∥ Proceedings of the 26th Annual International Conference on Machine Learning.ACM, 2009:561-568.
[33] CHEN Z, ZHANG W.A Marginalized Denoising Method forLink Prediction in Relational Data[C]∥ Proceedings of the 2014 SIAM International Conference on Data Mining.ACM, 2015:64-72.
[34] DAI L Y, KONG X Z, YUAN S S, et al.Integrative graph regularized matrix factorization for drug-pathway associations analysis[J].Computational Biology and Chemistry, 2019, 78(7):474-480.
[35] REN Y W, ZHOU J L, WANG J.Quality-Relevant Fault Monitoring Based on Locally Linear Embedding Orthogonal Projection to Latent Structure[J].Industrial & Engineering Chemistry Research, 2019, 58(3):1262-1272.
[36] OU M D, CUI P, PEI J, et al.Asymmetrictransitivity preserving graph embedding[C]∥Proceedings of the 22nd ACMSIGKDD International Conference on Knowledge Discovery and Data Mining.ACM, 2017:1105-1114.
[37] BELKIN M, NIYOGI P.Laplacian eigenmaps and spectral techniques for embedding and clustering[C]∥Advances in Neural Information Processing Systems.2002:585-591.
[38] CAO S, LU W, XU Q.Grarep:Learning graph representations with global structural information[C]∥Proceedings of the 24th ACM International on Conference on Information and Know-ledge Management.ACM, 2015:891-900.
[39] WANG H W, WANG J L, XIE X, et al.GraphGAN:Graph representation learningwith generative adversarial nets[J].IEEE Transaction on Knowledge and Data Engineering, 2017, 30(22):11-19.
[40] CAI L J, XU Y B, HE T Q, et al.A New Algorithm of DeepWalk Based On Probability[J].Journal of Physics:Conference Series, 2019, 1069(1):130-135.
[41] GROVER A, LESKOVEC J.Node2Vec:Scalable Feature learning for Network[J].HHS Public Access, 2016, 14(8):855-864.
[42] WANG D, CUI P, ZHU W.Structural deep network embedding[C]∥Proceedings of the 22nd International Conference on Knowledge Discovery and Data Mining.ACM, 2016:1225-1234.
[43] FRENSNO D, CARLOS W L, SARAI M C, et al.DNGR-1 in dendritic cells limits tissue damage by dampening neutrophil recruitment[J].Science, 2019, 6412(362):351-356.
[44] TOMAS N K, MAX W, Variational Graph Auto-Encoders[J].Springer, 2016, 28(3):61-63.
[45] LUCA F, MATHIAS N, MASSIMILIANO P.Learning DiscreteStructures for Graph Neural Networks[J].ICML.2019, 30(11):960-973.
[46] YOU J X, REX Y, XIANG R.Graphrnn:Generating realisticgraphs with deepauto-regressive models[C]∥International Conference on Machine Learning.ACM, 2018:5694-5703.
[47] MONTI F, BRONSTEIN M, BRESSON X.Geometric matrixcompletionwith recurrent multi-graph neural networks[J].Advances in NeuralInformation Processing Systems, 2017, 30(4):3697-3707.
[48] BUI L Q, PHAN A V, NGUYEN Y L H.DGCNN:A convolutional neural network over large-scale labeled graphs[J].Neural Networks:The Official Journal of the International Neural Network Society, 2017, 108(11):533-543.
[49] CAO D N, KIPF T.Molgan:An implicit generative model forsmall molecular graphs[J].ICML 2018 Workshop on Theoretical Foundations and Applications of Deep Generative Models, 2018, 97(30):181-192.
[50] TANG J, QU M, WANG M, et al.Line:Large-scale information network embedding[C]∥Ternational Conference on Word Wide Web.ACM, 2015:1067-1077.
[51] KIM Y H, YOO J, KIM S J, et al.Comparison of Scattering Cross-Sections by MCNP5 and TRANSX/TWODANT Codes in Sodium-Cooled Fast Reactor[J].Transactions of the American Nuclear Society, 2018, 106(1):719-722.
[52] WANG H, LE Z C, GONG X, et al.Link predicton of complex network is analyzed from the perspective of informatics[J].Journal of Chinese Computer Systems, 2020, 41(2):316-326.
[1] 李吟, 李必信. 基于脚本预测和重组的内存泄漏测试加速技术[J]. 计算机科学, 2020, 47(9): 31-39.
[2] 丁钰, 魏浩, 潘志松, 刘鑫. 网络表示学习算法综述[J]. 计算机科学, 2020, 47(9): 52-59.
[3] 苏畅, 张定权, 谢显中, 谭娅. 面向5G通信网络的NFV内存资源管理方法[J]. 计算机科学, 2020, 47(9): 246-251.
[4] 杨超, 刘志. 基于TASEP模型的复杂网络级联故障研究[J]. 计算机科学, 2020, 47(9): 265-269.
[5] 张梦月, 胡军, 严冠, 李慧嘉. 基于可见性图网络的中国专利申请关注度分析[J]. 计算机科学, 2020, 47(8): 189-194.
[6] 张清琪, 刘漫丹. 复杂网络社区发现的多目标五行环优化算法[J]. 计算机科学, 2020, 47(8): 284-290.
[7] 袁野, 和晓歌, 朱定坤, 王富利, 谢浩然, 汪俊, 魏明强, 郭延文. 视觉图像显著性检测综述[J]. 计算机科学, 2020, 47(7): 84-91.
[8] 彭伟, 胡宁, 胡璟璟. 图像隐写分析算法研究概述[J]. 计算机科学, 2020, 47(6A): 325-331.
[9] 董明刚, 弓佳明, 敬超. 基于谱聚类的多目标进化社区发现算法研究[J]. 计算机科学, 2020, 47(6A): 461-466.
[10] 包振山, 郭俊南, 谢源, 张文博. 基于LSTM-GA的股票价格涨跌预测模型[J]. 计算机科学, 2020, 47(6A): 467-473.
[11] 徐江峰谭玉龙. 基于机器学习的HBase配置参数优化研究[J]. 计算机科学, 2020, 47(6A): 474-479.
[12] 朱林立, 华钢, 高炜. 决定图框架下本体学习算法的稳定性分析[J]. 计算机科学, 2020, 47(5): 43-50.
[13] 富坤, 仇倩, 赵晓梦, 高金辉. 基于节点演化分阶段优化的事件检测方法[J]. 计算机科学, 2020, 47(5): 96-102.
[14] 袁榕, 宋玉蓉, 孟繁荣. 一种基于加权网络拓扑权重的链路预测方法[J]. 计算机科学, 2020, 47(5): 265-270.
[15] 李鑫超, 李培峰, 朱巧明. 一种基于改进向量投影距离的知识图谱表示方法[J]. 计算机科学, 2020, 47(4): 189-193.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 雷丽晖,王静. 可能性测度下的LTL模型检测并行化研究[J]. 计算机科学, 2018, 45(4): 71 -75 .
[2] 孙启,金燕,何琨,徐凌轩. 用于求解混合车辆路径问题的混合进化算法[J]. 计算机科学, 2018, 45(4): 76 -82 .
[3] 张佳男,肖鸣宇. 带权混合支配问题的近似算法研究[J]. 计算机科学, 2018, 45(4): 83 -88 .
[4] 伍建辉,黄中祥,李武,吴健辉,彭鑫,张生. 城市道路建设时序决策的鲁棒优化[J]. 计算机科学, 2018, 45(4): 89 -93 .
[5] 史雯隽,武继刚,罗裕春. 针对移动云计算任务迁移的快速高效调度算法[J]. 计算机科学, 2018, 45(4): 94 -99 .
[6] 周燕萍,业巧林. 基于L1-范数距离的最小二乘对支持向量机[J]. 计算机科学, 2018, 45(4): 100 -105 .
[7] 刘博艺,唐湘滟,程杰仁. 基于多生长时期模板匹配的玉米螟识别方法[J]. 计算机科学, 2018, 45(4): 106 -111 .
[8] 耿海军,施新刚,王之梁,尹霞,尹少平. 基于有向无环图的互联网域内节能路由算法[J]. 计算机科学, 2018, 45(4): 112 -116 .
[9] 崔琼,李建华,王宏,南明莉. 基于节点修复的网络化指挥信息系统弹性分析模型[J]. 计算机科学, 2018, 45(4): 117 -121 .
[10] 王振朝,侯欢欢,连蕊. 抑制CMT中乱序程度的路径优化方案[J]. 计算机科学, 2018, 45(4): 122 -125 .