Computer Science, 2021, Vol. 48, Issue (12): 226-230. doi: 10.11896/jsjkx.200800026

Section: Database & Big Data & Data Science

Complex Network Link Prediction Method Based on Topology Similarity and XGBoost

GONG Zhui-fei, WEI Chuan-jia

  1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
  • Received: 2020-08-04  Revised: 2020-09-21  Online: 2021-12-15  Published: 2021-11-26
  • Corresponding author: GONG Zhui-fei (793688937@qq.com)
  • About author: GONG Zhui-fei, born in 1977, postgraduate, Ph.D, lecturer, senior engineer. Her main research interests include complex networks and link prediction.
  • Supported by:
    National Natural Science Foundation of China (61773348) and Natural Science Foundation of Zhejiang Province, China (LY17F030016).

Abstract: To improve the performance of link prediction in complex networks, this paper combines topology similarity with the XGBoost algorithm. An adjacency matrix is built from the topological structure of the network and used to obtain the sets of common neighbors. The similarity score function of the network is then computed according to topological similarity theory, and the score functions and weight parameters of all time windows are taken as input to XGBoost, which performs the link prediction. The two regularization coefficients of XGBoost are set to different values to test their effect on prediction accuracy, and the optimal coefficients are selected, yielding a stable XGBoost link prediction model. Experimental results show that, when the number of time windows is set appropriately, the method based on topology similarity and XGBoost achieves clearly higher prediction accuracy than commonly used link prediction algorithms, while its prediction time is close to theirs, which makes it especially suitable for link prediction in large-scale complex networks.

Key words: Complex network, Link prediction, Topology similarity, XGBoost algorithm, Time window, Regularization
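The abstract names the pipeline (adjacency matrix, common-neighbor sets, topological similarity scores per time window with weights, then XGBoost with two regularization coefficients) but not its exact definitions. The following is a minimal sketch under stated assumptions, not the authors' method: the three similarity indices (common neighbors, Jaccard, resource allocation), the equal-width windows over integer timestamps, and the exponential-decay window weights are all illustrative choices, and the helper names are hypothetical. `leaf_weight` shows how an L2 penalty (`reg_lambda`) and an L1 penalty (`reg_alpha`) shrink the standard XGBoost leaf weight; the abstract does not say which two coefficients were tuned.

```python
# Sketch only: index choices, windowing and weights are illustrative assumptions.


def adjacency_sets(edges):
    """Adjacency of an undirected graph as node -> set of neighbors.

    Each set is the support of one row of the 0/1 adjacency matrix.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def similarity_scores(adj, x, y):
    """Local topological similarity of (x, y): [CN, Jaccard, RA]."""
    nbr_x, nbr_y = adj.get(x, set()), adj.get(y, set())
    common = nbr_x & nbr_y                          # common-neighbor set
    union = nbr_x | nbr_y
    cn = len(common)                                # common neighbors
    jaccard = cn / len(union) if union else 0.0     # Jaccard index
    ra = sum(1.0 / len(adj[z]) for z in common)     # resource allocation
    return [cn, jaccard, ra]


def time_window_features(timed_edges, x, y, num_windows, decay=0.8):
    """Weighted similarity features of (x, y) over equal-width time windows.

    timed_edges: list of (u, v, t) with integer timestamps t.
    Window k (0 = oldest) gets the hypothetical weight
    decay ** (num_windows - 1 - k), so recent windows count more.
    """
    t_min = min(t for _, _, t in timed_edges)
    t_max = max(t for _, _, t in timed_edges)
    width = (t_max - t_min + 1) / num_windows
    windows = [[] for _ in range(num_windows)]
    for u, v, t in timed_edges:
        k = min(int((t - t_min) / width), num_windows - 1)
        windows[k].append((u, v))
    features = []
    for k, edges in enumerate(windows):
        weight = decay ** (num_windows - 1 - k)
        scores = similarity_scores(adjacency_sets(edges), x, y)
        features.extend(weight * s for s in scores)
    return features


def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0, reg_alpha=0.0):
    """Regularized optimal leaf weight in XGBoost-style boosting.

    w* = -T_alpha(G) / (H + lambda), where G and H are the sums of first- and
    second-order gradients in the leaf and T_alpha is soft-thresholding (L1).
    Larger reg_lambda (L2) or reg_alpha (L1) shrinks the leaf output toward 0.
    """
    if grad_sum > reg_alpha:
        g = grad_sum - reg_alpha
    elif grad_sum < -reg_alpha:
        g = grad_sum + reg_alpha
    else:
        g = 0.0
    return -g / (hess_sum + reg_lambda)


if __name__ == "__main__":
    edges = [("a", "c", 0), ("b", "c", 0),
             ("a", "d", 1), ("b", "d", 1), ("c", "d", 1)]
    # [0.5, 0.5, 0.25, 1.0, 1.0, 0.3333333333333333]
    print(time_window_features(edges, "a", "b", num_windows=2, decay=0.5))
    for lam in (0.0, 1.0, 10.0):
        print(lam, leaf_weight(4.0, 2.0, reg_lambda=lam, reg_alpha=1.0))
```

In a full pipeline, one such feature vector per candidate pair, labeled by whether the link appears in a later held-out period, would be passed to an XGBoost classifier. In the `xgboost` Python package the L2 and L1 penalties are exposed as the `reg_lambda` and `reg_alpha` arguments of `XGBClassifier`, so a grid over those two values is one plausible reading of the coefficient search described above.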

CLC number: TP391