Computer Science ›› 2021, Vol. 48 ›› Issue (12): 226-230. doi: 10.11896/jsjkx.200800026

• Database & Big Data & Data Science •

Complex Network Link Prediction Method Based on Topology Similarity and XGBoost

GONG Zhui-fei, WEI Chuan-jia

  1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
  • Received: 2020-08-04 Revised: 2020-09-21 Online: 2021-12-15 Published: 2021-11-26
  • Corresponding author: GONG Zhui-fei (793688937@qq.com)
  • About author: GONG Zhui-fei, born in 1977, postgraduate, Ph.D, lecturer, senior engineer. Her main research interests include complex network and link prediction.
  • Supported by:
    National Natural Science Foundation of China (61773348) and Natural Science Foundation of Zhejiang Province, China (LY17F030016).

Abstract: To improve the performance of link prediction in complex networks, topological similarity is combined with the XGBoost algorithm. An adjacency matrix is built from the topological structure of the network and used to obtain the common neighbor sets. A similarity score function is then computed according to topological similarity theory. The score functions and weight parameters of each time window are taken as input, and XGBoost performs the link prediction. The two regularization coefficients of XGBoost are set to different values to test their influence on prediction accuracy, and the optimal coefficients are selected to obtain a stable XGBoost link prediction model. Experiments show that, when the number of time windows is set reasonably, the method based on topological similarity and XGBoost has a clear advantage in prediction accuracy over commonly used link prediction algorithms, while its prediction time differs little from theirs, which makes it especially suitable for link prediction in large-scale complex networks.

Key words: Complex network, Link prediction, Regularization, Time window, Topology similarity, XGBoost algorithm
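
The feature-construction step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the similarity score is the plain common-neighbor count, uses an adjacency map in place of an adjacency matrix, leaves out the per-window weight parameters, and runs on hypothetical toy data. The names `common_neighbor_scores` and `window_features` are made up for this example.

```python
from itertools import combinations


def common_neighbor_scores(edges, nodes):
    """Return {(u, v): common-neighbor count} for every unordered node pair."""
    # Adjacency map: a sparse stand-in for the adjacency matrix (assumption).
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Score each pair by the size of its common neighbor set.
    return {
        (u, v): len(adj[u] & adj[v])
        for u, v in combinations(sorted(nodes), 2)
    }


def window_features(snapshots, nodes):
    """Collect one score per time window into a feature vector per node pair."""
    per_window = [common_neighbor_scores(edges, nodes) for edges in snapshots]
    pairs = sorted(per_window[0])
    return {pair: [scores[pair] for scores in per_window] for pair in pairs}


# Hypothetical toy data: three time windows over four nodes.
nodes = ["a", "b", "c", "d"]
snapshots = [
    [("a", "b"), ("b", "c")],
    [("a", "b"), ("b", "c"), ("c", "d")],
    [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")],
]
features = window_features(snapshots, nodes)
print(features[("a", "c")])  # one score per time window for the pair (a, c)
```

Each feature vector, paired with a label saying whether the link appears in a later window, could then be passed to a gradient-boosted classifier such as `xgboost.XGBClassifier`. That library exposes regularization parameters including `reg_lambda`, `reg_alpha`, and `gamma`; which two the authors tuned is not stated in the abstract.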

CLC Number:

  • TP391