计算机科学 ›› 2025, Vol. 52 ›› Issue (2): 80-90.doi: 10.11896/jsjkx.240200005

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于个性化PageRank和对比学习的图异常检测模型

袁野1, 陈明2, 吴安彪2, 王一舒2   

  1. 1 北京理工大学计算机学院 北京 100081
    2 东北大学计算机科学与工程学院 沈阳 110169
  • 收稿日期:2024-02-01 修回日期:2024-06-30 出版日期:2025-02-15 发布日期:2025-02-17
  • 通讯作者: 袁野(yuan-ye@bit.edu.cn)
  • 基金资助:
    国家重点研发计划(2022YFB2702100);国家自然科学基金(61932004,62225203,U21A20516,62302084);中国博士后科学基金(2023M730518)

Graph Anomaly Detection Model Based on Personalized PageRank and Contrastive Learning

YUAN Ye1, CHEN Ming2, WU Anbiao2, WANG Yishu2   

  1. 1 School of Computer Science and Technology,Beijing Institute of Technology,Beijing 100081,China
    2 College of Computer Science and Engineering,Northeastern University,Liaoning 110169,China
  • Received:2024-02-01 Revised:2024-06-30 Online:2025-02-15 Published:2025-02-17
  • About author:YUAN Ye,born in 1981,Ph.D,professor,Ph.D supervisor.His main research interests include big data management and analytics,artificial intelligence based on big data and distributed big data computing.
  • Supported by:
    National Key R&D Program of China(2022YFB2702100),National Natural Science Foundation of China(61932004,62225203,U21A20516,62302084) and China Postdoctoral Science Foundation(2023M730518).

摘要: 图异常检测旨在从属性网络中检测出异常节点,其由于在许多应用领域如金融、电子贸易、垃圾邮件发送者检测中有着深远的实际意义而备受重视。传统的非深度学习方法只能捕捉图的浅层结构,对此,研究者们提出了基于深度神经网络的异常检测模型。然而,这些模型没有考虑到图中节点的中心性差异,这种差异在捕获节点的局部信息时会导致信息缺失或引入远端节点的噪声。此外,它们忽略了属性空间的特征信息,这些信息可以提供额外的异常监督信号。为此,从无监督的视角出发,提出了一种新颖的基于个性化PageRank和对比学习的图异常检测框架PC-GAD(Personalized PageRank and Contrastive Learning based Graph Anomaly Detection)。首先,提出一种动态采样策略,即通过计算图中每个节点的个性化PageRank向量确定其相应的子图采样数目,避免局部信息的缺失和引噪;其次,针对每个节点,分别从拓扑结构和属性空间的角度出发捕获节点的异常监督信号,并设计相应的对比学习目标,从而全面地学习潜在的异常模式;最后,经过多轮对比预测,根据输出的异常值得分评估每个节点的异常程度。为验证所提模型的有效性,分别在6个真实数据集上与基准模型开展了大量对比实验。实验结果验证了PC-GAD能够全面地识别出图中的异常节点,AUC值相比现有模型提升了1.42%。

关键词: 图异常检测, 个性化PageRank, 图神经网络, 图对比学习

Abstract: Graph anomaly detection aims to detect abnormal nodes from attribute networks,and is highly valued by researchers due to its profound practical significance in many application fields such as finance,electronic trade,and spam sender detection.Traditional non deep learning methods can only capture the shallow structure of the graph,and researchers have proposed anomaly detection models based on deep neural networks to address this issue.However,these models do not take into account the centrality differences of nodes in the graph,which can lead to information loss or introduce noise from remote nodes when capturing local information of nodes.In addition,they ignore the feature information in the attribute space,which can provide additional anomaly monitoring signals.Therefore,this paper proposes a novel graph anomaly detection framework PC-GAD(personalized PageRank and contrastive learning based graph anomaly detection) from an unsupervised perspective.Firstly,a dynamic sampling strategy is proposed,which calculates the personalized PageRank vector of each node in the graph to determine the corresponding size of subgraph samples,avoiding the loss of local information and noise introduction.Secondly,for each node,the abnormal supervision signals are captured from the perspective of topology and attribute space,and the corresponding contrastive learning objective is designed to comprehensively learn potential abnormal patterns.Finally,after multiple rounds of contrast and prediction,the degree of abnormality of each node is evaluated according to the score of the output outlier.To verify the effectiveness of the proposed model,a large number of comparative experiments are conducted with the benchmark models on six real datasets.Experimental results have verified that PC-GAD can comprehensively identify abnormal nodes in the graph,and the AUC value increases by 1.42% compared to existing models.

Key words: Graph anomaly detection, Personalized PageRank, Graph neural network, Graph contrastive learning

中图分类号: 

  • TP391
[1]YUAN Y,CHEN L,WANG G.Efficiently answering probability threshold-based shortest path queries over uncertain graphs[C]//Database Systems for Advanced Applications:15th International Conference.2010:155-170.
[2]YUAN Y,WANG G,WANG H,et al.Efficient subgraph search over large uncertain graphs[J].Proceedings of the VLDB Endowment,2011,4(11):876-886.
[3]YUAN Y,WANG G,CHEN L,et al.Efficient subgraph simila-rity search on large probabilistic graph databases[J].Proceedings of the VLDB Endowment,2012,5(9):800-811.
[4]YUAN Y,WANG G,CHEN L,et al.Efficient keyword search on uncertain graph data[J].IEEE Transactions on Knowledge and Data Engineering,2013,25(12):2767-2779.
[5]HU B,ZHANG Z,SHI C,et al.Cash-out user detection based on attributed heterogeneous information network with a hierarchical attention mechanism[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2019:946-953.
[6]WU Y,LIAN D,XU Y,et al.Graph convolutional networkswith markov random field reasoning for social spammer detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:1054-1061.
[7]LI A,QIN Z,LIU R,et al.Spam review detection with graphconvolutional networks[C]//Proceedings of the 28th ACM International Conference on Information and Knowledge Management.2019:2703-2711.
[8]MA X,WU J,XUE S,et al.A comprehensive survey on graphanomaly detection with deep learning[J].IEEE Transactions on Knowledge and Data Engineering,2021,35(12):12012-12038.
[9]KIPF T N,WELLING M.Semi-supervised classification withgraph convolutional networks[J].arXiv:1609.02907,2016.
[10]PEROZZI B,AKOGLU L.Scalable anomaly ranking of attributed neighborhoods[C]//Proceedings of the 2016 SIAM International Conference on Data Mining,Society for Industrial and Applied Mathematics.2016:207-215.
[11]LIU N,HUANG X,HU X.Accelerated Local Anomaly Detection via Resolving Attributed Networks[C]//IJCAI.2017:2337-2343.
[12]LI J,DANI H,HU X,et al.Radar:Residual Analysis for Ano-maly Detection in Attributed Networks[C]//IJCAI.2017:2152-2158.
[13]PENG Z,LUO M,LI J,et al.ANOMALOUS:A Joint Modeling Approach for Anomaly Detection on Attributed Networks[C]//IJCAI,2018:3513-3519.
[14]DING K,LI J,BHANUSHALI R,et al.Deep anomaly detection on attributed networks[C]//Proceedings of the 2019 SIAM International Conference on Data Mining,Society for Industrial and Applied Mathematics.2019:594-602.
[15]FAN H,ZHANG F,LI Z.Anomalydae:Dual autoencoder foranomaly detection on attributed networks[C]//2020 IEEE International Conference on Acoustics,Speech and Signal Processing(ICASSP).2020:5685-5689.
[16]BANDYOPADHYAY S,VIVEK S V,MURTY M N.Outlierresistant unsupervised deep architectures for attributed network embedding[C]//Proceedings of the 13th International Confe-rence on Web Search and Data Mining.2020:25-33.
[17]PEI Y,HUANG T,VAN IPENBURG W,et al.ResGCN:attention-based deep residual modeling for anomaly detection on attributed networks[C]//2021 IEEE 8th International Conference on Data Science and Advanced Analytics(DSAA).2021:1-2.
[18]LIU Y,LI Z,PAN S,et al.Anomaly detection on attributed networks via contrastive self-supervised learning[J].IEEE Tran-sactions on Neural Networks and Learning Systems,2021,33(6):2378-2392.
[19]DUAN J,WANG S,ZHANG P,et al.Graph anomaly detection via multi-scale contrastive learning networks with augmented view[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2023:7459-7467.
[20]VELICKOVIC P,FEDUS W,HAMILTON W L,et al.Deepgraph infomax[J].ICLR(Poster),2019,2(3):4.
[21]HASSANI K,KHASAHMADI A H.Contrastive multi-viewrepresentation learning on graphs[C]//International Confe-rence on Machine Learning.2020:4116-4126.
[22]YOU Y,CHEN T,SUI Y,et al.Graph contrastive learning with augmentations[J].Advances in Neural Information Processing Systems,2020,33:5812-5823.
[23]ZHU Y,XU Y,YU F,et al.Graph contrastive learning withadaptive augmentation[C]//Proceedings of the Web Conference 2021.2021:2069-2080.
[24]LI D,WANG W,SHAO M,et al.Contrastive Representation Learning Based on Multiple Node-centered Subgraphs[C]//Proceedings of the 32nd ACM International Conference on Information and Knowledge Management.2023:1338-1347.
[25]PAGE L,BRIN S,MOTWANI R,et al.The PageRank citation ranking:Bringing order to the web[R].Stanford Infolab,1999.
[26]GASTEIGER J,BOJCHEVSKI A,GÜNNEMANN S.Predictthen propagate:Graph neural networks meet personalized page-rank[J].arXiv:1810.05997,2018.
[27]BOJCHEVSKI A,GASTEIGER J,PEROZZI B,et al.Scalinggraph neural networks with approximate pagerank[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.2020:2464-2473.
[28]ZHANG M L,ZHOU Z H.ML-KNN:A lazy learning approach to multi-label learning[J].Pattern recognition,2007,40(7):2038-2048.
[29]ZHANG S,LI X,ZONG M,et al.Efficient kNN classification with different numbers of nearest neighbors[J].IEEE Transactions on Neural Networks and Learning Systems,2017,29(5):1774-1785.
[30]DENG Z,ZHU X,CHENG D,et al.Efficient kNN classification algorithm for big data[J].Neurocomputing,2016,195:143-148.
[31]ANDERSEN R,CHUNG F,LANG K.Local graph partitioning using pagerank vectors[C]//2006 47th Annual IEEE Sympo-sium on Foundations of Computer Science(FOCS'06).2006:475-486.
[32]TONG H,FALOUTSOS C,PAN J Y.Fast random walk with restart and its applications[C]//Sixth International Conference on Data Mining(ICDM'06).2006:613-622.
[33]LIU Y,JIN M,PAN S,et al.Graph self-supervised learning:A survey[J].IEEE Transactions on Knowledge and Data Engineering,2022,35(6):5879-5900.
[34]OORD A,LI Y,VINYALS O.Representation learning with contrastive predictive coding[J].arXiv:1807.03748,2018.
[35]LI S,AMENTA N.Brute-force k-nearest neighbors search on the GPU[C]//Similarity Search and Applications:8th International Conference.2015:259-270.
[36]YANG Z,COHEN W,SALAKHUDINOV R.Revisiting semi-supervised learning with graph embeddings[C]//International Conference on Machine Learning.2016:40-48.
[37]GILES C L,BOLLACKER K D,LAWRENCE S.CiteSeer:An automatic citation indexing system[C]//Proceedings of the Third ACM Conference on Digital Libraries.1998:89-98.
[38]CANESE K,WEIS S.PubMed:the bibliographic database[J].The NCBI handbook,2013,2(1):1-9.
[39]TANG J,ZHANG J,YAO L,et al.Arnetminer:extraction and mining of academic social networks[C]//Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2008:990-998.
[40]TANG L,LIU H.Relational learning via latent social dimen-sions[C]//Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2009:817-826.
[41]LIU F T,TING K M,ZHOU Z H.Isolation Forest[C]//Proceedings of the 2008 Eighth IEEE International Conference on Data Mining.2008:413-422.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!