Computer Science, 2021, Vol. 48, Issue (6): 332-337. doi: 10.11896/jsjkx.200700151

• Information Security •

Malicious User Detection Method for Social Network Based on Active Learning

ZHANG Ren-zhi, ZHU Yan

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
  • Received: 2020-07-24 Revised: 2020-09-16 Online: 2021-06-15 Published: 2021-06-03
  • Corresponding author: ZHU Yan (yzhu@swjtu.edu.cn)
  • About author: ZHANG Ren-zhi, born in 1996, postgraduate. His main research interests include Web spam detection and graph neural networks. (zrz59@qq.com)
    ZHU Yan, born in 1965, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. Her main research interests include data mining, Web anomaly detection, big data management and intelligent analysis.
  • Supported by:
    Sichuan Science and Technology Project (2019YFSY0032).

Abstract: Malicious user detection in social networks is a classification task and therefore requires labeled training samples. However, social networks are usually large, and labeling every sample is very costly. To identify the samples most worth labeling under a limited labeling budget, while also making full use of unlabeled samples to improve the detection of malicious users, a detection method based on the graph neural network GraphSAGE and active learning is proposed. The method consists of two parts: a detection module and an active learning module. Inspired by Transformer, the detection module improves GraphSAGE by flattening the process of aggregating a node's neighbors of each order, so that higher-order neighbors are aggregated directly into the central node, which reduces the information loss of higher-order neighbors. The extracted representations are then used from different perspectives through ensemble learning to complete the detection task. The active learning module measures the value of unlabeled samples according to the results of the ensemble classification. During the sample labeling stage, the detection module and the active learning module are used alternately to guide labeling, so that the labeled samples are more helpful for model classification. In the experiments, AUROC and AUPR are used as evaluation metrics. The effectiveness of the improved detection module is verified on a real large-scale social network dataset, and the reasons for the improvement are analyzed. The proposed method is then compared with two existing active learning methods of the same kind. The results show that, when the same number of training samples is labeled, the training samples selected by the proposed method yield better classification performance.

Key words: Active learning, Graph neural network, Imbalanced data, Malicious user detection, Social network
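
The abstract states that the active learning module scores unlabeled samples from the results of the ensemble classification, but it does not give the scoring formula. The following is a minimal sketch of one common choice, offered as an assumption rather than the authors' method: average the ensemble members' malicious-class probabilities and query the nodes whose averaged prediction has the highest entropy. The names `mean_probability`, `binary_entropy`, `select_queries`, and the sample data are all hypothetical.

```python
import math
from typing import Dict, List


def mean_probability(member_probs: List[float]) -> float:
    """Average the malicious-class probabilities predicted by the ensemble members."""
    return sum(member_probs) / len(member_probs)


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) prediction; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_queries(ensemble_probs: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the `budget` unlabeled node ids with the highest ensemble entropy.

    Assumption: entropy of the averaged prediction stands in for the
    "value" of a sample; the paper's actual measure may differ.
    """
    scored = [
        (binary_entropy(mean_probability(probs)), node_id)
        for node_id, probs in ensemble_probs.items()
    ]
    # Sort by descending entropy; break ties by node id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [node_id for _, node_id in scored[:budget]]


# Hypothetical ensemble outputs for four unlabeled users (three members each).
unlabeled = {
    "u1": [0.95, 0.90, 0.97],  # confidently malicious
    "u2": [0.55, 0.45, 0.50],  # members are unsure
    "u3": [0.05, 0.10, 0.02],  # confidently benign
    "u4": [0.80, 0.30, 0.60],  # members disagree
}
queries = select_queries(unlabeled, budget=2)
print(queries)
```

In an alternating loop of the kind the abstract describes, the selected nodes would be sent to an annotator, added to the training set, and the detection module retrained before the next round of selection.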

CLC Number: 

  • TP183
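
The abstract reports AUROC and AUPR as its evaluation metrics. As a rough illustration of what these two numbers measure, the sketch below computes both in plain Python on a small, made-up, imbalanced sample. The function names and the data are hypothetical, AUPR is approximated here by the step-wise average precision, and tied scores are not given special treatment in that function. A production evaluation would more likely rely on a tested library implementation.

```python
from typing import List


def auroc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5).

    Quadratic in the number of samples; intended for illustration only.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_precision(labels: List[int], scores: List[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision)."""
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("average precision needs at least one positive sample")
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_positives


# Made-up imbalanced sample: 2 malicious users (label 1) among 8.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
roc_value = auroc(labels, scores)
pr_value = average_precision(labels, scores)
print(round(roc_value, 4), round(pr_value, 4))
```

On this sample a single benign user ranked above one malicious user lowers average precision more than AUROC, which is one reason the two metrics are often reported together when the positive class is rare.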