Computer Science ›› 2021, Vol. 48 ›› Issue (6): 332-337. doi: 10.11896/jsjkx.200700151
ZHANG Ren-zhi (张人之), ZHU Yan (朱焱)
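Abstract: Detecting malicious users in social networks is a classification task and therefore requires labeled training samples. Social networks are usually large, however, and labeling every sample is prohibitively expensive. To identify the samples most worth labeling under a limited labeling budget while also making full use of unlabeled samples, and thereby improve malicious-user detection performance, this paper proposes a detection method based on the graph neural network GraphSAGE and active learning. The method consists of two parts: a detection module and an active learning module. Inspired by Transformer, the detection module improves GraphSAGE by flattening the process by which it aggregates information from a node's neighbors of each order, so that high-order neighbors are aggregated directly into the central node, which reduces the information loss of high-order neighbors; ensemble learning then exploits the extracted representations from different perspectives to complete the detection task. The active learning module measures the value of unlabeled samples according to the ensemble classification results. During the sample-labeling stage, the detection module and the active learning module are used alternately to guide the labeling process, so that the samples that get labeled are those most helpful to the model's classification. In the experiments, AUROC and AUPR are used as evaluation metrics; the effectiveness of the improved detection module is verified on a real large-scale social network dataset, and the reasons for the improvement are analyzed. The proposed method is then compared with two existing active learning methods of the same kind, and the results show that, with the same number of labeled training samples, the training samples selected for labeling by the proposed method yield better classification performance.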
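The abstract states that the active learning module scores unlabeled samples from the ensemble classification results, but it does not give the scoring rule. The following is a minimal sketch of one common choice, assuming the value of a sample is the variance of the malicious-class probabilities assigned by the ensemble members. The function names `ensemble_scores` and `select_queries`, the variance criterion, and the toy probabilities are all illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, pvariance


def ensemble_scores(member_probs):
    """Return (mean probability, disagreement) for every unlabeled sample.

    member_probs[m][i] is the malicious-class probability that ensemble
    member m assigns to unlabeled sample i. Disagreement is measured as the
    population variance across members (an assumed criterion).
    """
    n_samples = len(member_probs[0])
    scores = []
    for i in range(n_samples):
        votes = [probs[i] for probs in member_probs]
        scores.append((mean(votes), pvariance(votes)))
    return scores


def select_queries(member_probs, budget):
    """Pick the `budget` unlabeled samples the ensemble disagrees on most."""
    scores = ensemble_scores(member_probs)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i][1], reverse=True)
    return ranked[:budget]


# Three hypothetical ensemble members scoring four unlabeled users.
member_probs = [
    [0.95, 0.10, 0.50, 0.20],
    [0.90, 0.15, 0.90, 0.40],
    [0.92, 0.05, 0.10, 0.30],
]
queries = select_queries(member_probs, budget=2)
print(queries)
```

In an alternating loop of the kind the abstract describes, the selected indices would be sent for labeling, added to the training set, and the detection module retrained before the next round of selection.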