Computer Science, 2019, Vol. 46, Issue (4): 77-82. doi: 10.11896/j.issn.1002-137X.2019.04.012

• Big Data & Data Science •

Associated Users Mining Algorithm Based on Multi-information Fusion Representation Learning

HAN Zhong-ming1,2, ZHENG Chen-ye1, DUAN Da-gao1, DONG Jian3

  1. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China
  2. Beijing Key Laboratory of Food Safety Big Data Technology, Beijing 100048, China
  3. The Third Research Institute of the Ministry of Public Security, Key Laboratory of Information Network Security of the Ministry of Public Security, Shanghai 200031, China
  • Received: 2018-11-09 Online: 2019-04-15 Published: 2019-04-23
  • Corresponding author: HAN Zhong-ming (born 1972), male, Ph.D., professor. His main research interests include social networks, data mining and big data processing. E-mail: hanzm@th.btbu.edu.cn
  • About the authors: ZHENG Chen-ye (born 1994), female, master's student; her main research interests include social networks and data mining. DUAN Da-gao (born 1976), male, Ph.D., associate professor; his main research interests include social computing and multimedia information processing. DONG Jian (born 1974), male, Ph.D., senior engineer; his main research interests include network data mining and network security.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61170112).

Abstract: With the rapid development and spread of Internet technologies, more and more users share and exchange information through social networks. A single user may register multiple accounts to publish information, and these accounts constitute the associated users in the network. Mining associated users accurately and effectively can curb false information and illegal behavior, and thus helps keep the network environment secure and fair. Existing associated user mining methods consider only user attributes or only user relationship information, and do not effectively fuse the multiple types of information contained in the network. In addition, most methods borrow techniques designed for other problems, such as de-anonymization, and these techniques cannot accurately solve the associated user mining problem. To address this, this paper proposes an associated users mining algorithm based on multi-information fusion representation learning (AUMA-MRL). The algorithm uses network representation learning to learn from several different dimensions of information in the network, such as user attributes and network topology, and fuses the learned representations to obtain multi-information node embeddings. These embeddings can accurately characterize the multiple types of information in the network. Similarity vectors are then constructed from the learned node embeddings and used to mine associated users. The algorithm was validated on three real network datasets: the protein network PPI and the social networks Flickr and Facebook. Precision and recall of the mining results were used as the performance metrics. The results show that, compared with existing classical algorithms, the recall of the proposed algorithm is 17.5% higher on average, and that it can effectively mine associated users in networks.

Key words: Associated users, Node embedding, Representation learning, Social network security

CLC Number:

  • TP311
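
The pipeline outlined in the abstract (fuse per-node embeddings, build a similarity vector for each pair of accounts, then score the mined pairs by precision and recall) can be illustrated with a minimal Python sketch. This is not the AUMA-MRL implementation: the abstract does not specify the fusion rule, the similarity features, or the final classifier, so the concatenation-based `fuse`, the two features in `similarity_vector`, the `THRESHOLD` cutoff, and the toy embeddings below are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(attr_emb, topo_emb):
    """Assumed fusion rule: normalise each embedding, then concatenate."""
    return l2_normalize(attr_emb) + l2_normalize(topo_emb)


def similarity_vector(u, v):
    """Hypothetical pairwise features: [cosine similarity, Euclidean distance]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0
    euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return [cosine, euclid]


def precision_recall(predicted, actual):
    """Precision and recall over sets of unordered account pairs."""
    predicted = {frozenset(p) for p in predicted}
    actual = {frozenset(p) for p in actual}
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall


# Toy embeddings (made up): accounts "a" and "b" are meant to be associated.
attr = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
topo = {"a": [0.0, 2.0], "b": [0.1, 1.9], "c": [3.0, 0.0]}
fused = {node: fuse(attr[node], topo[node]) for node in attr}

THRESHOLD = 0.9  # hypothetical cutoff standing in for a trained classifier
nodes = sorted(fused)
predicted = [
    (u, v)
    for i, u in enumerate(nodes)
    for v in nodes[i + 1:]
    if similarity_vector(fused[u], fused[v])[0] >= THRESHOLD
]
precision, recall = precision_recall(predicted, [("a", "b")])
```

In practice the similarity vectors would more plausibly be fed to a trained classifier than compared against a fixed cosine threshold; the threshold is used here only to keep the sketch self-contained.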