Computer Science ›› 2019, Vol. 46 ›› Issue (4): 77-82. doi: 10.11896/j.issn.1002-137X.2019.04.012

• Big Data & Data Science •

Associated Users Mining Algorithm Based on Multi-information Fusion Representation Learning

HAN Zhong-ming1,2, ZHENG Chen-ye1, DUAN Da-gao1, DONG Jian3

  1. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China
  2. Beijing Key Laboratory of Food Safety Big Data Technology, Beijing 100048, China
  3. Key Laboratory of Information Network Security of the Ministry of Public Security, The Third Research Institute of the Ministry of Public Security, Shanghai 200031, China
  • Received: 2018-11-09 Online: 2019-04-15 Published: 2019-04-23
  • Corresponding author: HAN Zhong-ming (born 1972), male, Ph.D., professor. His main research interests include social networks, data mining, and big data processing. E-mail: hanzm@th.btbu.edu.cn (corresponding author)
  • About the authors: ZHENG Chen-ye (born 1994), female, master's student. Her main research interests include social networks and data mining. DUAN Da-gao (born 1976), male, Ph.D., associate professor. His main research interests include social computing and multimedia information processing. DONG Jian (born 1974), male, Ph.D., senior engineer. His main research interests include network data mining and network security.
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61170112).

Abstract: With the rapid development and spread of Internet technologies, more and more users share and exchange information through social networks. A single user may register multiple accounts to publish information, and these accounts constitute the associated users in the network. Accurately and effectively mining associated users in social networks can suppress false information and illegal behavior, and thus helps ensure the security and fairness of the network environment. Existing associated user mining methods consider only user attributes or only user relationship information, and do not effectively fuse the multiple types of information contained in the network. In addition, most methods borrow techniques from other fields, such as de-anonymization, and cannot accurately solve the associated user mining problem. To address this, this paper proposes an associated user mining algorithm based on multi-information fusion representation learning (AUMA-MRL). The algorithm uses network representation learning to learn information of several different dimensions in the network, such as user attributes and network topology, and then fuses the learned representations to obtain multi-information node embeddings. These embeddings accurately characterize the multiple types of information in the network, and similarity vectors constructed from them are used to mine associated users. The proposed algorithm is validated on three real network datasets: the protein network PPI and the social networks Flickr and Facebook. Precision and recall of the mining results serve as the performance evaluation metrics. The results show that, compared with existing classical algorithms, the proposed algorithm improves recall by 17.5% on average and can effectively mine associated users in networks.

Key words: Associated users, Social network security, Representation learning, Node embedding
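
The pipeline outlined in the abstract (learn per-source embeddings, fuse them, build similarity vectors, then score with precision and recall) can be illustrated with a small sketch. This is not the AUMA-MRL implementation: the fusion by normalized concatenation, the element-wise-product similarity vector, the fixed decision threshold, and all function names and toy data are assumptions made purely for illustration, since the abstract does not specify these details.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(attr_emb, topo_emb):
    """Fuse attribute and topology embeddings of one node.
    ASSUMPTION: normalized concatenation stands in for the paper's fusion step."""
    return l2_normalize(attr_emb) + l2_normalize(topo_emb)

def similarity_vector(u, v):
    """ASSUMPTION: element-wise product of two fused embeddings."""
    return [a * b for a, b in zip(u, v)]

def predict_associated(embeddings, threshold):
    """Return node pairs whose summed similarity vector exceeds the threshold.
    ASSUMPTION: a fixed threshold replaces whatever classifier the paper uses."""
    nodes = sorted(embeddings)
    pairs = set()
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            score = sum(similarity_vector(embeddings[a], embeddings[b]))
            if score > threshold:
                pairs.add((a, b))
    return pairs

def precision_recall(predicted, actual):
    """Precision and recall of predicted associated pairs against ground truth."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical toy data: accounts a1/a2 belong to one user, b1/b2 to another.
attr = {"a1": [1.0, 0.0], "a2": [0.9, 0.1], "b1": [0.0, 1.0], "b2": [0.1, 0.9]}
topo = {"a1": [1.0, 0.0], "a2": [1.0, 0.0], "b1": [0.0, 1.0], "b2": [0.0, 1.0]}
fused = {node: fuse(attr[node], topo[node]) for node in attr}

truth = {("a1", "a2"), ("b1", "b2")}
predicted = predict_associated(fused, threshold=1.8)
precision, recall = precision_recall(predicted, truth)
```

In this toy setting each fused embedding is the concatenation of two unit vectors, so the summed similarity vector ranges up to 2, and the threshold of 1.8 is an arbitrary choice that separates the two hypothetical users.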

CLC Number: TP311