Computer Science ›› 2019, Vol. 46 ›› Issue (6): 143-147. doi: 10.11896/j.issn.1002-137X.2019.06.021

• Information Security •

Topic-based Re-identification for Anonymous Users in Social Network

LV Zhi-quan1, LI Hao2, ZHANG Zong-fu2, ZHANG Min2

  1. National Computer Network Emergency Response Technical Team & Coordination Center of China, Beijing 100029, China
  2. Department of TCA, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2019-02-21 Published: 2019-06-24
  • Corresponding author: LI Hao (born 1983), male, Ph.D., associate research fellow, CCF member. His main research interests include data privacy protection, trusted computing, and access control. E-mail: lihao@iscas.ac.cn
  • About the authors: LV Zhi-quan (born 1986), male, Ph.D.; his main research interest is network and system security. ZHANG Zong-fu (born 1991), male, M.S.; his main research interest is data privacy protection. ZHANG Min (born 1975), female, Ph.D., research fellow; her main research interests include data privacy protection and trusted computing.
  • Supported by: National Natural Science Foundation of China (61402456).

Abstract: Social networks have become part of people's daily lives in recent years. While they make social activities more convenient, they also threaten personal privacy. People usually want to protect part of their private social activity information from relatives, friends, colleagues, or other specific groups. One common protective measure is to socialize anonymously. Some social networks provide anonymity mechanisms that let users carry out some social activities anonymously, thereby separating those activities from the main account. In addition, users can create alternate accounts and give them attributes and friend relationships that differ from those of the main account. Targeting these protection measures, this paper proposes a topic-model-based re-identification method for anonymous users in social networks. The method mines topics from the text that a user publishes anonymously (or through an alternate account) and non-anonymously (through the main account). On top of the topic model, it introduces a time factor and a text length factor to construct user profiles. It then re-identifies users by analyzing the similarity between anonymous (alternate-account) and non-anonymous (main-account) user profiles. Experiments on a real social network dataset show that the method can effectively mount identity re-identification attacks against anonymous or alternate-account users in social networks.

Key words: Anonymity, Big data, Privacy protection, Re-identification, Social networks
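The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the profile-matching idea, not the authors' implementation. It assumes per-post topic distributions are already available from some topic model (for example LDA). The weighting formulas (logarithmic text-length weight, exponential time decay with a `half_life_days` parameter), the use of cosine similarity, and all function names are illustrative assumptions that do not come from the paper.

```python
import math
from typing import Dict, List, Tuple

# A post is (topic_distribution, text_length, age_in_days).
# The topic distributions are assumed to come from a topic model trained
# elsewhere (e.g. LDA); this sketch does not train one.
Post = Tuple[List[float], int, float]


def build_profile(posts: List[Post], half_life_days: float = 30.0) -> List[float]:
    """Aggregate per-post topic vectors into one normalized user profile.

    The weighting is an illustrative assumption, not the paper's formula:
    longer posts count more (log of length) and older posts count less
    (exponential decay with the given half-life).
    """
    num_topics = len(posts[0][0])
    profile = [0.0] * num_topics
    for topics, length, age_days in posts:
        length_weight = math.log1p(length)
        time_weight = 0.5 ** (age_days / half_life_days)
        weight = length_weight * time_weight
        for k in range(num_topics):
            profile[k] += weight * topics[k]
    total = sum(profile)
    return [p / total for p in profile] if total > 0 else profile


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two profiles; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def re_identify(
    anon_profile: List[float], candidates: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Rank candidate main accounts by similarity, most similar first."""
    scored = [
        (name, cosine_similarity(anon_profile, profile))
        for name, profile in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): three topics, two main accounts, one anonymous account.
main_posts = {
    "alice": [([0.8, 0.1, 0.1], 120, 2.0), ([0.7, 0.2, 0.1], 80, 10.0)],
    "bob": [([0.1, 0.8, 0.1], 100, 1.0), ([0.2, 0.7, 0.1], 60, 5.0)],
}
anon_posts = [([0.75, 0.15, 0.10], 90, 3.0)]

main_profiles = {name: build_profile(posts) for name, posts in main_posts.items()}
ranking = re_identify(build_profile(anon_posts), main_profiles)
print(ranking)
```

In this toy data the anonymous posts share their dominant topic with the hypothetical account `alice`, so `alice` ranks first. A real attack would also need a decision threshold or a top-k cutoff, which the abstract does not specify.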

CLC Number: TP309