Computer Science, 2019, Vol. 46, Issue (6): 143-147. doi: 10.11896/j.issn.1002-137X.2019.06.021

• Information Security •

Topic-based Re-identification for Anonymous Users in Social Network

LV Zhi-quan1, LI Hao2, ZHANG Zong-fu2, ZHANG Min2

  1. National Computer Network Emergency Response Technical Team & Coordination Center of China, Beijing 100029, China
  2. Trusted Computing and Information Assurance Laboratory (TCA), Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2019-02-21 Published: 2019-06-24
  • Corresponding author: LI Hao (born 1983), male, Ph.D., associate research fellow, CCF member. His main research interests are data privacy protection, trusted computing and access control. E-mail: lihao@iscas.ac.cn
  • About the authors: LV Zhi-quan (born 1986), male, Ph.D.; his main research interest is network and system security. ZHANG Zong-fu (born 1991), male, M.S.; his main research interest is data privacy protection. ZHANG Min (born 1975), female, Ph.D., research fellow; her main research interests are data privacy protection and trusted computing.
  • Funding:
    Supported by the National Natural Science Foundation of China (61402456).

Abstract: In recent years, social networks have become part of people's daily lives. While they make social activities more convenient, they also threaten personal privacy. People usually want to protect some of their private social activity from access by relatives, friends, colleagues or other specific groups. A common protective measure is to socialize anonymously. Some social networks provide anonymity mechanisms that let users carry out part of their social activities anonymously, separating those activities from the main account. Users can also create additional accounts (alternate accounts) whose attributes and friend relationships differ from those of the main account. Targeting these protection measures, this paper proposes a topic-model-based method for re-identifying anonymous users in social networks. The method mines topics from the text that users publish anonymously (or through alternate accounts) and non-anonymously (through main accounts), introduces a time factor and a text length factor on top of the topic model to construct user profiles, and then re-identifies users by analyzing the similarity between anonymous (alternate-account) and non-anonymous (main-account) user profiles. Experiments on a real social network dataset show that the method can effectively mount identity re-identification attacks against anonymous or alternate-account users.

Key words: Big data, Social networks, Privacy protection, Anonymity, Re-identification
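
The abstract outlines a pipeline of topic mining, profile construction with time and text length factors, and similarity matching, but does not give the formulas. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes per-post topic distributions are already available (for example from an LDA model), and the specific weights (log-damped length, exponential time decay with a `half_life_days` parameter), the use of cosine similarity, and all function names are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

# One post: (topic distribution, text length in characters, age in days).
# The topic distribution is assumed to come from a topic model such as LDA.
Post = Tuple[List[float], int, float]


def build_profile(posts: List[Post], half_life_days: float = 30.0) -> List[float]:
    """Build a user profile as a weighted average of per-post topic distributions."""
    if not posts:
        raise ValueError("at least one post is required")
    n_topics = len(posts[0][0])
    profile = [0.0] * n_topics
    total_weight = 0.0
    for topics, length, age_days in posts:
        # Hypothetical weighting (not from the paper): longer posts count
        # more (log-damped), older posts count less (exponential decay).
        weight = math.log1p(length) * 0.5 ** (age_days / half_life_days)
        total_weight += weight
        for k, p in enumerate(topics):
            profile[k] += weight * p
    if total_weight == 0.0:
        return profile
    return [v / total_weight for v in profile]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two profiles (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reidentify(anon: Dict[str, List[float]],
               main: Dict[str, List[float]]) -> Dict[str, str]:
    """Map each anonymous profile to the most similar main-account profile."""
    return {
        anon_id: max(main, key=lambda main_id: cosine(profile, main[main_id]))
        for anon_id, profile in anon.items()
    }
```

In this sketch each anonymous account is simply assigned its nearest main account. A practical attack would likely also need a similarity threshold so that anonymous users with no counterpart in the candidate set are left unmatched.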

CLC Number: TP309