Computer Science ›› 2018, Vol. 45 ›› Issue (12): 111-116. doi: 10.11896/j.issn.1002-137X.2018.12.017

• Information Security •

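Microblogging Malicious User Identification Based on Behavior Characteristic Analysis

XIA Chong-huan (夏崇欢), LI Hua-kang (李华康), SUN Guo-zi (孙国梓)

  1. (School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
  • Received: 2017-11-29 Online: 2018-12-15 Published: 2019-02-25
  • About the authors: XIA Chong-huan (born 1991), male, master's student; his research interests include information security and big data applications. LI Hua-kang (born 1982), male, Ph.D., lecturer, CCF member; his research interests include smart cities, big data applications, and Internet security. SUN Guo-zi (born 1972), male, Ph.D., professor, CCF senior member; his research interests include cyberspace security and digital forensics. E-mail: sun@njupt.edu.cn (corresponding author).
  • Funding:
    This work was supported by the Young Scientists Fund of the National Natural Science Foundation of China (61502247), the Open Project of a Key Laboratory of the Ministry of Public Security (2015DSJSYS001), and the General Program of Natural Science Research for Universities of Jiangsu Province (14KJB520028).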

Abstract: In recent years, social network data mining has become a major hotspot in physical cyberspace data mining, with encouraging results in user behavior analysis, interest recognition, and product recommendation. As social networks have gained commercial value, however, many malicious users and malicious behaviors have emerged, and they substantially degrade the results of data mining. This paper therefore proposes a malicious user identification method based on user behavior feature analysis. The method applies principal component analysis (PCA) to user behavior data from a microblogging network and ranks the features of each dimension by weight. The first six principal component features are enough to identify malicious users effectively, and new features fitted from the principal component features further improve the recognition performance of the system. Experimental results show that the method ranks microblogging user features effectively and identifies malicious users in the microblogging social network well, which provides a useful data cleaning technique for other directions of social network data mining.

Keywords: Feature extraction, Machine learning, Malicious users, Microblogging, Principal component analysis (PCA)
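
The PCA step described in the abstract (standardize the behavior features, extract the leading principal components, rank the features by weight) can be illustrated with a minimal sketch. This is not the authors' implementation. The feature names, the sample values, and all function names are hypothetical. The toy data has only four features, so it keeps two components rather than the six used in the paper. Ranking features by their absolute loading on the first component is an assumed reading of "weight", and power iteration with deflation stands in for whatever eigen-solver the authors used.

```python
import math
import random

# Hypothetical behavior features; the paper's actual feature set is not listed here.
FEATURES = ["followers", "followees", "posts_per_day", "url_ratio"]

def standardize(rows):
    """Scale every column to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Covariance matrix of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def mat_vec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_components(cov, k, iters=500, seed=0):
    """Return the k leading (eigenvalue, unit eigenvector) pairs of cov,
    found by power iteration with deflation."""
    d = len(cov)
    m = [row[:] for row in cov]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = mat_vec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to extract
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))
        pairs.append((lam, v))
        # Deflate: subtract the component that was just found.
        for i in range(d):
            for j in range(d):
                m[i][j] -= lam * v[i] * v[j]
    return pairs

def project(rows, pairs):
    """Map each standardized row onto the retained components."""
    return [[sum(a * b for a, b in zip(r, vec)) for _, vec in pairs]
            for r in rows]

def rank_features(names, pairs):
    """Order feature names by absolute loading on the first component."""
    first = pairs[0][1]
    return sorted(names, key=lambda name: -abs(first[names.index(name)]))

# Made-up sample: three ordinary accounts followed by three spam-like accounts.
users = [
    [1200, 300, 2, 0.05],
    [800, 250, 3, 0.10],
    [1500, 400, 1, 0.02],
    [15, 1900, 60, 0.90],
    [30, 2000, 45, 0.85],
    [10, 1800, 80, 0.95],
]

z = standardize(users)
cov = covariance(z)
pairs = top_components(cov, k=2)
total_variance = sum(cov[i][i] for i in range(len(cov)))
ratios = [lam / total_variance for lam, _ in pairs]
scores = project(z, pairs)

print("explained variance ratios:", ratios)
print("features ranked by first-component loading:", rank_features(FEATURES, pairs))
```

In this toy sample the two groups fall on opposite sides of zero along the first component, so `scores` could feed a downstream classifier. On real data the number of retained components would be chosen from the explained variance ratios.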

CLC Number:

  • TP309