Computer Science ›› 2020, Vol. 47 ›› Issue (6A): 364-368. doi: 10.11896/jsjkx.190700008

• Information Security •

Abnormal User Detection Method in Sina Weibo Based on User Feature Extraction

YUAN De-yu1, 2, ZHANG Yi-fan1, GAO Jian1, 2 and SUN Hai-chun1, 2

  1 Institute of Information Technology and Cyber Security, People’s Public Security University of China, Beijing 102623, China
  2 Key Laboratory of Safety Precautions and Risk Assessment, Ministry of Public Security, Beijing 102623, China
  • Published: 2020-07-07
  • Corresponding author: GAO Jian (gaojian@ppsuc.edu.cn)
  • About author: YUAN De-yu (yuandeyu@ppsuc.edu.cn), born in 1986, Ph.D, lecturer. His main research interests include cyber security and complex networks.
    GAO Jian, born in 1982, Ph.D, lecturer. His main research interests include botnets, malware analysis and cyber crime.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61771072) and the Special Project of People’s Public Security University of China (2020JWCX01).

Abstract: With the development of the Internet, Weibo has gradually become an important social media platform. However, abnormal users on Weibo influence the behavior of other users by spreading harmful information, sending malicious links, and even launching malicious attacks, which reduces the value of the social network. Detecting abnormal users is therefore of great importance. Starting from data sets of abnormal and normal Weibo users obtained through multiple channels, this paper cleans the data and then proposes to comprehensively extract and analyze multiple user attributes. Abnormal user detection models are built with several data mining methods to identify abnormal user accounts. Experimental results with the C4.5 decision tree, random forest and other algorithms show that the features selected by the proposed method are effective and that abnormal users are detected with high accuracy.

Key words: Abnormal user, Data mining, Feature extraction, Weibo

CLC Number:

  • TP309