Computer Science ›› 2018, Vol. 45 ›› Issue (12): 111-116. doi: 10.11896/j.issn.1002-137X.2018.12.017
夏崇欢, 李华康, 孙国梓
XIA Chong-huan, LI Hua-kang, SUN Guo-zi
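Abstract: In recent years, social network data mining has become a major focus of data mining in physical cyberspace, with encouraging results in user behavior analysis, interest identification, and product recommendation. As social networks have become commercially attractive, many malicious users and malicious behaviors have appeared, and they seriously degrade the quality of data mining results. To address this, a malicious user identification method based on the analysis of user behavior features is proposed. The method applies principal component analysis (PCA) to the behavior data of Weibo users and ranks the features of each dimension by weight. Selecting the top six principal-component features is enough to identify malicious users effectively, and new features fitted from the principal-component features further improve the recognition performance of the system. Experimental results show that the method ranks Weibo user features effectively and identifies malicious users in the Weibo social network well, providing a useful data-cleaning technique for other areas of social network data mining.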
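The feature-ranking step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how feature weights are derived from the principal components, so the scoring rule used here (absolute loadings weighted by each component's explained-variance ratio) is an assumption, as are the function names `rank_features_by_pca` and `select_top_features` and the use of NumPy.

```python
import numpy as np


def rank_features_by_pca(X):
    """Rank the original features of X (n_samples x n_features) using PCA.

    Assumed scoring rule (not taken from the paper): the weight of a feature
    is the sum over components of |loading| * explained-variance ratio.
    Returns (order, weights), where order lists feature indices from the
    highest weight to the lowest.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column so that features on large scales do not dominate.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled to avoid dividing by zero
    Z = (X - X.mean(axis=0)) / std
    # PCA via SVD: the rows of vt are the principal directions (loadings).
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    # Combine the absolute loadings, weighted by explained-variance ratio.
    weights = np.abs(vt).T @ explained
    order = np.argsort(-weights)
    return order, weights


def select_top_features(X, k=6):
    """Keep the k highest-ranked feature columns (k=6 mirrors the abstract)."""
    order, _ = rank_features_by_pca(X)
    top = order[:k]
    return np.asarray(X, dtype=float)[:, top], top


if __name__ == "__main__":
    # Synthetic stand-in for a user-behavior matrix: 200 users, 10 features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    X_top, top = select_top_features(X, k=6)
    print("selected feature indices:", top.tolist())
    print("reduced matrix shape:", X_top.shape)
```

The reduced matrix would then be passed to whichever classifier is used to separate malicious from normal users. The abstract does not name that classifier, so none is shown here.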