Computer Science ›› 2019, Vol. 46 ›› Issue (11): 202-208. doi: 10.11896/jsjkx.180901617

• Artificial Intelligence •

Microblogging Water Army Identification Based on Semi-supervised Collaborative Training Algorithm

HAN Qing-qing, ZHANG Yan-mei, NIU Wa   

  1. (Information School, Central University of Finance and Economics, Beijing 102206, China)
  • Received: 2018-09-01 Online: 2019-11-15 Published: 2019-11-14

Abstract: In the fast-developing Internet era, Weibo delivers a large amount of information, but Weibo topics also attract a "water army" of paid posters. To a certain extent, the water army prevents ordinary users from understanding the real situation. To identify the water army efficiently and accurately, and given that water army samples are few while non-water-army samples are many, a semi-supervised collaborative training (co-training) algorithm is adopted. By studying and analyzing multiple characteristics of Weibo users, the proposed algorithm redefines six attribute feature values, such as account attention, daily microblog number, and microblog influence. In line with the requirements of the algorithm, the six attribute feature values are divided into two attribute sets, each attribute set corresponds to one view, and each view uses seven classification methods from the Scikit-Learn machine learning library to train a classifier that identifies the water army. Finally, experiments are conducted on a dataset. The results show that the accuracy, recall, precision and F1-measure of the classification results are higher when the two views use the naive Bayes algorithm and the logistic regression algorithm to train the classifiers. Therefore, a comprehensive analysis of Weibo user characteristics, combined with a semi-supervised collaborative training algorithm that matches the actual situation, can identify the Weibo water army accurately, efficiently and quickly.
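
The two-view procedure summarized in the abstract can be illustrated with a minimal co-training sketch in the style of Blum and Mitchell. It is not the authors' implementation. The synthetic data, the split of the six features into columns 0-2 and 3-5, the choice of GaussianNB as the naive Bayes variant, the function names co_train and predict_views, and the values of rounds and per_round are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.naive_bayes import GaussianNB


def co_train(X1, X2, y, labeled, clf1, clf2, rounds=10, per_round=5):
    """Co-training sketch (hypothetical helper, not from the paper).

    X1, X2  : two feature views of the same users (same rows)
    y       : labels; only entries where `labeled` is True are used
    labeled : boolean mask of users whose label is known at the start
    """
    y, labeled = y.copy(), labeled.copy()
    for _ in range(rounds):
        # Each classifier is trained on its own view of the labeled pool.
        clf1.fit(X1[labeled], y[labeled])
        clf2.fit(X2[labeled], y[labeled])
        for clf, X in ((clf1, X1), (clf2, X2)):
            pool = np.flatnonzero(~labeled)
            if pool.size == 0:
                return clf1, clf2, labeled
            proba = clf.predict_proba(X[pool])
            # Move the most confidently predicted unlabeled users into the
            # shared labeled pool, using the predicted class as pseudo-label.
            top = np.argsort(proba.max(axis=1))[-per_round:]
            y[pool[top]] = clf.classes_[proba[top].argmax(axis=1)]
            labeled[pool[top]] = True
    # Refit once so the pseudo-labels from the last round are also used.
    clf1.fit(X1[labeled], y[labeled])
    clf2.fit(X2[labeled], y[labeled])
    return clf1, clf2, labeled


def predict_views(clf1, clf2, X1, X2):
    """Combine both views by averaging their class probabilities."""
    proba = (clf1.predict_proba(X1) + clf2.predict_proba(X2)) / 2
    return clf1.classes_[proba.argmax(axis=1)]


# Synthetic stand-in for six user features; class 1 plays the minority role.
X, y = make_classification(n_samples=600, n_features=6, n_informative=6,
                           n_redundant=0, weights=[0.8, 0.2], random_state=0)
X1, X2 = X[:, :3], X[:, 3:]          # assumed split into two views

# Start with 10 labeled users per class; hide every other label.
start = np.zeros(len(y), dtype=bool)
for c in (0, 1):
    start[np.flatnonzero(y == c)[:10]] = True
y_train = np.where(start, y, -1)

nb, lr, mask = co_train(X1, X2, y_train, start,
                        GaussianNB(), LogisticRegression(max_iter=1000))
pred = predict_views(nb, lr, X1, X2)
print("pseudo-labeled users added:", int(mask.sum() - start.sum()))
print("F1 on initially unlabeled users:", f1_score(y[~start], pred[~start]))
```

Averaging the two probability outputs is only one way to combine the views; the abstract does not say how the final decision is formed, so that step is also an assumption.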

Key words: Classifier, Collaborative training, Semi-supervised, Water army identification

CLC Number: 

  • TP393