Computer Science, 2017, Vol. 44, Issue (7): 191-196. doi: 10.11896/j.issn.1002-137X.2017.07.034

Micro-blog Retweet Behavior Prediction Algorithm Based on Anomaly Detection and Random Forest

ZHOU Xian-ting, HUANG Wen-ming and DENG Zhen-rong   

  • Online: 2018-11-13   Published: 2018-11-13

Abstract: To address the limited accuracy of micro-blog retweet behavior prediction and the arbitrary selection of features, a new method that uses anomaly detection and random forest algorithms to predict micro-blog retweet behavior is proposed. First, the basic features of the user, the basic features of the blog, and the topic features of the blog content are extracted, and user activity and blog influence are calculated based on relative entropy. Second, the best feature set is selected by combining filter and wrapper feature selection methods. Finally, anomaly detection and random forest algorithms are fused to predict micro-blog retweet behavior from the selected features. The random forest parameters are chosen by analyzing the error estimate on out-of-bag data. The proposed method is compared with Logistic Regression, Decision Tree, Naive Bayes, and Random Forest, algorithms that have been used to analyze micro-blog retweet behavior. On real data, its prediction accuracy reaches 90.5%, higher than that of the optimal random forest method. The validity of the feature selection method is also verified.

Key words: Retweet prediction, Random forest, Anomaly detection, Feature filter, Relative entropy
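
The abstract states that user activity and blog influence are calculated based on relative entropy but does not give the formula. As a rough illustration only, the sketch below computes the relative entropy (Kullback-Leibler divergence) D(p || q) = Σ p_i ln(p_i / q_i) between two discrete distributions. The function name `relative_entropy`, the `eps` guard, and the example counts (one user's posts, retweets, and comments compared with a population average) are assumptions made here for illustration; they are not taken from the paper.

```python
import math


def relative_entropy(p, q, eps=1e-12):
    """Return D(p || q) in nats for two discrete distributions.

    p and q may be raw counts; both are normalized to sum to 1.
    eps guards against division by zero when q_i is 0 (an assumption
    of this sketch, not something stated in the abstract).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("distributions must have positive total mass")

    divergence = 0.0
    for p_i, q_i in zip(p, q):
        p_i, q_i = p_i / total_p, q_i / total_q
        if p_i > 0:  # terms with p_i == 0 contribute nothing
            divergence += p_i * math.log(p_i / max(q_i, eps))
    return divergence


# Hypothetical counts of (posts, retweets, comments); not data from the paper.
user_counts = [30, 60, 10]
population_counts = [50, 30, 20]

# A larger score means the user's behavior deviates more from the average.
score = relative_entropy(user_counts, population_counts)
print(f"relative entropy: {score:.4f} nats")
```

Relative entropy is zero when the two distributions are identical and grows as they diverge, which is what makes it usable as a deviation score. It is not symmetric, so the choice of which distribution plays the role of the reference matters.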
