Computer Science, 2022, Vol. 49, Issue (6A): 291-296. doi: 10.11896/jsjkx.210800011

• Big Data & Data Science

Microblog Popular Information Detection Based on Hidden Semi-Markov Model

XIE Bai-lin, LI Qi, KUANG Jiang

  1. School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, China
    School of Cyber Security, Guangdong University of Foreign Studies, Guangzhou 510006, China
  • Online: 2022-06-10 Published: 2022-06-08
  • Corresponding author: XIE Bai-lin (bailinxie@gdufs.edu.cn)
  • About author: XIE Bai-lin, born in 1982, Ph.D., assistant professor, is a member of the China Computer Federation. His main research interests include online social networks and network security.
  • Supported by:
    Guangdong Basic and Applied Basic Research Foundation (2018A0303130045) and Science and Technology Program of Guangzhou (201904010334).

Abstract: Microblogs have become an important platform on which people publish and obtain information, but they have also become a major channel for spreading rumors. Detecting popular information at an early stage makes it possible to discover trending events in time and to identify and suppress rumors before they spread widely, so that microblogs play a more positive role in how users obtain and publish information. This paper proposes a method for detecting popular microblog information based on the hidden semi-Markov model (HSMM), approached from the perspective of how popular information propagates. Each observation value is constructed from the influence level of an information forwarder and the time interval between two adjacent forwarders, and the influence level of each forwarder is obtained automatically with a random forest classifier. The HSMM characterizes the propagation process of popular information, which allows potentially popular information to be detected early. The method consists of a training phase and a detection phase. In the detection phase, the average log-likelihood of the observation sequence that each message produces as it propagates is computed with respect to the trained model, and the popularity of every message is updated in real time. Experiments on collected Sina Weibo and Twitter datasets demonstrate the effectiveness of the method.
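The observation construction described in the abstract can be sketched as follows. This is a minimal illustration under assumptions that this page does not state: influence levels are integers `0..K-1` already produced by the classifier, the gap of the first forwarder is measured from the original post, and gaps are discretized with hypothetical bin edges (`INTERVAL_EDGES`) so that each (influence level, gap) pair becomes one discrete symbol. The helper names are illustrative, and the paper's actual encoding may differ.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Hypothetical gap bins in seconds: <1 min, <10 min, <1 h, <1 day, >=1 day.
INTERVAL_EDGES = (60, 600, 3600, 86400)


def interval_bin(seconds: float, edges: Sequence[float] = INTERVAL_EDGES) -> int:
    """Map a non-negative time gap to a bucket index in 0..len(edges)."""
    if seconds < 0:
        raise ValueError("time gap must be non-negative")
    return bisect_right(edges, seconds)


def encode_observations(post_time: float,
                        forwards: List[Tuple[float, int]],
                        num_levels: int,
                        edges: Sequence[float] = INTERVAL_EDGES) -> List[int]:
    """Turn (timestamp, influence_level) forwards into discrete observation symbols.

    Each symbol combines the forwarder's influence level with the gap since the
    previous forward (or since the original post, for the first forward).
    """
    n_bins = len(edges) + 1
    symbols: List[int] = []
    prev = post_time
    for ts, level in sorted(forwards):  # process forwards in time order
        if not 0 <= level < num_levels:
            raise ValueError(f"influence level {level} out of range")
        symbols.append(level * n_bins + interval_bin(ts - prev, edges))
        prev = ts
    return symbols
```

With these defaults the emission alphabet has `num_levels * (len(INTERVAL_EDGES) + 1)` symbols. Folding both features into one symbol keeps the emission model a plain categorical table, at the cost of a vocabulary that grows multiplicatively with the number of levels and bins.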

Key words: Hidden semi-Markov model, Microblog, Popular information, Popularity, Transmission process
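For the detection phase, the sketch below computes the average log-likelihood of a discrete observation sequence under a generic explicit-duration HSMM and rescores a message after each new forward. It is not the authors' implementation: the class `ExplicitDurationHSMM`, the function `track_popularity`, the choice to treat the last segment as possibly unfinished, and the threshold rule for flagging a message are all illustrative assumptions, and parameter training is omitted.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _log(x: float) -> float:
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(x) if x > 0 else -math.inf


def _logsumexp(values: List[float]) -> float:
    """Stable log(sum(exp(v))); returns -inf when every term is -inf."""
    m = max(values, default=-math.inf)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


class ExplicitDurationHSMM:
    """Hypothetical discrete-emission HSMM with explicit state durations 1..D.

    pi[j]        initial probability of state j
    a[i][j]      probability of moving from state i to a different state j (a[i][i] == 0)
    p[j][d - 1]  probability that a visit to state j lasts exactly d steps
    b[j][k]      probability that state j emits observation symbol k
    """

    def __init__(self, pi, a, p, b):
        self.n = len(pi)
        self.max_dur = len(p[0])
        self.log_pi = [_log(x) for x in pi]
        self.log_a = [[_log(x) for x in row] for row in a]
        self.log_p = [[_log(x) for x in row] for row in p]
        self.log_b = [[_log(x) for x in row] for row in b]
        # log P(duration >= d) stored at index d - 1; used for the unfinished last segment.
        self.log_surv = [[_log(sum(row[d:])) for d in range(self.max_dur)] for row in p]

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """log P(obs), allowing the last segment to still be in progress at the final step."""
        T = len(obs)
        # alpha[t][j] = log P(obs[:t], a segment of state j ends exactly after step t)
        alpha = [[-math.inf] * self.n for _ in range(T + 1)]
        last: List[float] = []  # terms for the segment that is still active at step T
        for t in range(1, T + 1):
            for j in range(self.n):
                ends_here: List[float] = []
                emit = 0.0
                for d in range(1, min(self.max_dur, t) + 1):
                    emit += self.log_b[j][obs[t - d]]  # emissions of steps t-d+1..t
                    s = t - d  # number of steps observed before this segment starts
                    if s == 0:
                        enter = self.log_pi[j]
                    else:
                        enter = _logsumexp([alpha[s][i] + self.log_a[i][j] for i in range(self.n)])
                    ends_here.append(enter + self.log_p[j][d - 1] + emit)
                    if t == T:
                        last.append(enter + self.log_surv[j][d - 1] + emit)
                alpha[t][j] = _logsumexp(ends_here)
        return _logsumexp(last)

    def average_log_likelihood(self, obs: Sequence[int]) -> float:
        """Per-step log-likelihood, so sequences of different lengths share one scale."""
        if not obs:
            raise ValueError("need at least one observation")
        return self.log_likelihood(obs) / len(obs)


def track_popularity(model: ExplicitDurationHSMM, symbols: Sequence[int],
                     threshold: float) -> Tuple[List[float], Optional[int]]:
    """Rescore a message after each new forward.

    Returns the score after every forward and the 1-based forward count at which the
    score first reaches `threshold` (None if it never does). The threshold rule is a
    hypothetical decision criterion, not one taken from the paper.
    """
    scores: List[float] = []
    flagged_at: Optional[int] = None
    for t in range(1, len(symbols) + 1):
        score = model.average_log_likelihood(symbols[:t])
        scores.append(score)
        if flagged_at is None and score >= threshold:
            flagged_at = t
    return scores, flagged_at
```

Rescoring every prefix from scratch keeps the sketch short but costs one full forward pass per new forward; a practical version would extend the forward variables incrementally as each forward arrives. Because the final segment is scored with the duration survival function, summing the likelihood over all observation sequences of a fixed length gives 1, which makes prefixes of a still-spreading message valid inputs.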

CLC number: TP391