Computer Science, 2017, Vol. 44, Issue (8): 270-273. doi: 10.11896/j.issn.1002-137X.2017.08.046

• Artificial Intelligence •

Research on Evolution Model of Microblog Topic Based on Time Sequence

WANG Zhen-fei, LIU Kai-li, ZHENG Zhi-yun, WANG Fei

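  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China (all authors)
  • Published: 2018-11-13  Online: 2018-11-13
  • Funding:
    This paper is a phased result of the Zhengzhou University New Media Public Communication Discipline Tendered Project (XMTGGCBJSZ11) and is supported by the Henan Province Science and Technology Research Project (142102310531).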

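Abstract: Research on topic evolution helps track user preferences and the development trends of topics, and is important for public opinion early warning. Current topic evolution methods focus on using topic generation models to analyze topic evolution, but ignore the time factor within topics and the presence of background words. Taking the traditional topic generation model LDA as a basis, this paper extends it into a microblog topic generation model, MTLDA. MTLDA additionally accounts for background words, which improves the efficiency of topic generation. The microblog topic set is also divided into time slices, and the KL divergence is used to compute the distance between topics in adjacent time slices in order to analyze how the topics evolve. Experiments on Sina Weibo data show that MTLDA generates microblog topics over the time slices, and that the resulting topic evolution agrees with what actually happened.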

Key words: Microblog, Topic evolution, Social network, MTLDA model, Kullback-Leibler (KL) divergence
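The abstract describes the evolution step only at a high level: topics are estimated per time slice, and topics in adjacent slices are compared with the KL divergence, D_KL(P || Q) = Σ_w P(w) log(P(w) / Q(w)). The Python sketch below illustrates that step under stated assumptions; it is not the authors' MTLDA implementation. It assumes the per-slice topic-word distributions have already been estimated (MTLDA itself, including its background-word handling, is not reproduced), uses equal-width time slices, applies additive smoothing with a small eps so the divergence stays finite, measures D_KL(later || earlier), and links topics with a hypothetical threshold. The names split_into_slices and link_adjacent_slices are illustrative only.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# One topic's word distribution: word -> probability.
Dist = Dict[str, float]


def split_into_slices(posts: List[Tuple[float, str]], start: float,
                      width: float) -> Dict[int, List[str]]:
    """Group (timestamp, text) posts into equal-width time slices.

    Equal-width slicing is an assumption; posts before `start` are skipped.
    """
    slices: Dict[int, List[str]] = defaultdict(list)
    for ts, text in posts:
        if ts >= start:
            slices[int((ts - start) // width)].append(text)
    return dict(slices)


def _smooth(dist: Dist, vocab: set, eps: float) -> Dist:
    """Add eps to every word in `vocab` and renormalise so no probability is zero."""
    raw = {w: dist.get(w, 0.0) + eps for w in vocab}
    total = sum(raw.values())
    return {w: v / total for w, v in raw.items()}


def kl_divergence(p: Dist, q: Dist, eps: float = 1e-6) -> float:
    """D_KL(p || q) = sum_w p(w) * log(p(w) / q(w)), over the union vocabulary."""
    vocab = set(p) | set(q)
    ps, qs = _smooth(p, vocab, eps), _smooth(q, vocab, eps)
    return sum(ps[w] * math.log(ps[w] / qs[w]) for w in vocab)


def link_adjacent_slices(prev_topics: List[Dist], next_topics: List[Dist],
                         threshold: float) -> List[Tuple[int, int, float]]:
    """Match each topic in slice t+1 to its closest topic in slice t.

    Returns (next_index, prev_index, divergence) for matches at or below the
    hypothetical threshold; unmatched topics would count as newly emerging.
    """
    links: List[Tuple[int, int, float]] = []
    if not prev_topics:
        return links
    for j, nxt in enumerate(next_topics):
        # The direction D_KL(later || earlier) is an assumption.
        best_i, best_d = min(
            ((i, kl_divergence(nxt, prev)) for i, prev in enumerate(prev_topics)),
            key=lambda pair: pair[1],
        )
        if best_d <= threshold:
            links.append((j, best_i, best_d))
    return links


if __name__ == "__main__":
    # Toy topic-word distributions for two adjacent slices (illustrative data only).
    slice0 = [{"flood": 0.5, "rain": 0.4, "rescue": 0.1},
              {"phone": 0.6, "launch": 0.4}]
    slice1 = [{"rescue": 0.5, "rain": 0.3, "flood": 0.2},
              {"exam": 0.7, "score": 0.3}]
    print(link_adjacent_slices(slice0, slice1, threshold=1.0))
```

On the toy data, the first topic of slice 1 is linked to the first topic of slice 0 (divergence of about 0.54), while the exam topic has no close predecessor and stays unlinked, i.e. it would be treated as a new topic. Because the KL divergence is asymmetric, it is not a true distance; a symmetrised variant (for example averaging both directions) is a common alternative, and the abstract does not say which form the paper uses.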
