Computer Science ›› 2010, Vol. 37 ›› Issue (11): 156-159.

• Database and Data Mining •

LDA-based Model for Online Topic Evolution Mining

CUI Kai, ZHOU Bin, JIA Yan, LIANG Zheng

  1. (College of Computer, National University of Defense Technology, Changsha 410073)
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the Key Program (60933005) and the General Program (60873204) of the National Natural Science Foundation of China.

Abstract: An online computational model for topic evolution mining is built on latent semantic analysis of text content. Topic evolution is analyzed by tracking how topics change across time slices. Latent Dirichlet Allocation (LDA) is extended to online text streams, and an online LDA model is proposed and implemented. The posterior topic-word distribution of each time slice is used to shape the prior of the next time slice, which maintains continuity between topics. The topic-word and document-topic distributions are inferred with an improved incremental Gibbs algorithm. Kullback-Leibler (KL) relative entropy is used to measure the similarity between topics in order to identify "topic inheritance" and "topic mutation" during topic evolution. Experiments show that the proposed model discovers meaningful topic evolution trends in Internet corpora, both English and Chinese.

Key words: Topic model, LDA, Evolution, Public opinion
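
As a minimal sketch of two steps named in the abstract, the Python below shows how the topic-word posterior of one time slice might be turned into Dirichlet pseudo-counts for the next slice, and how KL divergence could label a current topic as inheritance or mutation. The function names, the `strength` and `smoothing` parameters, the direction of the KL divergence, and the `threshold` are illustrative assumptions rather than the paper's formulation. The incremental Gibbs sampler itself is omitted.

```python
import math
from typing import List, Tuple

Distribution = List[float]  # one topic's word probabilities over a shared vocabulary


def prior_from_posterior(prev_topic_word: List[Distribution],
                         strength: float,
                         smoothing: float = 0.01) -> List[List[float]]:
    """Build Dirichlet pseudo-counts for the next time slice.

    Assumed form: smoothing + strength * previous posterior probability.
    `strength` controls how strongly the previous slice influences the next.
    """
    return [[smoothing + strength * p for p in topic] for topic in prev_topic_word]


def kl_divergence(p: Distribution, q: Distribution, eps: float = 1e-12) -> float:
    """KL(p || q) in nats. `eps` floors q to avoid division by zero (an assumption)."""
    if len(p) != len(q):
        raise ValueError("distributions must share one vocabulary")
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def link_topics(prev: List[Distribution],
                curr: List[Distribution],
                threshold: float) -> List[Tuple[int, int, float, str]]:
    """Match each current topic to its closest previous topic.

    Returns (current index, closest previous index, KL value, label), where the
    label is "inheritance" if the KL value is at most `threshold`, else "mutation".
    """
    links = []
    for j, topic in enumerate(curr):
        distances = [kl_divergence(topic, old) for old in prev]
        best = min(range(len(prev)), key=distances.__getitem__)
        label = "inheritance" if distances[best] <= threshold else "mutation"
        links.append((j, best, distances[best], label))
    return links


# Toy data: two topics over a three-word vocabulary (made-up numbers).
prev_slice = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
curr_slice = [[0.6, 0.3, 0.1], [0.34, 0.33, 0.33]]

next_prior = prior_from_posterior(prev_slice, strength=10.0)
for j, best, kl, label in link_topics(prev_slice, curr_slice, threshold=0.1):
    print(f"current topic {j} -> previous topic {best}: KL={kl:.4f} ({label})")
```

KL divergence is asymmetric, so swapping its arguments or using a symmetrized variant can change which topics are linked; the threshold would need tuning on real data.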
