Computer Science (计算机科学), 2012, Vol. 39, Issue (1): 138-141.

• Database and Data Mining •

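News Topic Detection Approach on Chinese Microblog

Zheng Feiran (郑斐然), Miao Duoqian (苗夺谦), Zhang Zhifei (张志飞), Gao Can (高灿)

  1. (Department of Computer Science and Technology, Tongji University, Shanghai 201804) (Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 201804)
  • Online: 2018-11-16 Published: 2018-11-16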

Abstract: The rapid growth of microblogging has brought about another form of social news media. This paper proposes a method for mining news topics from microblogs: keywords that emerge in large numbers in microblog messages are detected online and then clustered to form news topics. To extract news keywords, a compound weight is constructed that combines a word's frequency in short texts with its growth rate, quantifying how likely the word is to be news vocabulary. To construct topics, a contextual relevance model is used to support an incremental clustering algorithm; compared with a semantic similarity model, it is better suited to the characteristics of this problem. Experiments on real microblog data show that the method can effectively detect news topics from a large volume of messages.

Key words: Microblog, News, Topic detection, Clustering
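
The abstract names two steps, a compound keyword weight and incremental clustering driven by contextual relevance, but does not give their formulas. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the weight `freq * log(1 + growth)`, the co-occurrence-based relevance score, the thresholds, and all function names are hypothetical choices made for illustration. Messages are assumed to be already segmented into word lists.

```python
import math
from collections import Counter


def compound_weights(current_docs, previous_docs):
    """Score words by frequency and growth (illustrative formula, not the paper's)."""
    cur = Counter(w for doc in current_docs for w in set(doc))
    prev = Counter(w for doc in previous_docs for w in set(doc))
    n_cur = max(len(current_docs), 1)
    weights = {}
    for word, count in cur.items():
        freq = count / n_cur  # share of current messages containing the word
        # Smoothed relative growth versus the previous time window; declines count as 0.
        growth = max((count - prev[word]) / (prev[word] + 1), 0.0)
        weights[word] = freq * math.log1p(growth)
    return weights


def contextual_relevance(a, b, docs):
    """Assumed relevance: co-occurrence count over the rarer word's message count."""
    with_a = sum(1 for d in docs if a in d)
    with_b = sum(1 for d in docs if b in d)
    both = sum(1 for d in docs if a in d and b in d)
    denom = min(with_a, with_b)
    return both / denom if denom else 0.0


def incremental_cluster(keywords, docs, threshold=0.5):
    """Assign each keyword to the most relevant existing cluster, or start a new one."""
    clusters = []
    for word in keywords:
        best, best_score = None, 0.0
        for cluster in clusters:
            score = sum(contextual_relevance(word, m, docs) for m in cluster) / len(cluster)
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best.append(word)
        else:
            clusters.append([word])
    return clusters


# Toy data: two time windows of segmented messages (made up for this example).
previous = [["weather", "nice"], ["lunch", "nice"], ["weather", "lunch"]]
current = [
    ["earthquake", "sichuan", "weather"],
    ["earthquake", "sichuan"],
    ["earthquake", "rescue", "sichuan"],
    ["concert", "tickets"],
    ["concert", "tickets", "lunch"],
]

weights = compound_weights(current, previous)
# Keep words above a hypothetical cutoff, strongest first (ties broken alphabetically).
keywords = sorted((w for w, s in weights.items() if s > 0.3),
                  key=lambda w: (-weights[w], w))
print(incremental_cluster(keywords, current))
# prints [['earthquake', 'sichuan'], ['concert', 'tickets']]
```

In this sketch, words that are frequent but not growing (such as "weather") receive a weight of zero and never reach the clustering step, which mirrors the idea of ranking bursty words above merely common ones. Relevance is computed from co-occurrence in the messages themselves rather than from a lexical resource, which is one plausible reading of "contextual" as opposed to "semantic" similarity.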
