Computer Science (计算机科学) ›› 2012, Vol. 39 ›› Issue (10): 198-202.

• Artificial Intelligence •

MB-SinglePass: Microblog Topic Detection Based on Combined Similarity

Zhou Gang, Zou Hongcheng, Xiong Xiaobing, Huang Yongzhong

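  1. (State Key Laboratory of Software Development Environment, Beijing 100191) (Institute of Information Engineering, Information Engineering University, Zhengzhou 450002)
  • Publication date: 2018-11-16 Release date: 2018-11-16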

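Abstract (translated from Chinese): Topic detection techniques have achieved good results in research on traditional media. This paper explores how to optimize and evaluate topic detection for short texts from new media such as microblogs. Building on structured information among microblog contacts, such as following and follower relationships, and on intrinsic links between posts, such as reposts and comments, it proposes MB-SinglePass, a topic detection algorithm for microblogs. Beyond these characteristics, the algorithm addresses the feature sparsity of short texts by using a synonym dictionary to expand microblog features and enrich the feature information. To overcome the shortcomings of using cosine similarity, Jaccard similarity, or semantic similarity alone, it adopts a combined similarity strategy. Compared with traditional algorithms, MB-SinglePass achieves better performance on a real-world dataset collected from Sina Weibo. In addition, a controlled experiment on similarity strategies shows that combined similarity outperforms any single similarity measure.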

Keywords (translated from Chinese): microblog, SinglePass, topic detection, text similarity, synonym expansion

Abstract: Topic detection has achieved good results in research on traditional media. This paper discusses the refinement and performance evaluation of topic detection for new media such as microblogs. It proposes the MB-SinglePass topic detection algorithm, which draws on structured information such as following and follower relationships between contacts, and on inner connections between posts such as forwarding and commenting. Besides these microblog characteristics, MB-SinglePass introduces a feature extension technique to enrich feature information. The paper also uses a combined similarity to address the shortcomings of relying solely on the Jaccard similarity coefficient, cosine similarity, or semantic similarity. Compared with traditional algorithms, MB-SinglePass shows better performance on a real dataset from Sina microblog. Additionally, an experiment on similarity strategies shows that combined similarity gives better results than any single similarity.

Key words: Microblog, SinglePass, Topic detection, Text similarity, Synonyms extension
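
The abstract names two ingredients, single-pass clustering and a combined similarity, without giving formulas. The sketch below illustrates the general idea only and is not the authors' implementation. Everything specific in it is an assumption: the equal weights `w_cos` and `w_jac`, the `threshold` value, the use of summed term counts as a cluster representation, and the function names. It combines only cosine and Jaccard similarity; the semantic similarity, synonym expansion, and follower/repost signals described in the abstract are left out.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a: Counter, b: Counter) -> float:
    """Jaccard similarity between the term sets of two vectors."""
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0


def combined(a: Counter, b: Counter, w_cos: float = 0.5, w_jac: float = 0.5) -> float:
    """Weighted sum of cosine and Jaccard similarity (weights are assumed)."""
    return w_cos * cosine(a, b) + w_jac * jaccard(a, b)


def single_pass(posts, threshold: float = 0.3):
    """Process posts once, in order: join the most similar cluster if the
    similarity reaches the threshold, otherwise open a new cluster."""
    clusters = []  # one Counter of summed term counts per cluster
    labels = []
    for tokens in posts:
        vec = Counter(tokens)
        best, best_sim = -1, 0.0
        for i, rep in enumerate(clusters):
            sim = combined(vec, rep)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            clusters[best].update(vec)  # fold the post into the cluster
            labels.append(best)
        else:
            clusters.append(vec)
            labels.append(len(clusters) - 1)
    return labels


posts = [
    ["earthquake", "japan", "tsunami"],
    ["japan", "earthquake", "relief"],
    ["football", "final", "goal"],
    ["goal", "football", "match"],
]
print(single_pass(posts))
```

Because each post is compared only against clusters seen so far, the result depends on the order of the posts and on the threshold, which is why the choice of similarity measure matters for this family of algorithms.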
