Computer Science, 2013, Vol. 40, Issue (Z11): 235-237.

• Data Storage and Mining •

Topical Relevance Analysis of Hashtags in Chinese Microblogging Environment

HU Chang-long, TANG Jin-tao and WANG Ting

  1. College of Computer, National University of Defense Technology, Changsha 410073
  • Online: 2018-11-16 Published: 2018-11-16
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61202337,6)

Abstract: A Hashtag (a microblog topic word) is a topic label that a publisher creates for a microblog post, and it helps users find hot topics efficiently in massive microblog data. Because Hashtags are created by users, different Hashtags may refer to the same topic, so mining the topical relevance between Hashtags helps both hot-topic discovery and aggregated display. This paper studies relevance analysis between Hashtags and extracts a range of features for it: the Hashtag text, the content of the associated microblogs, the distribution of Hashtag occurrence counts over time, and Hashtag co-occurrence. Experimental results on Sina Weibo data show that this combination of features is helpful for analyzing the topical relevance of Hashtags.

Key words: Microblog, Topical relevance, Hashtag, Feature extraction
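
The four feature families named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' method: the abstract does not say which similarity measures were used, so the sketch substitutes the Jaccard index and cosine similarity, and the `(hour, text, hashtags)` post layout and the function names `jaccard`, `cosine`, and `pair_features` are hypothetical.

```python
from collections import Counter
from math import sqrt


def jaccard(a, b):
    """Jaccard index of two collections treated as sets; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity of two sparse count vectors given as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def pair_features(posts, tag_a, tag_b):
    """Score one Hashtag pair.

    posts: list of (hour, text, set_of_hashtags) tuples (hypothetical layout).
    """
    with_a = [p for p in posts if tag_a in p[2]]
    with_b = [p for p in posts if tag_b in p[2]]
    return {
        # Hashtag text: overlap of the characters in the two tags.
        "text": jaccard(tag_a, tag_b),
        # Microblog content: character-count vectors of the posts using each tag.
        "content": cosine(Counter("".join(p[1] for p in with_a)),
                          Counter("".join(p[1] for p in with_b))),
        # Occurrence count over time: number of posts per hour for each tag.
        "time": cosine(Counter(p[0] for p in with_a),
                       Counter(p[0] for p in with_b)),
        # Co-occurrence: posts containing both tags over posts containing either.
        "cooccur": jaccard(
            [i for i, p in enumerate(posts) if tag_a in p[2]],
            [i for i, p in enumerate(posts) if tag_b in p[2]],
        ),
    }
```

A dictionary like the one returned by `pair_features` could serve as the feature vector for any classifier that decides whether two Hashtags are topically related; the abstract does not state which classifier the authors used.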

