Computer Science, 2017, Vol. 44, Issue (8): 236-241. doi: 10.11896/j.issn.1002-137X.2017.08.040

Micro-blog’s Text Classification Based on MRT-LDA

PANG Xiong-wen, WAN Ben-shuai and WANG Pan   

  • Online: 2018-11-13 Published: 2018-11-13

Abstract: The widespread use of micro-blogs has produced a large volume of data that contains much valuable information. However, micro-blog posts are short and carry social-network information of their own, so traditional models and text mining algorithms handle this special kind of text poorly. Building on latent Dirichlet allocation (LDA), this paper proposes MRT-LDA, a generative model for micro-blog posts designed around the characteristics of micro-blog information. The model takes the relations between a Chinese micro-blog document and other Chinese micro-blog documents into account to support topic mining. Gibbs sampling is used for inference, and the results indicate that the model offers an effective solution to text mining for Chinese micro-blogs.

Key words: Micro-blog, Topic mining, LDA, MRT-LDA, Probabilistic generative model, Social network
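
The abstract names Gibbs sampling as the inference method but gives no update equations for MRT-LDA. As a rough illustration of the underlying technique only, the following is a minimal sketch of collapsed Gibbs sampling for plain LDA, not for MRT-LDA itself. The function name `lda_gibbs`, its parameters, and the default hyperparameters `alpha` and `beta` are assumptions made for this example and do not come from the paper.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative, not MRT-LDA).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic_counts, topic_word_counts, assignments).
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]                # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]   # word counts per topic
    n_k = [0] * num_topics                                 # total tokens per topic
    z = []                                                 # topic assignment per token

    # Assign every token a random topic and build the count tables.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and put the token back into the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw, z


if __name__ == "__main__":
    # Toy corpus: word ids stand in for tokens of short posts.
    toy_docs = [[0, 1, 0, 1, 0], [2, 3, 2, 3], [0, 1, 1], [3, 2, 2, 3, 3]]
    doc_topic, topic_word, _ = lda_gibbs(toy_docs, num_topics=2, vocab_size=4)
    print(doc_topic)
    print(topic_word)
```

According to the abstract, MRT-LDA additionally takes relations between micro-blog documents into account; this sketch treats every document independently and therefore omits that part of the model.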

