Research on Information Extraction Model for Microblog Content

Abstract

Abstract: Social media are platforms and tools that people use to share opinions, insights, ideas, and experiences, and they have grown into a highly influential new medium. Microblogging, an important part of social media, plays a major role in spreading information. Information extraction for microblog content aims to pull valuable structured information out of the noisy, fragmentary, unstructured free text of microblog posts, so that information can be obtained from them effectively. This paper proposes a factor-graph-based method for microblog event extraction that accurately extracts the events reflected in microblog posts. Experiments show that the method achieves higher performance and accuracy than the other methods compared.

Key words: Social media, Microblog, Event extraction, Factor graph

CLC number: TP393    Document code: J

ZHENG Ying, LI Da-hui. Research on Information Extraction Model for Microblog Content[J]. Computer Science, 2014, 41(2): 270-275. https://doi.org/

