Computer Science, 2013, Vol. 40, Issue (6): 196-198.

• Artificial Intelligence •

Named Entity Recognition on Chinese Microblog

QIU Quan-qing, MIAO Duo-qian, ZHANG Zhi-fei

  1. Department of Computer Science and Technology, Tongji University, Shanghai 201804; Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 200092
  • Online: 2018-11-16 Published: 2018-11-16
  • Supported by:
    National Natural Science Foundation of China (60970061,6,61103067) and the Fundamental Research Funds for the Central Universities

Abstract: The rapid growth of microblogging provides a new source of text for named entity recognition. Based on the characteristics of microblog text, this paper proposes a named entity recognition method for Chinese microblogs. First, the microblog text is normalized to remove the interference caused by non-standard expressions. Then, on top of knowledge bases such as Chinese person names, common place names, and organization names, feature templates suited to microblogs are selected and conditional random fields are used for entity recognition. Meanwhile, correct recognition results are added to the knowledge bases to improve recognition performance. Experiments on real microblog data show that the method can effectively recognize named entities in Chinese microblogs.

Key words: Chinese information processing, Microblog, Named entity, Conditional random fields
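
The pipeline described in the abstract (normalization, knowledge-base lookup, feature templates, knowledge-base update) might be sketched roughly as follows. This is only an illustration under stated assumptions: the normalization rules, the gazetteer entries, the feature names, and every function name below are hypothetical and are not taken from the paper. The resulting per-character feature dictionaries are the kind of input a conditional random field toolkit would consume; training and decoding are omitted.

```python
import re

# Hypothetical gazetteers standing in for the paper's knowledge bases.
PERSON_NAMES = {"姚明", "李娜"}
PLACE_NAMES = {"上海", "北京"}

# Assumed noise patterns; the paper's actual normalization rules are not given here.
URL_RE = re.compile(r"https?://[A-Za-z0-9./?=&_%-]+")
EMOTICON_RE = re.compile(r"\[[^\[\]]{1,8}\]")  # bracketed emoticons such as [哈哈]
REPEAT_RE = re.compile(r"(.)\1{2,}")           # a character repeated 3+ times


def normalize(text):
    """Strip URLs, emoticons, hashtag markers, long repeats, and whitespace."""
    text = URL_RE.sub("", text)
    text = EMOTICON_RE.sub("", text)
    text = text.replace("#", "")
    text = REPEAT_RE.sub(r"\1", text)
    return re.sub(r"\s+", "", text)


def gazetteer_flags(chars, lexicon, max_len=4):
    """Tag each character B/I if it lies inside a lexicon match, else O."""
    flags = ["O"] * len(chars)
    i = 0
    while i < len(chars):
        # Try the longest candidate first (greedy forward matching).
        for n in range(min(max_len, len(chars) - i), 0, -1):
            if "".join(chars[i:i + n]) in lexicon:
                flags[i] = "B"
                for j in range(i + 1, i + n):
                    flags[j] = "I"
                i += n
                break
        else:
            i += 1
    return flags


def char_features(chars, i, person_flags, place_flags):
    """Illustrative feature template: context window plus gazetteer flags."""
    def at(k):
        return chars[k] if 0 <= k < len(chars) else "<PAD>"

    return {
        "c0": at(i),
        "c-1": at(i - 1),
        "c+1": at(i + 1),
        "c-1c0": at(i - 1) + at(i),
        "c0c+1": at(i) + at(i + 1),
        "per": person_flags[i],
        "loc": place_flags[i],
    }


def sentence_features(text):
    """Normalize one microblog post and build per-character features."""
    chars = list(normalize(text))
    per = gazetteer_flags(chars, PERSON_NAMES)
    loc = gazetteer_flags(chars, PLACE_NAMES)
    return [char_features(chars, i, per, loc) for i in range(len(chars))]


def update_knowledge_base(lexicon, confirmed_entities):
    """Add entities judged correct so that later lookups can match them."""
    lexicon.update(confirmed_entities)
```

For example, `sentence_features("姚明在上海!!!! http://t.cn/abc [哈哈] #篮球#")` first reduces the post to `姚明在上海!篮球` and then emits one feature dictionary per character. Calling `update_knowledge_base(PLACE_NAMES, {"同济大学"})` lets later posts that mention that string receive a `loc` flag.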
