Computer Science, 2018, Vol. 45, Issue (4): 66-70. doi: 10.11896/j.issn.1002-137X.2018.04.009

• 2017 National Conference on Theoretical Computer Science •

Chinese Part-of-speech Tagging Model Using Attention-based LSTM

SI Nian-wen, WANG Heng-jun, LI Wei, SHAN Yi-dong and XIE Peng-cheng

  1. The Third Institute, PLA Information Engineering University, Zhengzhou 450001; The Third Institute, PLA Information Engineering University, Zhengzhou 450001; Unit 66083, Beijing 100144; The Third Institute, PLA Information Engineering University, Zhengzhou 450001; School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049
  • Online: 2018-04-15 Published: 2018-05-11

Abstract: Traditional statistical models for Chinese part-of-speech tagging rely heavily on manually designed features. To address this problem, this paper proposes an attention-based long short-term memory (LSTM) model for Chinese part-of-speech tagging. The model takes basic distributed word vectors as its input units and uses a bidirectional LSTM to extract rich contextual feature representations of words. An attention hidden layer is added to the network: the attention mechanism assigns probability weights to the hidden states at different time steps, so that the hidden layer focuses on the most important features and the quality of the hidden vectors improves. A state transition probability matrix is introduced in the decoding process to further improve tagging accuracy. Experimental results on the People's Daily (PKU) corpus and the Chinese Penn Treebank CTB5 corpus show that the model performs Chinese part-of-speech tagging effectively. Its accuracy is higher than that of traditional methods such as conditional random fields, and it is competitive with current state-of-the-art tagging models.

Key words: Part-of-speech tagging, Long short-term memory, Attention mechanism, Contextual feature
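
The two mechanisms named in the abstract, attention weights over hidden states and decoding with a tag transition matrix, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the attention scoring function, so a scaled dot product between hidden states is assumed here, and the transition matrix is assumed to hold additive (log-domain) scores that are combined with per-token tag scores by Viterbi search. The function names `attend` and `viterbi` and all array shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(hidden):
    """Re-weight hidden states with attention.

    hidden: (T, d) array standing in for BiLSTM hidden states.
    Returns (context, weights); weights[t] is a probability
    distribution over all T time steps.
    Assumption: scaled dot-product scoring (not taken from the paper).
    """
    scores = hidden @ hidden.T / np.sqrt(hidden.shape[1])
    weights = softmax(scores, axis=1)
    return weights @ hidden, weights


def viterbi(emissions, transitions):
    """Find the best tag sequence.

    emissions:   (T, K) score of each of K tags at each time step.
    transitions: (K, K) score of moving from tag i to tag j
                 (assumed additive, i.e. log-domain).
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in tag i at t-1, then tag j at t.
        cand = score[:, None] + transitions
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emissions[t]
    # Follow back-pointers from the best final tag.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]


# Toy usage: 3 tokens, 2 tags; the transition scores penalise tag switches.
emissions = np.array([[1.0, 0.0], [0.0, 0.6], [1.0, 0.0]])
no_transitions = np.zeros((2, 2))
sticky = np.array([[0.0, -2.0], [-2.0, 0.0]])
print(viterbi(emissions, no_transitions))  # [0, 1, 0]
print(viterbi(emissions, sticky))          # [0, 0, 0]
```

In the toy example, decoding without transition scores follows the per-token maxima, while the transition matrix overrides the weak middle-token preference. This is the kind of correction the abstract attributes to the state transition probabilities.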

