Computer Science, 2018, Vol. 45, Issue (4): 66-70. doi: 10.11896/j.issn.1002-137X.2018.04.009


Chinese Part-of-speech Tagging Model Using Attention-based LSTM

SI Nian-wen, WANG Heng-jun, LI Wei, SHAN Yi-dong and XIE Peng-cheng   

Online: 2018-04-15  Published: 2018-05-11

Abstract: Because Chinese part-of-speech tagging based on traditional statistical models relies heavily on manually designed features, this paper proposes an attention-based long short-term memory (LSTM) model for the task. The model takes basic distributed word vectors as its unit input and extracts rich contextual feature representations with a bidirectional LSTM. In addition, an attention-based hidden layer is added to the network; it assigns attention probabilities to the hidden states at different time steps to improve the quality of the hidden vectors. State transition probabilities are used in the decoding process to further improve accuracy. Experimental results on the PKU and CTB5 datasets show that the proposed model tags Chinese parts of speech effectively: it achieves higher accuracy than traditional methods and results competitive with state-of-the-art models.

Key words: Part-of-speech tagging, Long short-term memory, Attention mechanism, Contextual feature
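
The two mechanisms named in the abstract, attention over hidden states and decoding with state transition probabilities, can be illustrated with a minimal sketch. The abstract does not give the paper's scoring function or decoder, so everything here is an assumption: dot-product attention stands in for the attention layer, standard Viterbi decoding over log-scores stands in for the decoding step, and the function names `attend` and `viterbi` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, query):
    """Dot-product attention (an assumed scoring function, not the paper's).

    hidden_states: list of vectors, one per time step.
    query: vector of the same dimension.
    Returns the attention probabilities and the weighted sum of hidden states.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


def viterbi(emissions, transitions):
    """Find the best tag sequence given per-position and transition log-scores.

    emissions[t][j]: log-score of tag j at position t.
    transitions[i][j]: log-score of moving from tag i to tag j.
    """
    n_tags = len(emissions[0])
    score = list(emissions[0])
    backptr = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag for ending at tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backptr.append(ptrs)
    # Follow back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path


# Toy example with two tags: position 1 slightly prefers tag 1 on its own,
# but the transition scores penalize switching tags.
emissions = [[0.0, -1.0], [-0.5, -0.4], [0.0, -2.0]]
transitions = [[0.0, -3.0], [-3.0, 0.0]]
print(viterbi(emissions, transitions))  # [0, 0, 0]
```

In the toy example, choosing the best tag independently at each position would give `[0, 1, 0]`; taking the transition scores into account yields `[0, 0, 0]`, which is the kind of correction a transition-aware decoder is meant to provide.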

