Computer Science ›› 2020, Vol. 47 ›› Issue (11): 212-219. doi: 10.11896/jsjkx.191000201

• Artificial Intelligence •


LAC-DGLU: Named Entity Recognition Model Based on CNN and Attention Mechanism

ZHAO Feng, HUANG Jian, ZHANG Zhong-jie   

  1. College of Artificial Intelligence, National University of Defense Technology, Changsha 410073, China
  • Received: 2019-10-31  Revised: 2020-03-29  Online: 2020-11-15  Published: 2020-11-05
  • Corresponding author: HUANG Jian (nudtjHuang@hotmail.com)
  • About author: ZHAO Feng, born in 1997, postgraduate (62794258@qq.com). His main research interests include natural language processing.
    HUANG Jian, born in 1971, Ph.D, professor, Ph.D supervisor. Her main research interests include complex system modeling.


Abstract: Word segmentation and word embedding are usually the first steps in Chinese named entity recognition, but Chinese text has no explicit delimiters between words, and out-of-vocabulary (OOV) words such as technical terms and rare words severely disturb the computation of word vectors, so the performance of models built on word embeddings is highly susceptible to segmentation quality. At the same time, most existing models use recurrent neural networks, whose slow computation makes it difficult to meet the requirements of industrial applications. To address these problems, this paper constructs a named entity recognition model based on an attention mechanism and a convolutional neural network, LAC-DGLU. To reduce the dependence on word segmentation, it proposes a character embedding algorithm based on Local Attention Convolution (LAC), which alleviates the model's reliance on segmentation quality. To address the slow computation, it uses a gated convolutional neural network, the Dilated Gated Linear Unit (DGLU), which raises computation speed. Experimental results on several datasets show that the proposed model improves the F1 value by 0.2% to 2% over existing state-of-the-art models, and its training speed reaches 1.4 to 1.9 times that of those models.
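The abstract names Local Attention Convolution (LAC) but gives no construction details here, so the following is a minimal PyTorch sketch of one plausible reading: each character attends over a fixed local window of neighboring characters, and a 1×1 convolution then mixes the attended context. The window size, dimensions, and scoring function are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalAttentionConv(nn.Module):
    """Hedged sketch of a local-attention convolution for character embeddings.

    Each position computes attention scores over a fixed local window of
    neighboring characters; the attended context is then mixed channel-wise
    by a 1x1 convolution. Window size, dimensions, and the scoring function
    are assumptions, not the paper's actual design.
    """
    def __init__(self, dim: int, window: int = 3):
        super().__init__()
        self.window = window                           # neighbors on each side
        self.score = nn.Linear(dim, dim, bias=False)   # attention projection
        self.conv = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) character embeddings
        _, _, D = x.shape
        k = self.window
        # pad the sequence so every position has a full window of 2k+1 characters
        padded = F.pad(x, (0, 0, k, k))                         # (B, T+2k, D)
        # gather sliding windows over the sequence: (B, T, 2k+1, D)
        windows = padded.unfold(1, 2 * k + 1, 1).permute(0, 1, 3, 2)
        # scaled dot-product score of each neighbor against the center character
        q = self.score(x).unsqueeze(2)                          # (B, T, 1, D)
        attn = torch.softmax((q * windows).sum(-1) / D ** 0.5, dim=-1)
        context = (attn.unsqueeze(-1) * windows).sum(2)         # (B, T, D)
        # 1x1 convolution mixes the attended context channel-wise
        return self.conv(context.transpose(1, 2)).transpose(1, 2)
```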
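Likewise, a hedged sketch of what a Dilated Gated Linear Unit (DGLU) block could look like: the gated convolution of Dauphin et al. combined with dilation and a residual connection (all three appear in the keywords). The kernel size and the exact placement of the residual are assumptions.

```python
import torch
import torch.nn as nn

class DGLU(nn.Module):
    """Hedged sketch of a dilated gated linear unit block.

    A dilated 1D convolution produces two sets of channels; one gates the
    other through a sigmoid (a gated linear unit), and a residual connection
    preserves the input. Kernel size and residual placement are assumptions.
    """
    def __init__(self, dim: int, kernel: int = 3, dilation: int = 1):
        super().__init__()
        pad = (kernel - 1) * dilation // 2   # padding that keeps sequence length
        self.conv = nn.Conv1d(dim, 2 * dim, kernel, padding=pad, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim, seq_len)
        h, g = self.conv(x).chunk(2, dim=1)  # values and gate
        return x + h * torch.sigmoid(g)      # gated output plus residual

# Stacking blocks with growing dilation widens the receptive field while
# every position is computed in parallel, which is the claimed speed
# advantage over recurrent networks.
encoder = nn.Sequential(DGLU(128, dilation=1), DGLU(128, dilation=2), DGLU(128, dilation=4))
```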

Key words: Character embedding, Dilated convolution, Gated linear unit, Local attention convolution, Residual structure

CLC Number: TP391