Computer Science ›› 2019, Vol. 46 ›› Issue (12): 231-236. doi: 10.11896/jsjkx.190300069

• Artificial Intelligence •

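Technology and Terminology Detection Oriented National Defense Science

FENG Luan-luan, LI Jun-hui, LI Pei-feng, ZHU Qiao-ming

  1. (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China);
    (Jiangsu Provincial Key Laboratory for Computer Information Processing Technology, Suzhou, Jiangsu 215006, China)
  • Received: 2019-03-16 Online: 2019-12-15 Published: 2019-12-17
  • Corresponding author: LI Jun-hui (born 1983), male, associate professor and master's supervisor. His main research interests are machine translation and natural language processing. E-mail: jhli@suda.edu.cn.
  • About the authors: FENG Luan-luan (born 1995), female, master's student and CCF student member. Her main research interest is natural language processing. LI Pei-feng (born 1971), male, professor and doctoral supervisor. His main research interests are natural language processing and machine learning. ZHU Qiao-ming (born 1963), male, professor and doctoral supervisor. His main research interest is natural language processing.
  • Funding:
    This work was supported by the Key Program (61836007) and the General Program (61772354, 61773276) of the National Natural Science Foundation of China.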

Abstract: With the development of natural language processing, constructing knowledge graphs for the national defense science and technology domain has received increasing attention. Recognizing technologies and terms in this domain is the basis for constructing such a technology knowledge graph. Using a corpus from this domain, this paper explores the application of subword units to the conventional Bi-LSTM+CRF sequence labeling model for the task of technology and terminology recognition. In addition, it proposes linguistic features suited to the characteristics of the task. Experimental results on the annotated corpus show that the proposed approach achieves an F1 score of 71.80%, an improvement of 3.04% over the baseline system, indicating that it recognizes technologies and terms in the national defense science and technology domain effectively. The proposed approach also outperforms BERT-based models on this task.

Key words: Bi-LSTM+CRF model, Linguistic features, Oriented national defense science, Subwords, Technology and terminology
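
The abstract describes feeding subword units to a sequence labeling model. One practical consequence is that word-level labels have to be carried over to the subword sequence. The sketch below illustrates one common way to do this with BIO tags; it is an illustration under stated assumptions, not the authors' procedure. The `TECH` label, the `toy_segment` stand-in segmenter, and the `@@` continuation marker are all hypothetical choices made for the example; the abstract does not specify the tag set or the segmentation scheme.

```python
from typing import Callable, List, Tuple


def project_bio_to_subwords(
    words: List[str],
    tags: List[str],
    segment: Callable[[str], List[str]],
) -> Tuple[List[str], List[str]]:
    """Expand word-level BIO tags so that every subword unit carries a tag.

    `segment` is any callable mapping a word to its list of subword units
    (hypothetical here; a trained BPE model could be plugged in instead).
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")

    sub_tokens: List[str] = []
    sub_tags: List[str] = []
    for word, tag in zip(words, tags):
        for i, piece in enumerate(segment(word)):
            sub_tokens.append(piece)
            if i == 0 or tag == "O":
                # The first piece keeps the word's tag; "O" is copied to all pieces.
                sub_tags.append(tag)
            else:
                # Later pieces of an entity word continue the entity: B-X / I-X -> I-X.
                sub_tags.append("I-" + tag.split("-", 1)[1])
    return sub_tokens, sub_tags


def toy_segment(word: str, size: int = 2) -> List[str]:
    """Stand-in segmenter (assumption): fixed-size character chunks,
    with '@@' marking every chunk except the last one of the word."""
    chunks = [word[i:i + size] for i in range(0, len(word), size)]
    return [c + "@@" for c in chunks[:-1]] + chunks[-1:]


# Example with a hypothetical TECH label.
tokens, labels = project_bio_to_subwords(
    ["采用", "相控阵", "雷达"],
    ["O", "B-TECH", "I-TECH"],
    toy_segment,
)
print(list(zip(tokens, labels)))
```

Keeping the original tag on the first subword and using `I-` on the remaining pieces preserves entity boundaries, so predictions on subwords can be mapped back to words by reading the tag of each word's first piece.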

CLC Number:

  • TP391.1