Construction of Military Corpus for Entity Annotation

ZHOU Bin-bin, ZHANG Hong-jun, ZHANG Rui, FENG Yun-tian, XU You-wei   

  1. School of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
  • Online: 2019-06-14 Published: 2019-07-02

Abstract: The key to building a military corpus is the identification and annotation of the entities it contains. For these entities, this paper proposes a unified part-of-speech tagging specification for military language and an annotation specification for military corpora, and designs a framework for automatically extending military corpora that is based on a military language dictionary and entity feature extraction. With the help of a high-precision classifier, the framework selects and extracts basic features and combines them with typical features of the military language set to build the feature space. Entity recognition results are corrected against the military language dictionary, and the recognized entities are then annotated according to the specified annotation standard and part-of-speech tagging specification, yielding a large-scale, high-quality military corpus. Experiments show that the framework performs corpus entity recognition and corpus annotation well, supports the construction of military corpora, and has broad application prospects in the military domain.

Key words: Feature extraction, Military corpus, Military entity annotation, Military part-of-speech tagging
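
The dictionary-based correction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual procedure, so everything here is an assumption made for illustration: the BIO label scheme, the longest-match strategy, the function name correct_with_dictionary, the tag names EQUIP and UNIT, and the English example tokens are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary: term (as a lowercase token tuple) -> entity tag.
# The entries and tag names are illustrative assumptions, not the paper's data.
MILITARY_DICT: Dict[Tuple[str, ...], str] = {
    ("aircraft", "carrier"): "EQUIP",
    ("destroyer",): "EQUIP",
    ("third", "fleet"): "UNIT",
}


def correct_with_dictionary(
    tokens: List[str],
    predicted: List[str],
    dictionary: Dict[Tuple[str, ...], str],
) -> List[str]:
    """Override classifier BIO tags wherever a dictionary term matches.

    Longer dictionary terms are tried before shorter ones at each position.
    Tokens outside any dictionary match keep the classifier's tag.
    """
    corrected = list(predicted)
    max_len = max((len(term) for term in dictionary), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            tag = dictionary.get(span)
            if tag is not None:
                corrected[i] = "B-" + tag
                for j in range(i + 1, i + n):
                    corrected[j] = "I-" + tag
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return corrected


tokens = ["The", "Third", "Fleet", "deployed", "an", "aircraft", "carrier"]
# Imagined classifier output with two boundary errors.
predicted = ["O", "O", "B-UNIT", "O", "O", "B-EQUIP", "O"]
print(correct_with_dictionary(tokens, predicted, MILITARY_DICT))
```

In this sketch the dictionary always wins over the classifier. A real system might instead override only low-confidence predictions, which is a design choice the abstract does not settle.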

