Construction of Military Corpus for Entity Annotation

Abstract

The key to building a military corpus is the recognition and annotation of its entities. For military corpus entities, this paper proposes a unified part-of-speech tagging specification for military language together with a military corpus annotation specification, and designs a feature-extraction framework, based on a military language dictionary, for automatically extending military corpora. With the help of a high-precision classifier, the framework selects and extracts basic features and combines them with typical features of the language set to build the feature space. Entity recognition results are corrected against the language dictionary, and entities are then annotated according to the specified annotation standard and part-of-speech tagging specification, so that the framework builds a large-scale, high-quality military corpus. Experiments show that the framework performs corpus entity recognition and annotation well, supports the construction of military corpora, and has broad application prospects in the military domain.
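The dictionary-correction step mentioned in the abstract could look roughly like the sketch below. It is only an illustration under stated assumptions: the abstract does not give the paper's tag set, dictionary format, or matching strategy, so the BIO-style tags, the `MILITARY_DICT` entries, and the longest-match rule in `correct_with_dictionary` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical dictionary mapping surface forms to entity tags.
# Entries and tag names are illustrative; the paper's tag set is not
# given in the abstract.
MILITARY_DICT: Dict[str, str] = {
    "aircraft carrier": "EQUIPMENT",
    "carrier strike group": "UNIT",
}


def correct_with_dictionary(
    tokens: List[str],
    predicted: List[str],
    dictionary: Dict[str, str],
    max_len: int = 4,
) -> List[str]:
    """Overwrite classifier tags wherever a dictionary entry matches.

    Scans left to right and, at each position, prefers the longest
    dictionary phrase (up to max_len tokens). Matched spans receive
    BIO-style tags (an assumed scheme); unmatched tokens keep the
    classifier's prediction.
    """
    corrected = list(predicted)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            tag = dictionary.get(phrase)
            if tag is not None:
                corrected[i] = "B-" + tag
                for j in range(i + 1, i + n):
                    corrected[j] = "I-" + tag
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return corrected


tokens = ["The", "aircraft", "carrier", "left", "port"]
predicted = ["O", "O", "B-EQUIPMENT", "O", "O"]  # classifier missed the span start
print(correct_with_dictionary(tokens, predicted, MILITARY_DICT))
```

In this sketch the dictionary always takes precedence over the classifier inside a matched span; a real system might instead weigh the two sources, which the abstract does not specify.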

Key words: Feature extraction, Military corpus, Military entity annotation, Military part-of-speech tagging

CLC Number: TP391

ZHOU Bin-bin, ZHANG Hong-jun, ZHANG Rui, FENG Yun-tian, XU You-wei. Construction of Military Corpus for Entity Annotation[J]. Computer Science, 2019, 46(6A): 540-546.
