Computer Science, 2023, Vol. 50, Issue (11A): 230200083-6. doi: 10.11896/jsjkx.230200083
钱泰羽, 陈一飞, 庞博文
QIAN Taiyu, CHEN Yifei, PANG Bowen
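Abstract: To automatically identify valid entity information in audit texts and improve the efficiency of policy tracking audits, this paper proposes Audit-MBCA, a named entity recognition (NER) model for audit texts based on MacBERT (MLM as correction BERT) and adversarial training. Deep learning is now mature for NER and has achieved notable results, but audit texts suffer from a lack of corpora and from unclear entity boundaries. To address these problems, the paper constructs an audit text dataset named Audit 2022, uses the MacBERT Chinese pre-trained language model to obtain vector representations of the text, and introduces adversarial training so that the word-boundary information shared between the Chinese word segmentation (CWS) task and the NER task helps identify entity boundaries. Experimental results show that Audit-MBCA achieves an F1 score of 91.05% on the Audit 2022 dataset, 4.53% higher than mainstream models, and an F1 score of 93.70% on the SIGHAN 2006 dataset, 0.33% to 3.25% higher than other models, which verifies the effectiveness and generalization ability of the proposed model.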
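The abstract reports F1 scores but does not say how they are computed. As a minimal sketch, assuming the common NER convention of exact-match, entity-level F1 over BIO-tagged sequences (an assumption, not something the abstract confirms), the metric could be computed as follows. The function names `extract_spans` and `entity_f1` and the tag labels are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (assumed scheme).

    A stray "I-X" with no matching open entity is ignored.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing "O" sentinel flushes an entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if inside:
            continue
        if label is not None:
            spans.add((start, i, label))  # close the open entity
        if tag.startswith("B-"):
            start, label = i, tag[2:]     # open a new entity
        else:
            start, label = None, None
    return spans


def entity_f1(gold: List[List[str]],
              pred: List[List[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1); a span counts only on exact match."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = extract_spans(g), extract_spans(p)
        tp += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the second entity has a wrong right boundary, so it
# counts as both a miss and a false alarm under exact matching.
gold = [["B-ORG", "I-ORG", "O", "B-LOC", "I-LOC"]]
pred = [["B-ORG", "I-ORG", "O", "B-LOC", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
```

Under this convention a boundary error costs both precision and recall, which is why the shared word-boundary information from the CWS task could plausibly affect the reported score.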