Computer Science, 2023, Vol. 50, Issue (11A): 230200083-6. doi: 10.11896/jsjkx.230200083

• Artificial Intelligence •

Audit Text Named Entity Recognition Based on MacBERT and Adversarial Training

QIAN Taiyu, CHEN Yifei, PANG Bowen   

  1. School of Computer Science, Nanjing Audit University, Nanjing 211815, China
  • Published: 2023-11-09
  • About author: QIAN Taiyu, born in 1994, postgraduate, is a member of China Computer Federation. His main research interest is text mining.
    CHEN Yifei, born in 1977, Ph.D, associate professor. Her main research interests include text mining and intelligent information extraction.
  • Supported by:
    Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX22_0995).

Abstract: To automatically identify useful entity information in audit texts and improve the efficiency of policy tracking audits, this paper proposes Audit-MBCA, a named entity recognition (NER) model for audit text based on MacBERT (MLM as correction BERT) and adversarial training. Deep learning is now a mature approach to the NER task and has achieved significant results. Audit text, however, suffers from a lack of corpus data and from entity boundaries that are hard to recognize. To address these problems, an audit text dataset named Audit2022 is constructed, and its vector representation is obtained with the MacBERT Chinese pre-trained language model. Adversarial training is also introduced, so that the word boundary information shared by the Chinese word segmentation (CWS) task and the NER task helps identify entity boundaries. Experimental results show that the Audit-MBCA model achieves an F1 value of 91.05% on the Audit2022 dataset, which is 4.53% higher than the mainstream model, and an F1 value of 93.70% on the SIGHAN2006 dataset, which is 0.33% to 3.25% higher than other models. These results verify the effectiveness and generalization ability of the proposed model.
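
The abstract reports F1 values but does not say how they are computed. A common convention for NER is entity-level F1 with exact span matching, and the minimal sketch below assumes that convention together with BIO tagging. The function names `extract_spans` and `entity_f1` and the labels `ORG` and `LOC` are illustrative assumptions, not taken from the paper or from the Audit2022 label set.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); hypothetical helper type
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (e.g. B-ORG, I-ORG, O)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.add((start, i, label))  # close the open entity
            if tag.startswith("B-"):
                start, label = i, tag[2:]     # open a new entity
            else:
                start, label = None, None     # "O" or a stray "I-" tag
    return spans


def entity_f1(gold: List[List[str]],
              pred: List[List[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching entity spans."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = extract_spans(g), extract_spans(p)
        tp += len(gold_spans & pred_spans)  # same boundaries and same type
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-ORG", "I-ORG", "I-ORG", "O", "B-LOC"]]
    # The predicted ORG entity stops one token early: a boundary error.
    pred = [["B-ORG", "I-ORG", "O", "O", "B-LOC"]]
    print(entity_f1(gold, pred))
```

Under exact matching, an entity whose predicted boundary is off by a single token counts as both a missed gold entity and a spurious prediction, which is one reason boundary recognition weighs so heavily on a reported F1 value.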

Key words: Audit text, Named entity recognition, MacBERT, Adversarial training

CLC Number: 

  • TP391