Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 240300197-8. doi: 10.11896/jsjkx.240300197

• Intelligent Computing •

Fine-grained Entity Recognition Model in Audit Domain Based on Adversarial Transfer of Sample Contributions

PANG Bowen, CHEN Yifei, HUANG Jia   

  1. School of Computer Science, Nanjing Audit University, Nanjing 211815, China
  • Online: 2024-11-16  Published: 2024-11-13
  • About author: PANG Bowen, born in 1999, postgraduate, is a member of CCF (No.T8233G). His main research interest is text mining.
    CHEN Yifei, born in 1977, Ph.D, associate professor. Her main research interests include text mining and intelligent information extraction.

Abstract: Fine-grained named entity recognition (NER) identifies entity information in pro-poor audit texts and is crucial for analysing and evaluating the effectiveness of pro-poor policies. In recent years, deep learning has achieved remarkable results on fine-grained NER tasks, but this specific domain still faces problems such as the scarcity of annotated corpora, the incompatibility of fine-grained features under transfer learning, and data imbalance. To address these issues, a fine-grained pro-poor audit entity labelling scheme is formulated and a fine-grained pro-poor audit corpus (FG-PAudit-Corpus) is constructed to alleviate the shortage of datasets in the audit domain. A fine-grained entity recognition model based on adversarial transfer of sample contributions (FGATSC) is then proposed: it performs adversarial transfer training and incorporates sample contribution weights into the transferred features to resolve the incompatibility of fine-grained features. Meanwhile, to mitigate the imbalance between the high-resource source domain and the low-resource pro-poor audit domain, a balanced resource adversarial discriminator (BRAD) is introduced. Experimental results show that FGATSC achieves an F1 score of 75.83% on FG-PAudit-Corpus, an improvement of 9.03% over the baseline model and of 4.01% to 6.53% over other mainstream models. In a generalisation test on the Resume dataset, its F1 score reaches 95.77%, about 0.14% to 1.31% higher than recent mainstream models. These results verify the effectiveness and generalisability of FGATSC.
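The adversarial transfer scheme summarised above can be illustrated with a minimal sketch. The code below is not the paper's released implementation; it assumes a PyTorch setting and shows only the generic mechanism such a model builds on: a domain discriminator trained through a gradient-reversal layer, whose per-sample adversarial loss is scaled by sample contribution weights and by per-domain weights that up-weight the low-resource pro-poor audit domain (in the spirit of BRAD). All class and variable names (GradReverse, WeightedDomainDiscriminator, sample_weights, domain_weights) are illustrative assumptions; the exact weighting formulas are defined in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) the gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient trains the shared encoder to fool the discriminator.
        return -ctx.lambd * grad_output, None


class WeightedDomainDiscriminator(nn.Module):
    """Binary domain classifier trained adversarially through gradient reversal."""

    def __init__(self, hidden_dim, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Linear(hidden_dim // 2, 2),  # 0 = source domain, 1 = target (audit) domain
        )

    def forward(self, features, domain_labels, sample_weights, domain_weights):
        # features:       (batch, hidden_dim) sentence-level representations from a shared encoder
        # domain_labels:  (batch,) 0/1 domain indicator
        # sample_weights: (batch,) per-sample contribution weights (assumed to be given)
        # domain_weights: (2,) class weights that up-weight the low-resource domain
        reversed_feats = GradReverse.apply(features, self.lambd)
        logits = self.classifier(reversed_feats)
        per_sample_loss = F.cross_entropy(
            logits, domain_labels, weight=domain_weights, reduction="none"
        )
        # Scale each sample's adversarial loss by its contribution weight.
        return (sample_weights * per_sample_loss).mean()


# Usage sketch with random stand-in tensors.
disc = WeightedDomainDiscriminator(hidden_dim=256)
feats = torch.randn(8, 256)                  # encoder outputs for a mixed source/target batch
domains = torch.randint(0, 2, (8,))          # which domain each sentence came from
contrib = torch.rand(8)                      # stand-in sample contribution scores
dom_w = torch.tensor([0.3, 0.7])             # up-weight the low-resource domain
adv_loss = disc(feats, domains, contrib, dom_w)

In a model of this kind, the features would come from a shared encoder trained jointly with a CRF tagging loss; minimising the weighted adversarial loss pushes that encoder toward domain-invariant features that transfer better to the low-resource audit domain.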

Key words: Fine-grained entity recognition, Pro-poor auditing, Adversarial training, Sample contribution, Resource balancing

CLC Number: TP391