Computer Science, 2024, Vol. 51, Issue (11A): 240300197-8. DOI: 10.11896/jsjkx.240300197

• Intelligent Computing •

基于样本贡献度对抗迁移的审计领域细粒度实体识别模型

庞博文, 陈一飞, 黄佳   

  1. 南京审计大学计算机学院 南京 211815
  • 出版日期:2024-11-16 发布日期:2024-11-13
  • Corresponding author: CHEN Yifei (yifeichen91@nau.edu.cn)
  • About author: (xiaopang6831@163.com)

Fine-grained Entity Recognition Model in Audit Domain Based on Adversarial Transfer of Sample Contributions

PANG Bowen, CHEN Yifei, HUANG Jia   

  1. School of Computer Science,Nanjing Audit University,Nanjing 211815,China
  • Online:2024-11-16 Published:2024-11-13
  • About author: PANG Bowen, born in 1999, postgraduate, is a member of CCF (No.T8233G). His main research interest is text mining.
    CHEN Yifei, born in 1977, Ph.D, associate professor. Her main research interests include text mining and intelligent information extraction.

摘要: 细粒度命名实体识别(Named Entity Recognition,NER)在审计领域扶贫文本中识别实体信息,对优化扶贫政策成效分析与评估至关重要。近年来,深度学习在细粒度NER任务中取得显著成效,但特定领域仍面临语料集匮乏、迁移学习中细粒度特征不兼容性加剧及数据不平衡等问题。针对这些问题,制定了细粒度扶贫审计实体标签体系,并构建了细粒度扶贫审计语料集(FG-PAudit-Corpus)以解决审计领域数据集匮乏的问题。提出了基于样本贡献度对抗迁移的细粒度实体识别模型(FGATSC),该模型做对抗迁移训练,提出将样本贡献度权重纳入迁移特征中以解决细粒度特征的不兼容问题。同时,针对源域高资源与扶贫审计领域低资源样本的不平衡,提出了平衡资源对抗鉴别器(BRAD)以降低这种影响。实验结果表明,FGATSC模型在FG-PAudit-Corpus上F1的值为75.83%,较基线模型提高了9.03%,较其他主流模型提升了4.01%~6.53%;在Resume数据集上进行泛化性验证,F1值较近几年的主流模型提高约0.14%~1.31%,达到了95.77%。综上,验证了FGATSC模型的有效性和泛化性。

关键词: 细粒度实体识别, 扶贫审计, 对抗训练, 样本贡献度, 平衡资源

Abstract: Fine-grained named entity recognition (NER) identifies entity information in pro-poor texts in the auditing domain, which is crucial for optimising the analysis and evaluation of pro-poor policy effectiveness. In recent years, deep learning has achieved significant results in fine-grained NER tasks, but specific domains still face problems such as scarce corpora, aggravated incompatibility of fine-grained features in transfer learning, and data imbalance. To address these issues, this paper formulates a fine-grained pro-poor audit entity labelling scheme and constructs a fine-grained pro-poor audit corpus (FG-PAudit-Corpus) to remedy the scarcity of datasets in the audit domain. A fine-grained entity recognition model based on adversarial transfer of sample contributions (FGATSC) is proposed. The model performs adversarial transfer training and incorporates sample contribution weights into the transferred features to resolve the incompatibility of fine-grained features. Meanwhile, to counter the imbalance between the high-resource source domain and the low-resource pro-poor audit domain, a balanced resource adversarial discriminator (BRAD) is proposed to reduce its impact. Experimental results show that the F1 value of FGATSC on FG-PAudit-Corpus is 75.83%, an improvement of 9.03% over the baseline model and of 4.01% to 6.53% over other mainstream models. In the generalisation validation on the Resume dataset, the F1 value reaches 95.77%, about 0.14% to 1.31% higher than mainstream models of recent years. In summary, the effectiveness and generalisability of FGATSC are verified.

Key words: Fine-grained entity recognition, Pro-poor auditing, Adversarial training, Sample contribution, Balancing resources
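To make the mechanism described in the abstract more concrete, the following is a minimal PyTorch sketch of sample-contribution-weighted adversarial transfer with a balanced-resource domain discriminator. It is an illustrative reconstruction under stated assumptions, not the authors' FGATSC implementation: the class names (GradientReversal, BalancedResourceDiscriminator), the focal-style modulating factor, and the way contribution weights enter the loss are all assumptions made for illustration; the paper's encoder, CRF decoder, and exact loss formulation are not reproduced here.

```python
# Minimal sketch (assumed PyTorch), not the authors' released FGATSC code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients in the
    backward pass -- the standard trick that trains a shared encoder
    adversarially against a domain discriminator."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class BalancedResourceDiscriminator(nn.Module):
    """Domain discriminator whose loss is modulated by (i) per-sample
    contribution weights and (ii) a focal-style factor that down-weights
    easy samples from the high-resource source domain -- one plausible
    reading of a 'balanced resource adversarial discriminator' (assumption)."""

    def __init__(self, hidden_dim: int, gamma: float = 2.0):
        super().__init__()
        self.gamma = gamma
        self.classifier = nn.Linear(hidden_dim, 2)  # 0 = source domain, 1 = audit domain

    def forward(self, sent_repr, domain_labels, contribution_weights, lambd=1.0):
        # sent_repr:            (batch, hidden_dim) sentence vectors from the shared encoder
        # domain_labels:        (batch,) long tensor of domain ids
        # contribution_weights: (batch,) per-sample contribution scores in [0, 1]
        reversed_repr = GradientReversal.apply(sent_repr, lambd)
        log_probs = F.log_softmax(self.classifier(reversed_repr), dim=-1)
        p_t = log_probs.exp().gather(1, domain_labels.unsqueeze(1)).squeeze(1)
        focal = (1.0 - p_t) ** self.gamma                    # emphasise hard / rare-domain samples
        ce = F.nll_loss(log_probs, domain_labels, reduction="none")
        return (contribution_weights * focal * ce).mean()    # weighted adversarial loss


if __name__ == "__main__":
    disc = BalancedResourceDiscriminator(hidden_dim=256)
    repr_ = torch.randn(8, 256, requires_grad=True)  # stand-in for shared-encoder outputs
    domains = torch.randint(0, 2, (8,))              # 0 = source, 1 = target (audit) domain
    weights = torch.rand(8)                          # stand-in for sample contribution scores
    loss = disc(repr_, domains, weights)
    loss.backward()                                  # reversed gradients reach repr_.grad
```

In this sketch the contribution weights might come, for example, from the similarity between a sentence representation and a target-domain centroid, and the focal factor plays the balancing role the abstract attributes to BRAD, keeping the abundant source-domain samples from dominating the adversarial signal; both choices are hypothetical stand-ins for the paper's actual formulation.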

CLC Number: TP391