Computer Science (计算机科学), 2024, Vol. 51, Issue 11A: 240300197-8. doi: 10.11896/jsjkx.240300197
PANG Bowen, CHEN Yifei, HUANG Jia (庞博文, 陈一飞, 黄佳)
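Abstract: Fine-grained named entity recognition (NER) identifies entity information in poverty-alleviation texts from the audit domain, and is essential for improving the analysis and evaluation of poverty-alleviation policy outcomes. In recent years, deep learning has achieved notable results on fine-grained NER tasks, but specific domains still face three problems: scarce corpora, aggravated incompatibility of fine-grained features during transfer learning, and data imbalance. To address them, we define a fine-grained label scheme for poverty-alleviation audit entities and build a fine-grained poverty-alleviation audit corpus (FG-PAudit-Corpus) to mitigate the lack of datasets in the audit domain. We propose FGATSC, a fine-grained entity recognition model based on adversarial transfer with sample contribution. The model is trained with adversarial transfer and incorporates sample-contribution weights into the transferred features to resolve the incompatibility of fine-grained features. In addition, to handle the imbalance between high-resource source-domain samples and low-resource poverty-alleviation audit samples, we propose a balanced resource adversarial discriminator (BRAD) that reduces the effect of this imbalance. Experimental results show that FGATSC achieves an F1 score of 75.83% on FG-PAudit-Corpus, 9.03% higher than the baseline model and 4.01%~6.53% higher than other mainstream models. In a generalization test on the Resume dataset, its F1 score reaches 95.77%, about 0.14%~1.31% higher than mainstream models from recent years. These results verify the effectiveness and generalization ability of FGATSC.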
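The abstract does not give the formula for BRAD. Purely as an illustration of how a domain discriminator can be balanced between a high-resource source domain and a low-resource target domain, the sketch below uses a focal-style, resource-weighted binary cross-entropy. The function name `balanced_domain_loss` and the parameters `alpha` and `gamma` are assumptions made for this example and are not taken from the paper.

```python
import math


def balanced_domain_loss(probs, is_source, alpha=0.25, gamma=2.0):
    """Focal-style, resource-weighted domain-discriminator loss (illustrative only).

    probs[i]     : predicted probability that sample i comes from the source domain
    is_source[i] : True for source-domain samples, False for target-domain samples
    alpha        : weight on source samples; target samples get (1 - alpha)
    gamma        : focusing parameter; larger values down-weight easy samples
    """
    if len(probs) == 0 or len(probs) != len(is_source):
        raise ValueError("probs and is_source must be non-empty and the same length")

    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for p, src in zip(probs, is_source):
        p = min(max(p, eps), 1.0 - eps)
        if src:
            # Source sample: correct when p is high, so (1 - p) measures difficulty.
            total += -alpha * (1.0 - p) ** gamma * math.log(p)
        else:
            # Target sample: correct when p is low, so p measures difficulty.
            total += -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
    return total / len(probs)


# Hypothetical batch: three source-domain samples and one target-domain sample.
loss = balanced_domain_loss([0.9, 0.8, 0.7, 0.4], [True, True, True, False])
```

With `alpha` below 0.5, each low-resource target sample counts for more than a source sample, and `gamma` reduces the contribution of samples the discriminator already classifies confidently. Whether BRAD uses this exact weighting is not stated in the abstract.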