Computer Science (计算机科学), 2023, Vol. 50, Issue 5: 262-269. doi: 10.11896/jsjkx.220400126
叶瀚, 李欣, 孙海春
YE Han, LI Xin, SUN Haichun
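Abstract: Whether a text contains sufficient entity information directly affects applications that depend on that information, yet conventional entity recognition models can only recognize entities that are actually present. This paper defines entity-missing detection as a sequence labeling task and proposes three corresponding methods for constructing training data for entity-missing detection models. Based on the recognition characteristics of this task, it proposes an entity-missing detection method that combines a convolutional neural network with a gating mechanism and a pre-trained language model. Experiments show that the model based on a pre-trained language model and a gated convolutional network reaches best F1 scores of 80.45%, 83.02%, and 86.75% for missing person, organization, and location entities, respectively, significantly higher than LSTM-based entity recognition models. Character-frequency statistics further show that the accuracy of models trained on datasets built with different annotation methods correlates with the frequency of the annotated characters.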
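To make the sequence-labeling formulation concrete, the following is a minimal sketch of how one training example might be built: an annotated entity is deleted from a character sequence and the character(s) next to the gap are tagged. The abstract does not describe the paper's three construction methods, so the function name `make_missing_example`, the `MISS-<type>` tag, and the three modes (`before`, `after`, `both`) are hypothetical choices made for illustration only.

```python
from typing import List, Tuple


def make_missing_example(
    chars: List[str],
    span: Tuple[int, int],
    etype: str,
    mode: str = "before",
) -> Tuple[List[str], List[str]]:
    """Delete chars[start:end] and tag the character(s) adjacent to the gap.

    Hypothetical illustration, not the paper's actual procedure.
    mode: "before" tags the character left of the gap,
          "after" tags the character right of the gap,
          "both" tags both neighbours.
    """
    if mode not in ("before", "after", "both"):
        raise ValueError(f"unknown mode: {mode}")
    start, end = span
    if not (0 <= start < end <= len(chars)):
        raise ValueError("span out of range")

    # Remove the entity so the sentence now lacks it.
    kept = chars[:start] + chars[end:]
    labels = ["O"] * len(kept)
    tag = f"MISS-{etype}"  # assumed tag scheme

    # After deletion, the gap sits between kept[start - 1] and kept[start].
    left, right = start - 1, start
    if mode in ("before", "both") and left >= 0:
        labels[left] = tag
    if mode in ("after", "both") and right < len(kept):
        labels[right] = tag
    # If the gap has no neighbour on the chosen side (entity at the very
    # start or end of the sentence), no character is tagged.
    return kept, labels


# Usage: remove the location "北京" from "张三在北京工作".
kept, labels = make_missing_example(list("张三在北京工作"), (3, 5), "LOC", "before")
print(list(zip(kept, labels)))
```

A tagger trained on such pairs would predict where an entity of a given type is absent, rather than where one is present. Which neighbouring character carries the tag is exactly the kind of design choice that could interact with character frequency, as the abstract's final sentence suggests.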