Multi-task Learning Model for Text Feature Enhancement in Medical Field

Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 240200041-7. doi: 10.11896/jsjkx.240200041

GUO Ruiqiang^1,2,3, JIA Xiaowen^1, YANG Shilong^1, WEI Qianqiang^1

1 School of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
2 Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Hebei Normal University, Shijiazhuang 050024, China
3 Hebei Provincial Key Laboratory of Network and Information Security, Shijiazhuang 050024, China

Online: 2024-11-16 Published: 2024-11-13
Corresponding author: GUO Ruiqiang (rqguo@mail.hebtu.edu.cn)
About author: GUO Ruiqiang, born in 1974, Ph.D., professor, master's supervisor, is a member of CCF (No. 17546M). His main research interests include database system design, data mining, and big data processing.
Supported by:
2023 Hebei Province Talent Introduction and Intelligence Innovation Platform (606080123003).

Abstract

The recognition and normalization of medical named entities are the foundation for constructing high-quality medical knowledge graphs. This paper proposes a multi-task learning model based on text feature enhancement, aiming to address the failure of existing medical entity recognition and normalization models to make full use of text features. The model adds word-level features, character-level features, and contextual semantic information to enhance the text representation, and then jointly models medical entity recognition and normalization through four hierarchical sub-tasks. Experiments show that the model can learn features shared by the entity recognition and entity normalization tasks, effectively improving accuracy. Good results are achieved on two datasets, NCBI and BC5CDR, with F1 scores on the NER and NEN tasks of 91.09%, 91.02%; 92.05%, 92%, respectively.

Key words: Medical named entity recognition, Entity normalization, Multi-task, Feature enhancement, Joint modeling
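
The F1 scores quoted in the abstract combine precision and recall as their harmonic mean. This page does not state the paper's exact evaluation protocol, so the following is only a minimal sketch that assumes exact-match, micro-averaged scoring over sets of predicted and gold items. The function name `micro_f1`, the tuple layouts, and the toy annotations (including the concept IDs) are hypothetical and chosen purely for illustration.

```python
from typing import Hashable, Set, Tuple


def micro_f1(gold: Set[Hashable], pred: Set[Hashable]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for exact-match comparison of two sets."""
    true_pos = len(gold & pred)  # items predicted exactly as annotated
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy annotations (not from the paper):
# NER items are (start, end, entity_type); NEN items are (start, end, concept_id).
gold_ner = {(0, 2, "Disease"), (5, 6, "Chemical"), (9, 11, "Disease")}
pred_ner = {(0, 2, "Disease"), (5, 6, "Disease"), (9, 11, "Disease")}

gold_nen = {(0, 2, "C001"), (9, 11, "C002")}
pred_nen = {(0, 2, "C001"), (9, 11, "C003")}

ner_p, ner_r, ner_f1 = micro_f1(gold_ner, pred_ner)
nen_p, nen_r, nen_f1 = micro_f1(gold_nen, pred_nen)
print(f"NER: P={ner_p:.3f} R={ner_r:.3f} F1={ner_f1:.3f}")
print(f"NEN: P={nen_p:.3f} R={nen_r:.3f} F1={nen_f1:.3f}")
```

Under this assumption, a recognized mention counts as correct for NER only if both its span and type match, and for NEN only if its span and concept ID match, which is why the two tasks can report different F1 values on the same text.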

CLC Number: TP391

GUO Ruiqiang, JIA Xiaowen, YANG Shilong, WEI Qianqiang. Multi-task Learning Model for Text Feature Enhancement in Medical Field[J]. Computer Science, 2024, 51(11A): 240200041-7. https://doi.org/10.11896/jsjkx.240200041

