Computer Science (计算机科学), 2021, Vol. 48, Issue (4): 237-242. doi: 10.11896/jsjkx.200100036
ZHOU Xiao-jin (周晓进)¹, XU Chen-ming (徐陈铭)², RUAN Tong (阮彤)¹
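Abstract: In existing named entity recognition (NER) tasks for Chinese clinical electronic medical records (EMRs), entities are usually annotated at a granularity that is either too fine or too coarse. Fine-grained annotations are hard to map onto practical applications. Coarse-grained annotations, by contrast, usually require complex post-processing before each entity's normalized form and semantic type are clear enough for downstream data mining. To simplify this processing, we define nine types of fine-grained parsing entities that explain seven common types of coarse-grained clinical entities, based on the characteristics of those coarse-grained types. We also propose a multi-granularity clinical entity recognition model, based on multi-task learning and the self-attention mechanism, that targets the characteristics of multi-granularity entities. To evaluate the model, we annotated 5,000 texts containing multi-granularity entities drawn from a real hospital EMR database. Experimental results show that the model outperforms mainstream sequence labeling models, achieving F1 scores of 92.88 on the coarse-grained task and 85.48 on the fine-grained task.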
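The abstract names the model's two ingredients, multi-task learning and self-attention, but gives no layer-level details. The Python sketch below is therefore only an illustration under stated assumptions, not the authors' architecture. It uses a single-head scaled dot-product self-attention encoder shared by two task-specific linear heads (hard parameter sharing), one head for coarse-grained tags and one for fine-grained tags, and it combines the two per-token cross-entropy losses with a weight `alpha`. All names (`MultiTaskTagger`, `joint_loss`, `alpha`) are hypothetical. The weights are random and untrained. The label counts assume a BIO tagging scheme: 7 coarse types × {B, I} + O gives 15 tags, and 9 fine types × {B, I} + O gives 19 tags.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over a token sequence x (n x d)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(k[0]))
    # scores[i][j]: how strongly token i attends to token j
    scores = [[sum(qi * kj for qi, kj in zip(q_row, k_row)) / scale for k_row in k]
              for q_row in q]
    weights = [softmax(r) for r in scores]  # each row sums to 1
    return matmul(weights, v)


def cross_entropy(logits: Matrix, gold: List[int]) -> float:
    """Mean token-level cross-entropy between per-token logits and gold tag ids."""
    return -sum(math.log(softmax(row)[g]) for row, g in zip(logits, gold)) / len(gold)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class MultiTaskTagger:
    """Hypothetical hard-parameter-sharing tagger: one shared self-attention encoder,
    plus one output head per task (coarse-grained and fine-grained entity tags)."""

    def __init__(self, dim: int, n_coarse: int, n_fine: int, seed: int = 0):
        rng = random.Random(seed)
        # Shared encoder parameters, used by both tasks
        self.wq, self.wk, self.wv = (random_matrix(dim, dim, rng) for _ in range(3))
        # Task-specific output heads
        self.w_coarse = random_matrix(dim, n_coarse, rng)
        self.w_fine = random_matrix(dim, n_fine, rng)

    def forward(self, x: Matrix):
        h = self_attention(x, self.wq, self.wk, self.wv)  # shared representation
        return matmul(h, self.w_coarse), matmul(h, self.w_fine)

    def joint_loss(self, x: Matrix, gold_coarse: List[int], gold_fine: List[int],
                   alpha: float = 0.5) -> float:
        # Assumed weighting: alpha * coarse loss + (1 - alpha) * fine loss
        coarse_logits, fine_logits = self.forward(x)
        return (alpha * cross_entropy(coarse_logits, gold_coarse)
                + (1 - alpha) * cross_entropy(fine_logits, gold_fine))


# Usage: 4 made-up tokens with 8-dim embeddings; tag counts assume BIO (7*2+1, 9*2+1)
tokens = random_matrix(4, 8, random.Random(1))
model = MultiTaskTagger(dim=8, n_coarse=15, n_fine=19)
coarse, fine = model.forward(tokens)
loss = model.joint_loss(tokens, [0, 1, 2, 0], [0, 3, 4, 0])
```

Because both heads read the same encoder output, training on the joint loss would let the fine-grained task also update the shared attention weights, which is the usual motivation for multi-task learning in this setting. A practical tagger would add learned embeddings, multiple attention heads, and often a CRF decoding layer. The abstract specifies none of these.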