Computer Science ›› 2020, Vol. 47 ›› Issue (3): 211-216. doi: 10.11896/jsjkx.190200259
唐国强,高大启,阮彤,叶琪,王祺
TANG Guo-qiang,GAO Da-qi,RUAN Tong,YE Qi,WANG Qi
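Abstract: Clinical Named Entity Recognition (CNER) aims to identify named entities relevant to clinical medicine in a given set of electronic medical record documents, extract them, and assign them to predefined categories such as diseases, symptoms, and examinations. Named entity recognition is usually treated as a sequence labeling problem. Deep learning methods are now widely applied to this task and have achieved very good results. However, most of them fail to make effective use of the large amounts of unlabeled data available, and the features they use are relatively simple and do not capture the characteristics of medical record text in depth. To address these two problems, this paper proposes a deep learning method that incorporates a language model and an attention mechanism. The method first trains character embeddings and a language model on unlabeled clinical medical data, and then trains the tagging model on labeled data. Specifically, the vector representation of a sentence is fed into both a Bidirectional Gated Recurrent Unit (BiGRU) network and the pre-trained language model, and the outputs of the two are concatenated. The concatenated vector is then fed into a second BiGRU and a multi-head attention module. Finally, the outputs of the BiGRU and the multi-head attention module are concatenated and passed to a Conditional Random Field (CRF), which predicts the globally optimal label sequence. By exploiting language model features and the multi-head attention mechanism, the method achieves good results on the CCKS-2017 Shared Task 2 benchmark dataset (F1 score of 91.34%).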
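The final step described above, in which a CRF selects the globally optimal label sequence, is commonly carried out with Viterbi decoding. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. The tag set (`O`, `B-DIS`, `I-DIS`), the emission and transition scores, and the names `viterbi_decode` and `bio_to_spans` are all assumptions made for illustration; in the model described here, the emission scores would be derived from the concatenated BiGRU and multi-head attention outputs, and the transition scores would be learned.

```python
from typing import List, Tuple

# Hypothetical BIO tag set for a single entity type (DIS = disease).
TAGS = ["O", "B-DIS", "I-DIS"]


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the tag-index sequence with the highest total score.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    n_tags = len(transitions)
    score = list(emissions[0])          # best score of any path ending in each tag
    backptr: List[List[int]] = []       # backptr[t-1][j] = best previous tag for j at t
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backptr.append(ptrs)
    # Follow the back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptr):
        best = ptrs[best]
        path.append(best)
    path.reverse()
    return path


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags into (start, end_exclusive, entity_type) spans."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):   # sentinel closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


# Illustrative scores for a four-character sentence (made-up numbers).
emissions = [
    [2.0, 0.0, 0.0],
    [0.5, 1.0, 1.2],
    [0.0, 0.2, 2.0],
    [2.0, 0.0, 0.0],
]
# Rows: previous tag; columns: next tag. O -> I-DIS is heavily penalised.
transitions = [
    [0.0, 0.0, -10.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]

decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(decoded, bio_to_spans(decoded))
```

With these illustrative scores, picking the best tag independently at each position would give `O, I-DIS, I-DIS, O`, which is not a valid BIO sequence. Scoring whole paths with the transition matrix instead yields `O, B-DIS, I-DIS, O`, which is why a CRF layer is placed on top of per-position features.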