Computer Science ›› 2021, Vol. 48 ›› Issue (11): 287-293. doi: 10.11896/jsjkx.201200016

• Artificial Intelligence •

Joint Extraction Method for Chinese Medical Events

YU Jie1, JI Bin1, LIU Lei2, LI Sha-sha1, MA Jun1, LIU Hui-jun1

  1 College of Computer, National University of Defense Technology, Changsha 410073, China
  2 Institute of Logistics Science and Technology, Academy of Military Sciences, Beijing 100091, China
  • Received: 2020-12-02 Revised: 2021-03-11 Online: 2021-11-15 Published: 2021-11-10
  • Corresponding author: LIU Hui-jun (lhj12uestc@163.com)
  • About author: YU Jie, born in 1982, Ph.D., research fellow and master's supervisor, is a member of the China Computer Federation. His main research interests include operating systems, artificial intelligence and natural language processing.
    LIU Hui-jun, born in 1993, Ph.D. Her main research interests include natural language processing and adversarial attack and defense for text.
  • Author e-mail: yj@nudt.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61532001).

Abstract: The spread of electronic medical records (EMRs) makes it possible to extract high-value information from clinical records quickly and automatically. A tumor medical event, an important kind of medical information, consists of a series of attributes describing a malignant tumor. Tumor medical event extraction has recently become a research hotspot: many academic conferences have released it as an evaluation task and provided high-quality, manually annotated data. Because the attributes of a tumor medical event are scattered through the text, this paper proposes a joint extraction method for Chinese medical events that jointly extracts the primary tumor site and the primary tumor size and also extracts tumor metastasis sites. In addition, to address the small number and limited variety of annotated tumor medical texts, it proposes a pseudo-data generation algorithm based on global random replacement of key information, which improves the transfer learning ability of the joint extraction method across different types of tumor medical events. The proposed method won third place in the CCKS2020 evaluation task on clinical medical event extraction from Chinese electronic medical records, and extensive experiments on the CCKS2019 and CCKS2020 datasets verify its effectiveness.

Key words: Chinese electronic medical record, Joint extraction, Medical event extraction, Transfer learning, Tumor event
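
The abstract names the pseudo-data generation algorithm but does not spell out its steps. The Python sketch that follows is one plausible reading of "global random replacement of key information", not the authors' implementation. It assumes each annotated record holds a text and three lists of attribute mentions, pools every mention seen anywhere in the dataset, and swaps each mention in a record for a random value of the same attribute type. The record layout, the field names (`primary_site`, `tumor_size`, `metastasis_sites`), the function names and the toy sentences are all hypothetical.

```python
import random

# Hypothetical attribute names; the paper's real annotation schema may differ.
ATTRS = ("primary_site", "tumor_size", "metastasis_sites")

# Toy, made-up records used only to exercise the sketch.
SAMPLES = [
    {"text": "右肺上叶见肿块,大小约3.2cm×2.1cm,肝转移。",
     "primary_site": ["右肺上叶"], "tumor_size": ["3.2cm×2.1cm"],
     "metastasis_sites": ["肝"]},
    {"text": "胃窦部占位,约4cm,腹膜后淋巴结转移。",
     "primary_site": ["胃窦部"], "tumor_size": ["4cm"],
     "metastasis_sites": ["腹膜后淋巴结"]},
]


def build_pools(samples):
    """Collect, per attribute, every distinct mention in the whole dataset."""
    pools = {attr: [] for attr in ATTRS}
    for sample in samples:
        for attr in ATTRS:
            for value in sample[attr]:
                if value not in pools[attr]:
                    pools[attr].append(value)
    return pools


def generate_pseudo(sample, pools, rng):
    """Return a new record whose mentions are replaced by random pool values.

    Assumes the mentions of one record are distinct strings found in its text.
    """
    mentions = [(attr, value) for attr in ATTRS for value in sample[attr]]
    text = sample["text"]
    # Pass 1: swap each mention for a private-use placeholder character,
    # longest mention first so a short one cannot clobber part of a longer one.
    order = sorted(range(len(mentions)),
                   key=lambda i: len(mentions[i][1]), reverse=True)
    for i in order:
        text = text.replace(mentions[i][1], chr(0xE000 + i))
    # Pass 2: fill each placeholder with a random value of the same attribute
    # and record that value as the new label.
    pseudo = {attr: [] for attr in ATTRS}
    for i, (attr, _) in enumerate(mentions):
        new_value = rng.choice(pools[attr])
        text = text.replace(chr(0xE000 + i), new_value)
        pseudo[attr].append(new_value)
    pseudo["text"] = text
    return pseudo


POOLS = build_pools(SAMPLES)
PSEUDO = generate_pseudo(SAMPLES[0], POOLS, random.Random(0))
```

The two-pass replacement is a design choice of this sketch: writing placeholders first keeps a newly inserted value from being overwritten by a later substitution. Labels stay consistent with the text because each new value is recorded at the moment it is written in.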

CLC Number: TP391