Computer Science (计算机科学), 2022, Vol. 49, Issue (1): 153-158. doi: 10.11896/jsjkx.201100125

• Database & Big Data & Data Science •

Aided Disease Diagnosis Method for EMR Semantic Analysis

FAN Hong-jie1, LI Xue-dong2, YE Song-tao3

  1 Department of Science and Technology Teaching, China University of Political Science and Law, Beijing 102249, China
  2 School of Software and Microelectronics, Peking University, Beijing 102600, China
  3 School of Computer Science, Xiangtan University, Xiangtan, Hunan 411105, China
  • Received: 2020-11-17 Revised: 2021-04-16 Online: 2022-01-15 Published: 2022-01-18
  • Corresponding author: YE Song-tao (yesongtao@xtu.edu.cn)
  • Author email: hjfan@cupl.edu.cn
  • About author: FAN Hong-jie, born in 1984, Ph.D, lecturer. His main research interests include data exchange and knowledge graphs.
    YE Song-tao, born in 1983, Ph.D, associate professor. His main research interests include truth discovery, data analysis and data mining.
  • Supported by:
    National Natural Science Foundation of China (61802327) and Natural Science Foundation of Hunan Province (2018JJ3511).

Abstract: To address aided disease diagnosis from electronic medical records (EMRs), word vectors and text discrimination methods are applied to the semantic analysis of EMR text. Specifically, a pre-trained language model provides the semantic representation of characters so that text features are expressed accurately. After a convolutional neural network extracts n-gram features, capsule units cluster these features, which better captures the high-level semantic features of the text and reduces the amount of data required. Experiments show that the combined ERNIE+CNN+Capsule model achieves good results on a real EMR dataset. In addition, inspired by image style transfer, a style transfer model from EMR text to patient self-report text is trained. Using non-parallel data, an adversarial component and a perplexity evaluation metric are added on top of the style transfer model, which effectively alleviates the mismatch between the training and test data distributions. Finally, compared with ALBERT-tiny, BERT and other models, the proposed model obtains an F1 score of 86.89% on EMR text, an improvement of 1.36%~3.68%, and an F1 score of 94.95% in the generalization evaluation. The experiments show that the proposed model adapts effectively to aided disease diagnosis while maintaining high accuracy.

Key words: Auxiliary diagnosis, Capsule network, Deep neural networks, Electronic medical record, Semantic analysis
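
The abstract states that capsule units cluster the n-gram features extracted by the CNN, but this page gives no routing details. As an illustration only, the following minimal sketch shows routing-by-agreement in the style of Sabour, Frosst and Hinton ("Dynamic routing between capsules", 2017) in plain Python. The names `squash`, `dynamic_routing` and `u_hat`, the iteration count, and the toy input are all assumptions made for this example and are not taken from the paper's implementation.

```python
import math


def squash(s):
    """Squash nonlinearity: keeps the direction of s, maps its length into [0, 1)."""
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0] * len(s)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in s]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def length(vec):
    """Euclidean length of a vector, read as the capsule's activation."""
    return math.sqrt(sum(x * x for x in vec))


def dynamic_routing(u_hat, iterations=3):
    """Route lower-level capsules to upper-level capsules by agreement.

    u_hat[i][j] is the prediction vector that lower capsule i (here standing
    in for one n-gram feature) makes for upper capsule j (hypothetical layout).
    Returns one output vector per upper capsule.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v = []
    for _ in range(iterations):
        # Coupling coefficients: each lower capsule distributes itself over upper capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit wherever a prediction agrees with the upper capsule's output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


# Toy input: three lower capsules agree on upper capsule 0 and disagree on capsule 1.
toy_u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
toy_outputs = dynamic_routing(toy_u_hat)
```

In this toy case the upper capsule whose incoming predictions agree ends up with the longer output vector, which is the sense in which routing acts as a soft clustering of lower-level features.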

CLC Number: TP391.4