Computer Science (计算机科学), 2022, Vol. 49, Issue (1): 153-158. doi: 10.11896/jsjkx.201100125
范红杰1, 李雪冬2, 叶松涛3
FAN Hong-jie1, LI Xue-dong2, YE Song-tao3
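Abstract: To address disease-assisted diagnosis based on electronic medical records (EMRs), this paper applies word embeddings and text classification methods to the semantic parsing of EMR text. Specifically, a pre-trained language model provides the semantic representation of each character, so that text features are expressed accurately. After a convolutional neural network extracts N-gram features, capsule units cluster these features, which captures the high-level semantic features of the text more effectively while reducing the amount of data required. Experiments show that the combined ERNIE+CNN+Capsule model performs well on a real EMR dataset. In addition, inspired by image style transfer, the paper trains a style transfer model that converts EMR text into patient self-description text. Using non-parallel data, the style transfer model is extended with adversarial training and a perplexity evaluation metric, which effectively mitigates the mismatch between the training and test data distributions. Finally, compared with models such as ALBERT-tiny and BERT, the proposed model achieves an F1 score of 86.89% on medical record text, an improvement of 1.36% to 3.68%; in the generalization evaluation task, it achieves an F1 score of 94.95%. The experiments demonstrate that the proposed model adapts effectively to disease-assisted diagnosis while maintaining high accuracy.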
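The abstract states that capsule units cluster the N-gram features produced by the CNN, but it does not spell out the routing procedure. The following is a minimal pure-Python sketch of routing-by-agreement as described by Sabour et al. (2017), assuming that is the mechanism in use; it is not the authors' implementation. The names `squash`, `softmax`, `dynamic_routing`, and `u_hat`, the choice of three routing iterations, and the toy input are all illustrative assumptions.

```python
import math


def squash(v):
    """Shrink vector v so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in v)
    if sq_norm == 0.0:
        return [0.0 for _ in v]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, num_iters=3):
    """Route input capsules to output capsules by agreement.

    u_hat[i][j] is the prediction vector that input capsule i makes for
    output capsule j (hypothetical layout chosen for this sketch).
    Returns (output capsule vectors, coupling coefficients).
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    for it in range(num_iters):
        # Each input capsule distributes its vote across the output capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            v.append(squash(s))
        if it < num_iters - 1:
            # Raise the logit where a prediction agrees with the output vector.
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c


# Toy input: three feature capsules voting for two output capsules.
# All three roughly agree on output 0; they disagree on output 1.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
    [[0.9, 0.1], [1.0, 0.0]],
]
outputs, coupling = dynamic_routing(u_hat)
```

In this toy case the predictions for output capsule 0 point in nearly the same direction, so routing shifts coupling weight toward it. This is the sense in which capsules "cluster" lower-level features: votes that agree reinforce one another, and the length of each output vector reflects how strongly the cluster is supported.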