Computer Science (计算机科学), 2022, Vol. 49, Issue (11A): 211100241-9. doi: 10.11896/jsjkx.211100241

• Big Data & Data Science •

Dynamic and Static Relationship Fusion of Multi-source Health Perception Data for Disease Diagnosis

HUO Tian-yuan, GU Jing-jing

  1. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Online: 2022-11-10 Published: 2022-11-21
  • Corresponding author: GU Jing-jing (gujingjing@nuaa.edu.cn)
  • About author: HUO Tian-yuan (huotianyuan@nuaa.edu.cn), born in 1997, postgraduate, is a member of China Computer Federation. Her main research interests include machine learning and data mining.
    GU Jing-jing, born in 1986, Ph.D., professor, is a member of China Computer Federation. Her main research interests include mobile computing and data mining.
  • Supported by:
    National Natural Science Foundation of China (62072235).

Abstract: Disease diagnosis is an active research area in electronic health record (EHR) data mining and an important step toward intelligent medical diagnosis. However, the health sensing data in EHRs come from diverse sources, have complex structures, and are potentially correlated across data types, which raises the question of how heterogeneous data should be fused during feature extraction and mining. Only by jointly considering clinical sensing data, personal physical record data and inter-disease relationship data, and by mining the latent features that relate them, can multiple categories of diseases be diagnosed more accurately. To this end, a disease diagnosis model based on dynamic and static relationship fusion of multi-source health perception data (DSRF) is proposed. First, a dynamic and static relationship fusion algorithm resolves the heterogeneity between dynamic clinical sensing time series data and static personal physical condition data and extracts the correlations between them. Then, a dependency matrix over multiple disease categories is computed to capture the dependencies among diseases. Finally, the various kinds of health sensing data are fused on top of a gated recurrent unit (GRU) network architecture, which completes the comprehensive analysis of the multi-source heterogeneous data. Experimental results on the real-world American MIMIC-III clinical dataset show that, compared with mainstream models of the same type, the proposed model diagnoses multiple categories of diseases jointly with higher accuracy.

Key words: Multi-source data fusion, Dynamic and static relationship fusion, Disease diagnosis, Electronic health record, Clinical data mining
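
The abstract names the pipeline steps but does not give their formulas, so the following is only an illustrative sketch and not the DSRF algorithm itself. It assumes two simple stand-ins: static patient features are appended to every time step of the dynamic series, and the disease dependency matrix is estimated as conditional co-occurrence frequencies over multi-hot diagnosis labels. The function names `fuse_static_dynamic` and `disease_cooccurrence` and the toy data are hypothetical.

```python
from typing import List


def fuse_static_dynamic(dynamic: List[List[float]],
                        static: List[float]) -> List[List[float]]:
    """Append the static feature vector to every time step (assumed fusion rule)."""
    return [step + static for step in dynamic]


def disease_cooccurrence(labels: List[List[int]]) -> List[List[float]]:
    """Return C where C[i][j] estimates P(disease j | disease i).

    `labels` holds one multi-hot vector per patient (1 = diagnosed).
    This is an assumed estimator, not the paper's dependency matrix.
    """
    n = len(labels[0])
    counts = [[0] * n for _ in range(n)]
    for row in labels:
        present = [k for k, v in enumerate(row) if v]
        for i in present:
            for j in present:
                counts[i][j] += 1
    # counts[i][i] is the number of patients with disease i.
    return [
        [counts[i][j] / counts[i][i] if counts[i][i] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Toy data: two time steps of (temperature, heart rate) and a static (sex, age) profile.
vitals = [[36.8, 72.0], [37.1, 80.0]]
profile = [1.0, 63.0]
fused = fuse_static_dynamic(vitals, profile)

# Four patients, three disease categories.
labels = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
C = disease_cooccurrence(labels)
```

In a fuller model, the fused sequence would be fed to a recurrent network such as a GRU, and a matrix like `C` could be used to relate the per-disease outputs; how DSRF actually combines them is described in the full paper, not in this abstract.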

CLC Number:

  • TP399