Computer Science ›› 2023, Vol. 50 ›› Issue (7): 46-52. doi: 10.11896/jsjkx.230200216

• Database & Big Data & Data Science •

Disease Diagnosis Prediction Algorithm Based on Contrastive Learning

WANG Mingxia, XIONG Yun

  1. School of Computer Science, Fudan University, Shanghai 200433, China
    Shanghai Key Laboratory of Data Science, Shanghai 200433, China
  • Received: 2023-02-28 Revised: 2023-04-17 Online: 2023-07-15 Published: 2023-07-05
  • Corresponding author: XIONG Yun (yunx@fudan.edu.cn)
  • About author: WANG Mingxia (wangmx20@fudan.edu.cn), born in 1999, postgraduate. Her main research interests include big data and medical data mining. XIONG Yun, born in 1980, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. Her main research interests include data science and data mining.

Abstract: Disease diagnosis prediction uses electronic health data to model disease progression patterns and predict patients' future health status, and it is widely used in clinical decision support, healthcare services, and related fields. To extract more of the valuable information in visit records, a disease diagnosis prediction algorithm based on contrastive learning is proposed. Contrastive learning provides self-supervised training signals by measuring the similarity between samples, which improves the model's ability to capture information. The proposed algorithm mines the knowledge shared by similar patients through contrastive training, strengthening the model's ability to learn patient representations. To capture more comprehensive shared information, it further mines information from the target patient's similar groups and uses it as auxiliary information to characterize the patient's health status. Experimental results on a public dataset show that, compared with the Retain, Dipole, LSAN and GRASP algorithms, the proposed algorithm improves AUROC and AUPRC on the readmission prediction task by more than 2.9% and 8.1%, respectively, and improves Recall@10 and MAP@10 on the diagnosis prediction task by more than 2.1% and 1.8%, respectively.
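
The abstract does not give the form of the contrastive objective, so the following is only an illustrative sketch of how a similarity-based self-supervised signal can be computed. It assumes an InfoNCE-style loss with cosine similarity and a temperature of 0.1; the function names (`cosine`, `info_nce`) and the toy patient vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed objective, not the paper's stated one).

    The loss is small when the anchor is much closer to the positive
    sample than to any of the negative samples.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy patient representations (made-up numbers for illustration only).
target = [0.9, 0.1, 0.0]
similar = [0.8, 0.2, 0.1]
dissimilar = [[0.0, 0.9, 0.4], [0.1, 0.0, 1.0]]

# Pairing the target with a genuinely similar patient gives a low loss.
loss_good = info_nce(target, similar, dissimilar)
# Treating a dissimilar patient as the positive gives a much higher loss.
loss_bad = info_nce(target, dissimilar[0], [similar, dissimilar[1]])
print(loss_good < loss_bad)
```

Minimizing a loss of this kind pulls the representations of similar patients together and pushes dissimilar ones apart, which is one way the "shared knowledge between similar patients" described above could be turned into a training signal.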

Key words: Diagnosis prediction, Deep learning, Contrastive learning, Clustering, Similar patients

CLC Number: 

  • TP311
[1]LI Y J,ZHENG R L,YANG X M.Diagnosis and prediction model of coronary heart disease based on data mining technology[J].Medical Information,2020,33(24):14-17.
[2]ZHU X T,PANG C Y,ZHU H.Cardiovascular disease prediction model based on deep learning[J].Journal of Computer Applications,2021,41(S2):346-350.
[3]LI M,MA L Y,YAO Z.Study on an intelligent diagnosis prediction model based on deep neural network[J].Medical Information,2022,43(8):52-55,75.
[4]CHOI E,BAHADORI M T,KULAS J A,et al.Retain:An interpretable predictive model for healthcare using reverse time attention mechanism[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems.2016:3512-3520.
[5]MA F,CHITTA R,ZHOU J,et al.Dipole:Diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks[C]//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2017:1903-1911.
[6]XIAO C,MA T,DIENG A B,et al.Readmission prediction via deep contextual embedding of clinical concepts[J].PLOS ONE,2018,13(4):1-15.
[7]CHOI E,BAHADORI M T,SCHUETZ A,et al.Doctor AI:Predicting clinical events via recurrent neural networks[C]//Proceedings of the 1st Machine Learning for Healthcare Conference.2016:301-318.
[8]BAYTAS I M,XIAO C,ZHANG X,et al.Patient subtyping via time-aware LSTM networks[C]//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2017:65-74.
[9]KWON B C,CHOI M J,KIM J T,et al.RetainVis:Visual analytics with interpretable and interactive recurrent neural networks on electronic medical records[J].IEEE Transactions on Visualization and Computer Graphics,2019,25(1):299-309.
[10]BAI T,ZHANG S,EGLESTON B L,et al.Interpretable representation learning for healthcare via capturing disease progression through time[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2018:43-51.
[11]LUO J,YE M,XIAO C,et al.HiTANet:Hierarchical time-aware attention networks for risk prediction on electronic health records[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2020:647-656.
[12]MEN L,ILK N,TANG X,et al.Multi-disease prediction using LSTM recurrent neural networks[J].Expert Systems with Applications,2021,177:114905.
[13]SUO Q,MA F,YUAN Y,et al.Personalized disease prediction using a CNN based similarity learning method[C]//2017 IEEE International Conference on Bioinformatics and Biomedicine(BIBM).2017:811-816.
[14]SUO Q,MA F,YUAN Y,et al.Deep patient similarity learning for personalized health care[J].IEEE Transactions on NanoBioscience,2018,17(3):219-227.
[15]ZHANG C,GAO X,MA L,et al.GRASP:Generic framework for health status representation learning based on incorporating knowledge from similar patients[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2021:715-723.
[16]OEI R W,HSU W,LEE M L,et al.Using similar patients to predict complication in patients with diabetes,hypertension,and lipid disorder:a domain knowledge infused convolutional neural network approach[J].Journal of the American Medical Informatics Association,2022,30(2):273-281.
[17]LI Y,YANG D,GONG X.Patient similarity via medical attributed heterogeneous graph convolutional network[J].IAENG International Journal of Computer Science,2022,49(4):1152-1161.
[18]AN Y,LI R,CHEN X.MERGE:A multi-graph attentive representation learning framework integrating group information from similar patients[J].Computers in Biology and Medicine,2022,151:106245.
[19]ZHANG C,CHU X,MA L,et al.M3Care:Learning with missing modalities in multimodal healthcare data[C]//Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.2022:2418-2428.
[20]VAN DEN OORD A,LI Y,VINYALS O.Representation learning with contrastive predictive coding[J].arXiv:1807.03748,2018.
[21]LI J,ZHOU P,XIONG C,et al.Prototypical contrastive learning of unsupervised representations[J].arXiv:2005.04966,2020.
[22]PENG X,LONG G,SHEN T,et al.Self-attention enhanced patient journey understanding in healthcare system[C]//Joint European Conference on Machine Learning and Knowledge Discovery in Databases.2020:719-735.
[23]YE M,LUO J,XIAO C,et al.LSAN:Modeling long-term dependencies and short-term correlations with hierarchical attention for risk prediction[C]//Proceedings of the 29th ACM International Conference on Information and Knowledge Management.2020:1753-1762.