Computer Science (计算机科学), 2024, Vol. 51, Issue 7: 80-88. doi: 10.11896/jsjkx.231000138

• Database & Big Data & Data Science •

Dynamic Treatment Regime Generation Model Combining Dead-ends and Offline Supervision Actor-Critic

YANG Shasha, YU Yaxin, WANG Yueru, XU Jingming, WEI Yangjie, LI Xinhua

  1. College of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
     Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education (Northeastern University), Shenyang 110169, China
  • Received: 2023-10-19  Revised: 2024-03-27  Online: 2024-07-15  Published: 2024-07-10
  • Corresponding author: YU Yaxin (yuyx@mail.neu.edu.cn)
  • About author: YANG Shasha (1692080148@qq.com), born in 2000, postgraduate. Her main research interests include reinforcement learning and dynamic treatment regimes.
    YU Yaxin, born in 1971, Ph.D, associate professor. Her main research interests include data mining and social networks.
  • Supported by:
    National Natural Science Foundation of China (62373084).

Abstract: Reinforcement learning depends little on explicit mathematical models and can construct and optimize policies directly from experience, which makes it well suited to learning dynamic treatment regimes. However, existing studies still have the following problems: 1) policy optimality is pursued without considering risk, so the learned policy carries certain risks; 2) the distribution shift problem is ignored, so the learned policy may differ completely from the clinicians' policy; 3) the patient's historical observations and treatment history are ignored, so the patient state cannot be represented well and the optimal policy cannot be learned. To address these issues, DOSAC-DTR, a dynamic treatment regime generation model combining dead-ends and offline supervised actor-critic, is proposed. First, to account for the risk of the treatment actions recommended by the learned policy, the concept of dead-ends is integrated into the actor-critic framework. Second, to alleviate distribution shift, physician supervision is incorporated into the actor-critic framework, minimizing the gap between the learned policy and the clinicians' policy while maximizing the expected return. Finally, to obtain a state representation that captures key patient history, an LSTM-based encoder-decoder model is used to model the patient's historical observations and treatments. Experimental results show that DOSAC-DTR outperforms baseline methods, achieving a lower estimated mortality rate and higher Jaccard coefficients.

Key words: Dynamic treatment regime, Dead-ends, Actor-Critic, State representation
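
The abstract describes three components: an actor-critic that accounts for dead-end risk, an offline supervision term that keeps the learned policy close to clinicians' recorded decisions, and an LSTM-based encoder-decoder that builds the patient state from observation and treatment history. The paper's exact formulation is not reproduced here; the following PyTorch sketch only illustrates, under stated assumptions, how such an objective could be wired together. All module names, the dead-end critic interface, and the weights lambda_sup and lambda_de are assumptions rather than the authors' implementation.

# Hypothetical sketch (not the authors' released code) of the pipeline outlined in the
# abstract: an LSTM encoder-decoder summarizes the (observation, treatment) history into
# a patient state, and the actor is updated to maximize the critic's estimated return
# while staying close to the clinician's action and avoiding likely dead-end actions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HistoryEncoder(nn.Module):
    """Encodes the observation/treatment history into a state vector and reconstructs
    observations so the hidden state retains key historical information."""
    def __init__(self, obs_dim, act_dim, state_dim=64):
        super().__init__()
        self.rnn = nn.LSTM(obs_dim + act_dim, state_dim, batch_first=True)
        self.decoder = nn.Linear(state_dim, obs_dim)

    def forward(self, obs_seq, act_seq):
        # obs_seq: [B, T, obs_dim]; act_seq: [B, T, act_dim] (one-hot treatments)
        h, _ = self.rnn(torch.cat([obs_seq, act_seq], dim=-1))
        state = h[:, -1]            # current patient state s_t
        recon = self.decoder(h)     # reconstruction/prediction target for training the encoder
        return state, recon

class Actor(nn.Module):
    """Maps a patient state to a distribution over discrete treatment actions."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, state):
        return F.softmax(self.net(state), dim=-1)

def actor_loss(actor, critic_q, dead_end_risk, state, clinician_action,
               lambda_sup=1.0, lambda_de=1.0):
    """critic_q(state) and dead_end_risk(state) are assumed callables returning
    [B, n_actions] tensors: estimated returns and estimated dead-end risk per action."""
    probs = actor(state)                                          # pi(a | s)
    ret = (probs * critic_q(state)).sum(-1).mean()                # expected return (maximize)
    sup = F.nll_loss(torch.log(probs + 1e-8), clinician_action)   # stay close to the clinician's action
    risk = (probs * dead_end_risk(state)).sum(-1).mean()          # probability mass on risky actions (minimize)
    return -ret + lambda_sup * sup + lambda_de * risk

# Example (shapes only, hypothetical dimensions):
#   enc = HistoryEncoder(obs_dim=40, act_dim=25); actor = Actor(state_dim=64, n_actions=25)
#   state, recon = enc(obs_seq, act_seq)
#   loss = actor_loss(actor, critic_q, dead_end_risk, state, clinician_action)

In an offline setting, the critic and the dead-end estimator would be learned from retrospective patient trajectories, and the supervision term plays a role similar to behavior-cloning regularizers used to counter distribution shift in offline reinforcement learning; the losses actually used by DOSAC-DTR may differ.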

CLC number: TP399