Computer Science ›› 2024, Vol. 51 ›› Issue (7): 80-88. doi: 10.11896/jsjkx.231000138

• Database & Big Data & Data Science •

Dynamic Treatment Regime Generation Model Combining Dead-ends and Offline Supervision Actor-Critic

YANG Shasha, YU Yaxin, WANG Yueru, XU Jingming, WEI Yangjie, LI Xinhua   

  1. College of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
    2. Key Laboratory of Intelligent Computing in Medical Image, Northeastern University, Shenyang 110169, China
  • Received: 2023-10-19  Revised: 2024-03-27  Online: 2024-07-15  Published: 2024-07-10
  • About author: YANG Shasha, born in 2000, postgraduate. Her main research interests include reinforcement learning and dynamic treatment regimes.
    YU Yaxin, born in 1971, Ph.D., associate professor. Her main research interests include data mining and social networks.
  • Supported by:
    National Natural Science Foundation of China (62373084).

Abstract: Reinforcement learning depends little on explicit mathematical models and can construct and optimize policies directly from experience, which makes it well suited to learning dynamic treatment regimes. However, existing studies still suffer from the following problems: 1) risk is not considered when optimizing the policy, so the learned policy may recommend hazardous treatments; 2) distribution shift is ignored, so the learned policy can deviate sharply from the clinicians' policy; 3) the patient's historical observations and treatment history are ignored, so an adequate state representation cannot be obtained and the optimal policy cannot be learned. To address these issues, DOSAC-DTR, a dynamic treatment regime generation model combining dead-ends and offline supervision actor-critic, is proposed. First, to account for the risk of the treatment actions recommended by the learned policy, the concept of dead-ends is integrated into the actor-critic framework. Second, to alleviate distribution shift, physician supervision is incorporated into the actor-critic framework so that the gap between the learned policy and the clinicians' policy is minimized while the expected return is maximized. Finally, to obtain a state representation that captures key patient historical information, an LSTM-based encoder-decoder models the patient's historical observations and treatment history. Experiments show that DOSAC-DTR outperforms baseline approaches, achieving lower estimated mortality and higher Jaccard coefficients.
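To make the described objective concrete, the sketch below illustrates one way the three ingredients in the abstract could be combined in a single actor update: an LSTM encoder summarizes the patient's observation and treatment history into a state, and the actor loss trades off the critic's expected return, a cross-entropy term toward the clinician's recorded action (offline supervision), and a penalty on actions whose estimated dead-end risk is high. This is a minimal sketch under stated assumptions: the network sizes, the loss weights LAMBDA_SUP and LAMBDA_RISK, the 25-action discretization, and the dead_end risk estimator are illustrative and are not the paper's implementation.

```python
# Hypothetical sketch of an offline supervised actor-critic step with a dead-end penalty.
# All shapes, weights, and the dead-end estimator are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS, HIDDEN = 32, 25, 64   # e.g., a 5x5 discretized dose grid (assumed)

class StateEncoder(nn.Module):
    """Encodes the history of (observation, previous action) pairs with an LSTM."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(STATE_DIM + N_ACTIONS, HIDDEN, batch_first=True)

    def forward(self, obs_hist, act_hist):
        x = torch.cat([obs_hist, act_hist], dim=-1)   # [B, T, obs + act]
        _, (h, _) = self.lstm(x)
        return h[-1]                                  # last hidden state as the patient state

class QNet(nn.Module):
    """Per-action scores over the discrete treatment space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
                                 nn.Linear(HIDDEN, N_ACTIONS))
    def forward(self, s):
        return self.net(s)

encoder = StateEncoder()
actor = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
                      nn.Linear(HIDDEN, N_ACTIONS))   # logits over treatments
critic = QNet()      # expected-return estimate Q(s, a), assumed trained offline
dead_end = QNet()    # dead-end risk estimate, squashed to [0, 1] below (assumed trained offline)

opt = torch.optim.Adam(list(actor.parameters()) + list(encoder.parameters()), lr=1e-3)
LAMBDA_SUP, LAMBDA_RISK, RISK_THRESHOLD = 1.0, 1.0, 0.8   # illustrative weights

def actor_loss(obs_hist, act_hist, doctor_action):
    s = encoder(obs_hist, act_hist)
    logits = actor(s)
    probs = F.softmax(logits, dim=-1)                 # learned policy pi(a | s)

    q = critic(s).detach()                            # [B, A] expected return (frozen here)
    risk = torch.sigmoid(dead_end(s)).detach()        # [B, A] dead-end risk (frozen here)

    rl_term = -(probs * q).sum(dim=-1)                       # maximize expected return
    sup_term = F.cross_entropy(logits, doctor_action)        # stay close to the clinician's action
    # discourage probability mass on actions whose dead-end risk exceeds a threshold
    risk_term = (probs * (risk > RISK_THRESHOLD).float()).sum(dim=-1)

    return (rl_term + LAMBDA_RISK * risk_term).mean() + LAMBDA_SUP * sup_term

# toy batch: 4 patients, 10 time steps of observation/treatment history
obs_hist = torch.randn(4, 10, STATE_DIM)
act_hist = torch.zeros(4, 10, N_ACTIONS)              # one-hot previous treatments (all zeros here)
doctor_action = torch.randint(0, N_ACTIONS, (4,))     # clinician's recorded action at the last step
loss = actor_loss(obs_hist, act_hist, doctor_action)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```

In a full pipeline the critic and the dead-end estimator would be trained separately from logged ICU trajectories (for example, with temporal-difference targets and dead-end identification in the spirit of reference [10]), and the supervision weight controls how closely the learned policy is kept to the clinicians' behavior.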

Key words: Dynamic treatment regime, Dead-ends, Actor-Critic, State representation

CLC Number: TP399
[1]RIACHI E,MAMDANI M,FRALICK M,et al.Challenges for Reinforcement Learning in Healthcare[J].arXiv:2103.05612,2021.
[2]CORONATO A,NAEEM M,DE PIETRO G,et al.Reinforcement learning for intelligent healthcare applications:A survey[J].Artificial Intelligence in Medicine,2020,109:101964.
[3]YU C,LIU J,NEMATI S,et al.Reinforcement learning in healthcare:A survey[J].ACM Computing Surveys(CSUR),2021,55(1):1-36.
[4]KOMOROWSKI M,CELI L A,BADAWI O,et al.The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care[J].Nature Medicine,2018,24(11):1716.
[5]RAGHU A,KOMOROWSKI M,AHMED I,et al.Deep reinforcement learning for sepsis treatment[J].arXiv:1711.09602,2017.
[6]WANG L,ZHANG W,HE X,et al.Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.2018:2447-2456.
[7]KAUSHIK P,KUMMETHA S,MOODLEY P,et al.A conservative Q-learning approach for handling distribution shift in sepsis treatment strategies[J].arXiv:2203.13884,2022.
[8]FUJIMOTO S,GU S S.A minimalist approach to offline reinforcement learning[J].Advances in Neural Information Processing Systems,2021,34:20132-20145.
[9]YIN C,LIU R,CATERINO J,et al.Deconfounding Actor-Critic Network with Policy Adaptation for Dynamic Treatment Regimes[C]//Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.2022:2316-2326.
[10]FATEMI M,KILLIAN T W,SUBRAMANIAN J,et al.Medical Dead-ends and Learning to Identify High-risk States and Treatments[C]//Advances in Neural Information Processing Systems 34.2021.
[11]TESAURO G.Programming backgammon using self-teaching neural nets[J].Artificial Intelligence,2002,134(1/2):181-199.
[12]SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of Go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489.
[13]REDDY G,CELANI A,SEJNOWSKI T J,et al.Learning to soar in turbulent environments[J].Proceedings of the National Academy of Sciences,2016,113(33):E4877-E4884.
[14]JETER R,JOSEF C,SHASHIKUMAR S,et al.Does the “Artificial Intelligence Clinician” learn optimal treatment strategies for sepsis in intensive care?[J].arXiv:1902.03271,2019.
[15]LIANG D,DENG H,LIU Y.The treatment of sepsis:an episodic memory-assisted deep reinforcement learning approach[J].Applied Intelligence,2023,53(9):11034-11044.
[16]YU C,REN G,DONG Y.Supervised-actor-critic reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units[J].BMC Medical Informatics and Decision Making,2020,20(3):1-8.
[17]THOMAS P S.Safe reinforcement learning[R].University of Massachusetts Libraries,2015.
[18]THOMAS P S,CASTRO DA SILVA B,BARTO A G,et al.Preventing undesirable behavior of intelligent machines[J].Science,2019,366(6468):999-1004.
[19]LAROCHE R,TRICHELAIR P,DES COMBES R T.Safe policy improvement with baseline bootstrapping[C]//International Conference on Machine Learning.PMLR,2019:3652-3661.
[20]FATEMI M,SHARMA S,VAN SEIJEN H,et al.Dead-ends and secure exploration in reinforcement learning[C]//International Conference on Machine Learning.PMLR,2019:1873-1881.
[21]KILLIAN T W,ZHANG H,SUBRAMANIAN J,et al.An empirical study of representation learning for reinforcement learning in healthcare[C]//Machine Learning for Health.PMLR,2020:139-160.
[22]FUJIMOTO S,HOOF H,MEGER D.Addressing function approximation error in actor-critic methods[C]//International Conference on Machine Learning.PMLR,2018:1587-1596.
[23]JOHNSON A E W,POLLARD T J,SHEN L,et al.MIMIC-III,a freely accessible critical care database[J].Scientific Data,2016,3(1):1-9.
[24]NANAYAKKARA T,CLERMONT G,LANGMEAD C J,et al.Unifying cardiovascular modelling with deep reinforcement learning for uncertainty aware control of sepsis treatment[J].PLOS Digital Health,2022,1(2):e0000012.
[25]PEINE A,HALLAWA A,BICKENBACH J,et al.Development and validation of a reinforcement learning algorithm to dynamically optimize mechanical ventilation in critical care[J].NPJ Digital Medicine,2021,4(1):1-12.
[26]WENG W H,GAO M,HE Z,et al.Representation and reinforcement learning for personalized glycemic control in septic patients[J].arXiv:1712.00654,2017.
[27]ZHANG Y,CHEN R,TANG J,et al.LEAP:learning to prescribe effective and safe treatment combinations for multimorbidity[C]//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2017:1315-1324.
[28]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[29]LILLICRAP T P,HUNT J J,PRITZEL A,et al.Continuous control with deep reinforcement learning[J].arXiv:1509.02971,2015.