Computer Science ›› 2024, Vol. 51 ›› Issue (9): 265-272.doi: 10.11896/jsjkx.230700151
• Artificial Intelligence • Previous Articles Next Articles
WANG Tianjiu1, LIU Quan1,2, WU Lan1
CLC Number:
[1]LIU Q,ZHAI J W,ZHANG Z Z,et al.A survey on deep rein-forcement learning [J].Chinese Journal of Computers,2018,41(1):1-27. [2]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning [J].Nature,2015,518(7540):529-533. [3]LEVINE S,KUMAR A,TUCKER G,et al.Offline reinforcement learning:Tutorial,review,and perspectives on open pro-blems [J].arXiv:2005.01643,2020. [4]FUJIMOTO S,MEGER D,PRECUP D.Off-policy deep rein-forcement learning without explora-tion[C]//International Conference on Machine Learning.PMLR,2019:2052-2062. [5]KINGMA D P,WELLING M.Auto-Encoding Variational Bayes [J].arXiv:1312.6114,2014. [6]KUMAR A,FU J,SOH M,et al.Stabilizing off-policy q-lear-ning via bootstrapping error reduction [J].arXiv:1906.00949,2019. [7]FUJIMOTO S,HOOF H,MEGER D.Addressing function approximation error in actor-critic methods[C]//International Conference on Machine Learning.PMLR,2018:1587-1596. [8]FUJIMOTO S,GU S S.A minimalist approach to offline rein-forcement learning [J].Advances in Neural Information Processing Systems,2021,34:20132-20145. [9]KUMAR A,ZHOU A,TUCKER G,et al.Conservative Q-learning for offline reinforcement learning [J].Advances in Neural Information Processing Systems,2020,33:1179-1191. [10]LYU J,MA X,LI X,et al.Mildly conservative Q-learning foroffline reinforcement learning [J].Advances in Neural Information Processing Systems,2022,35:1711-1724. [11]AGARWAL R,SCHUURMANS D,NOROUZI M.An optimistic perspective on offline reinforcement learning[C]//Procee-dings of the 37th International Conference on Machine Lear-ning.2020:104-114. [12]OSBAND I,BLUNDELL C,PRITZEL A,et al.Deep explora-tion via bootstrapped DQN [C]//Proceedings of the 30th International Conference on Neural Information Processing Sys-tems.2016:4033-4041. [13]WU Y,ZHAI S,SRIVASTAVA N,et al.Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning[C]//International Conference on Machine Learning.PMLR,2021:11319-11328. [14]KIDAMBI R,RAJESWARAN A,NETRA-PALLI P,et al.Morel:Model-based offline rein-forcement learning [J].Advances in Neural Information Processing Systems,2020,33:21810-21823. [15]YU T,KUMAR A,RAFAILOV R,et al.Combo:Conservative offline model-based policy optimization [J].Advances in Neural Information Processing Systems,2021,34:28954-28967. [16]SUTTON R S,BARTO A G.Reinforcement learning:An introduction [M].MIT press,2018. [17]WU Y,TUCKER G,NACHUM O.Behavior regularized offline reinforcement learning [J].arXiv:1911.11361,2019. [18]GAL Y,GHAHRAMANI Z.Dropout as a Bayesian approximation:Representing model uncertainty in deep learning[C]//International Conference on Machine Learning.PMLR,2016:1050-1059. [19]LAKSHMINARAYANAN B,PRITZEL A,BLUNDELL C.Simple and scalable predictive uncertainty estimation using deep ensembles [C]//Proceedings of the 31st International Confe-rence on Neural Information Processing Systems.2017:6405-6416. [20]FU J,KUMAR A,NACHUM O,et al.D4rl:Datasets for deep data-driven reinforcement learning [J].arXiv:2004.07219,2020. [21]BROCKMAN G,CHEUNG V,PETTERSSON L,et al.Openai gym[J].arXiv:1606.01540,2016. |
[1] | YAN Xin, HUANG Zhiqiu, SHI Fan, XU Heng. Study on Following Car Model with Different Driving Styles Based on Proximal PolicyOptimization Algorithm [J]. Computer Science, 2024, 51(9): 223-232. |
[2] | ZHOU Wenhui, PENG Qinghua, XIE Lei. Study on Adaptive Cloud-Edge Collaborative Scheduling Methods for Multi-object State Perception [J]. Computer Science, 2024, 51(9): 319-330. |
[3] | LI Jingwen, YE Qi, RUAN Tong, LIN Yupian, XUE Wandong. Semi-supervised Text Style Transfer Method Based on Multi-reward Reinforcement Learning [J]. Computer Science, 2024, 51(8): 263-271. |
[4] | WANG Xianwei, FENG Xiang, YU Huiqun. Multi-agent Cooperative Algorithm for Obstacle Clearance Based on Deep Deterministic PolicyGradient and Attention Critic [J]. Computer Science, 2024, 51(7): 319-326. |
[5] | WANG Shuanqi, ZHAO Jianxin, LIU Chi, WU Wei, LIU Zhao. Fuzz Testing Method of Binary Code Based on Deep Reinforcement Learning [J]. Computer Science, 2024, 51(6A): 230800078-7. |
[6] | LI Liying, ZHOU Jun, WANG Min. Supply Chain Decisions Considering Supplier Loss Aversion and Financial Constraints [J]. Computer Science, 2024, 51(6A): 230800134-7. |
[7] | HUANG Feihu, LI Peidong, PENG Jian, DONG Shilei, ZHAO Honglei, SONG Weiping, LI Qiang. Multi-agent Based Bidding Strategy Model Considering Wind Power [J]. Computer Science, 2024, 51(6A): 230600179-8. |
[8] | GAO Yuzhao, NIE Yiming. Survey of Multi-agent Deep Reinforcement Learning Based on Value Function Factorization [J]. Computer Science, 2024, 51(6A): 230300170-9. |
[9] | ZHONG Yuang, YUAN Weiwei, GUAN Donghai. Weighted Double Q-Learning Algorithm Based on Softmax [J]. Computer Science, 2024, 51(6A): 230600235-5. |
[10] | LI Danyang, WU Liangji, LIU Hui, JIANG Jingqing. Deep Reinforcement Learning Based Thermal Awareness Energy Consumption OptimizationMethod for Data Centers [J]. Computer Science, 2024, 51(6A): 230500109-8. |
[11] | ZHAO Tong, SHA Chaofeng. Revisiting Test Sample Selection for CNN Under Model Calibration [J]. Computer Science, 2024, 51(6): 34-43. |
[12] | YANG Xiuwen, CUI Yunhe, QIAN Qing, GUO Chun, SHEN Guowei. COURIER:Edge Computing Task Scheduling and Offloading Method Based on Non-preemptivePriorities Queuing and Prioritized Experience Replay DRL [J]. Computer Science, 2024, 51(5): 293-305. |
[13] | XIN Yuanxia, HUA Daoyang, ZHANG Li. Multi-agent Reinforcement Learning Algorithm Based on AI Planning [J]. Computer Science, 2024, 51(5): 179-192. |
[14] | ZHAO Miao, XIE Liang, LIN Wenjing, XU Haijiao. Deep Reinforcement Learning Portfolio Model Based on Dynamic Selectors [J]. Computer Science, 2024, 51(4): 344-352. |
[15] | SHI Dianxi, HU Haomeng, SONG Linna, YANG Huanhuan, OUYANG Qianying, TAN Jiefu , CHEN Ying. Multi-agent Reinforcement Learning Method Based on Observation Reconstruction [J]. Computer Science, 2024, 51(4): 280-290. |
|