Computer Science, 2024, Vol. 51, Issue (11): 81-94. doi: 10.11896/jsjkx.231000170
Database & Big Data & Data Science
YANG Haolin¹, LIU Quan¹,²
[1] SUTTON R S,BARTO A G.Reinforcement learning:An introduction[M].MIT Press,2018.
[2] GOVINDARAJAN L N,LIU R G,LINSLEY D,et al.Diagnosing and exploiting the computational demands of videos games for deep reinforcement learning[J].arXiv:2309.13181,2023.
[3] WU Q,SUN N,YANG T,et al.Deep Reinforcement Learning-Based Control for Asynchronous Motor-Actuated Triple Pendulum Crane Systems With Distributed Mass Payloads[J].IEEE Transactions on Industrial Electronics,2023,71(2):1853-1862.
[4] ZHOU X,WU L,ZHANG Y,et al.A robust deep reinforcement learning approach to driverless taxi dispatching under uncertain demand[J].Information Sciences,2023,646:119401.
[5] CHAI D,WU W,HAN Q,et al.Description Based Text Classification with Reinforcement Learning[C]//International Conference on Machine Learning.PMLR,2020:1371-1382.
[6] LI S,HU C,KE S,et al.LS-MolGen:Ligand-and-Structure Dual-Driven Deep Reinforcement Learning for Target-Specific Molecular Generation Improves Binding Affinity and Novelty[J].Journal of Chemical Information and Modeling,2023,63(13):4207-4215.
[7] LEVINE S,KUMAR A,TUCKER G,et al.Offline Reinforcement Learning:Tutorial,Review,and Perspectives on Open Problems[J].arXiv:2005.01643,2020.
[8] LIU Q,ZHAI J W,ZHANG Z Z,et al.A survey on deep reinforcement learning[J].Chinese Journal of Computers,2018,41(1):1-27.
[9] SCHWEIGHOFER K,DINU M,RADLER A,et al.A Dataset Perspective on Offline Reinforcement Learning[C]//Conference on Lifelong Learning Agents.PMLR,2022:470-517.
[10] FUJIMOTO S,MEGER D,PRECUP D.Off-Policy Deep Reinforcement Learning without Exploration[C]//International Conference on Machine Learning.PMLR,2019:2052-2062.
[11] KUMAR A,FU J,TUCKER G,et al.Stabilizing off-policy Q-learning via bootstrapping error reduction[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems.2019:11784-11794.
[12] KUMAR A,ZHOU A,TUCKER G,et al.Conservative Q-Learning for Offline Reinforcement Learning[C]//Advances in Neural Information Processing Systems.2020:1179-1191.
[13] FUJIMOTO S,GU S.A Minimalist Approach to Offline Reinforcement Learning[C]//Advances in Neural Information Processing Systems.2021:20132-20145.
[14] NAIR A,GUPTA A,DALAL M,et al.AWAC:Accelerating online reinforcement learning with offline datasets[J].arXiv:2006.09359,2020.
[15] LUO Y,WANG Y,DONG K,et al.Relay hindsight experience replay:Continual reinforcement learning for robot manipulation tasks with sparse rewards[J].arXiv:2208.00843,2022.
[16] LI J,YU T,ZHANG X,et al.Efficient experience replay based deep deterministic policy gradient for AGC dispatch in integrated energy system[J].Applied Energy,2021,285:116386.
[17] GAI S,WANG D,HE L.Offline Experience Replay for Continual Offline Reinforcement Learning[J].arXiv:2305.13804,2023.
[18] WANG C,WU Y,VUONG Q,et al.Striving for simplicity and performance in off-policy DRL:Output normalization and non-uniform sampling[C]//International Conference on Machine Learning.PMLR,2020:10070-10080.
[19] SHI S M,LIU Q.Deep deterministic policy gradient with classified experience replay[J].Acta Automatica Sinica,2022,48(7):1816-1823.
[20] BARTO A G,SUTTON R S,ANDERSON C W.Neuronlike adaptive elements that can solve difficult learning control problems[J].IEEE Transactions on Systems,Man,and Cybernetics,1983(5):834-846.
[21] RUSU A A,COLMENAREJO S G,GULCEHRE C,et al.Policy distillation[J].arXiv:1511.06295,2015.
[22] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[23] KONDA V R,TSITSIKLIS J N.Actor-critic algorithms[C]//Proceedings of the 12th International Conference on Neural Information Processing Systems.1999:1008-1014.
[24] CHEN D,ZHANG Q.Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning[J].arXiv:2306.01920,2023.
[25] LAI K H,ZHA D,LI Y,et al.Dual policy distillation[J].arXiv:2006.04061,2020.
[26] HONG Z W,NAGARAJAN P,MAEDA G.Periodic intra-ensemble knowledge distillation for reinforcement learning[C]//Machine Learning and Knowledge Discovery in Databases.Research Track:European Conference,ECML PKDD 2021,Bilbao,Spain,September 13-17,2021,Proceedings,Part I 21.Springer International Publishing,2021:87-103.
[27] FEDUS W,RAMACHANDRAN P,AGARWAL R,et al.Revisiting fundamentals of experience replay[C]//International Conference on Machine Learning.PMLR,2020:3061-3071.
[28] ZHENG G,ZHOU S,BRAVERMAN V,et al.Selective experience replay compression using coresets for lifelong deep reinforcement learning in medical imaging[J].arXiv:2302.11510,2023.
[29] PACKER C,ABBEEL P,GONZALEZ J E.Hindsight task relabelling:Experience replay for sparse reward meta-RL[J].Advances in Neural Information Processing Systems,2021,34:2466-2477.
[30] LI J,TANG C,TOMIZUKA M,et al.Hierarchical planning through goal-conditioned offline reinforcement learning[J].IEEE Robotics and Automation Letters,2022,7(4):10216-10223.
[31] CHEN X,GHADIRZADEH A,YU T,et al.Latent-variable advantage-weighted policy optimization for offline RL[J].arXiv:2203.08949,2022.
[32] FU J,KUMAR A,NACHUM O,et al.D4RL:Datasets for deep data-driven reinforcement learning[J].arXiv:2004.07219,2020.