Computer Science, 2024, Vol. 51, Issue (6A): 230600235-5. doi: 10.11896/jsjkx.230600235
• Artificial Intelligence •
ZHONG Yuang, YUAN Weiwei, GUAN Donghai
[1] | GAO Yuzhao, NIE Yiming. Survey of Multi-agent Deep Reinforcement Learning Based on Value Function Factorization [J]. Computer Science, 2024, 51(6A): 230300170-9. |
[2] | XU Haitao, CHENG Haiyan, TONG Mingwen. Study on Genetic Algorithm of Course Scheduling Based on Deep Reinforcement Learning [J]. Computer Science, 2024, 51(6A): 230600062-8. |
[3] | LI Danyang, WU Liangji, LIU Hui, JIANG Jingqing. Deep Reinforcement Learning Based Thermal Awareness Energy Consumption Optimization Method for Data Centers [J]. Computer Science, 2024, 51(6A): 230500109-8. |
[4] | WANG Shuanqi, ZHAO Jianxin, LIU Chi, WU Wei, LIU Zhao. Fuzz Testing Method of Binary Code Based on Deep Reinforcement Learning [J]. Computer Science, 2024, 51(6A): 230800078-7. |
[5] | HUANG Feihu, LI Peidong, PENG Jian, DONG Shilei, ZHAO Honglei, SONG Weiping, LI Qiang. Multi-agent Based Bidding Strategy Model Considering Wind Power [J]. Computer Science, 2024, 51(6A): 230600179-8. |
[6] | XIN Yuanxia, HUA Daoyang, ZHANG Li. Multi-agent Reinforcement Learning Algorithm Based on AI Planning [J]. Computer Science, 2024, 51(5): 179-192. |
[7] | YANG Xiuwen, CUI Yunhe, QIAN Qing, GUO Chun, SHEN Guowei. COURIER: Edge Computing Task Scheduling and Offloading Method Based on Non-preemptive Priorities Queuing and Prioritized Experience Replay DRL [J]. Computer Science, 2024, 51(5): 293-305. |
[8] | SHI Dianxi, HU Haomeng, SONG Linna, YANG Huanhuan, OUYANG Qianying, TAN Jiefu, CHEN Ying. Multi-agent Reinforcement Learning Method Based on Observation Reconstruction [J]. Computer Science, 2024, 51(4): 280-290. |
[9] | ZHAO Miao, XIE Liang, LIN Wenjing, XU Haijiao. Deep Reinforcement Learning Portfolio Model Based on Dynamic Selectors [J]. Computer Science, 2024, 51(4): 344-352. |
[10] | WANG Yao, LUO Junren, ZHOU Yanzhong, GU Xueqiang, ZHANG Wanpeng. Review of Reinforcement Learning and Evolutionary Computation Methods for Strategy Exploration [J]. Computer Science, 2024, 51(3): 183-197. |
[11] | WANG Yan, WANG Tianjing, SHEN Hang, BAI Guangwei. Optimal Penetration Path Generation Based on Maximum Entropy Reinforcement Learning [J]. Computer Science, 2024, 51(3): 360-367. |
[12] | LI Junwei, LIU Quan, XU Yapeng. Option-Critic Algorithm Based on Mutual Information Optimization [J]. Computer Science, 2024, 51(2): 252-258. |
[13] | SHI Dianxi, PENG Yingxuan, YANG Huanhuan, OUYANG Qianying, ZHANG Yuhui, HAO Feng. DQN-based Multi-agent Motion Planning Method with Deep Reinforcement Learning [J]. Computer Science, 2024, 51(2): 268-277. |
[14] | WANG Yangmin, HU Chengyu, YAN Xuesong, ZENG Deze. Study on Deep Reinforcement Learning for Energy-aware Virtual Machine Scheduling [J]. Computer Science, 2024, 51(2): 293-299. |
[15] | ZHAO Xiaoyan, ZHAO Bin, ZHANG Junna, YUAN Peiyan. Study on Cache-oriented Dynamic Collaborative Task Migration Technology [J]. Computer Science, 2024, 51(2): 300-310. |