Computer Science, 2023, Vol. 50, Issue (1): 262-269. doi: 10.11896/jsjkx.220700010
• Artificial Intelligence •
ZHANG Qiyang, CHEN Xiliang, ZHANG Qiao
[1] HUANG Yuzhou, WANG Lisong, QIN Xiaolin. Bi-level Path Planning Method for Unmanned Vehicle Based on Deep Reinforcement Learning [J]. Computer Science, 2023, 50(1): 194-204.
[2] XU Ping'an, LIU Quan. Deep Reinforcement Learning Based on Similarity Constrained Dual Policy Distillation [J]. Computer Science, 2023, 50(1): 253-261.
[3] WEI Nan, WEI Xianglin, FAN Jianhua, XUE Yu, HU Yongyang. Backdoor Attack Against Deep Reinforcement Learning-based Spectrum Access Model [J]. Computer Science, 2023, 50(1): 351-361.
[4] YU Bin, LI Xue-hua, PAN Chun-yu, LI Na. Edge-Cloud Collaborative Resource Allocation Algorithm Based on Deep Reinforcement Learning [J]. Computer Science, 2022, 49(7): 248-253.
[5] TANG Feng, FENG Xiang, YU Hui-qun. Multi-task Cooperative Optimization Algorithm Based on Adaptive Knowledge Transfer and Resource Allocation [J]. Computer Science, 2022, 49(7): 254-262.
[6] LI Meng-fei, MAO Ying-chi, TU Zi-jian, WANG Xuan, XU Shu-fang. Server-reliability Task Offloading Strategy Based on Deep Deterministic Policy Gradient [J]. Computer Science, 2022, 49(7): 271-279.
[7] XIE Wan-cheng, LI Bin, DAI Yue-yue. PPO Based Task Offloading Scheme in Aerial Reconfigurable Intelligent Surface-assisted Edge Computing [J]. Computer Science, 2022, 49(6): 3-11.
[8] HONG Zhi-li, LAI Jun, CAO Lei, CHEN Xi-liang, XU Zhi-xiong. Study on Intelligent Recommendation Method of Dueling Network Reinforcement Learning Based on Regret Exploration [J]. Computer Science, 2022, 49(6): 149-157.
[9] LI Ye, CHEN Song-can. Physics-informed Neural Networks: Recent Advances and Prospects [J]. Computer Science, 2022, 49(4): 254-262.
[10] LI Peng, YI Xiu-wen, QI De-kang, DUAN Zhe-wen, LI Tian-rui. Heating Strategy Optimization Method Based on Deep Learning [J]. Computer Science, 2022, 49(4): 263-268.
[11] OUYANG Zhuo, ZHOU Si-yuan, LYU Yong, TAN Guo-ping, ZHANG Yue, XIANG Liang-liang. DRL-based Vehicle Control Strategy for Signal-free Intersections [J]. Computer Science, 2022, 49(3): 46-51.
[12] LI Sun, CAO Feng, LIU Zi-shan. Study on Quality Evaluation Method of Speech Datasets for Algorithm Model [J]. Computer Science, 2022, 49(11A): 210800246-6.
[13] CAI Yue, WANG En-liang, SUN Zhe, SUN Zhi-xin. Study on Dual Sequence Decision-making for Trucks and Cargo Matching Based on Dual Pointer Network [J]. Computer Science, 2022, 49(11A): 210800257-9.
[14] ZHAO Hong, CHANG You-kang, WANG Wei-jie. Survey of Adversarial Attacks and Defense Methods for Deep Neural Networks [J]. Computer Science, 2022, 49(11A): 210900163-11.
[15] WANG Lu, WEN Wu-song. Study on Distributed Intrusion Detection System Based on Artificial Intelligence [J]. Computer Science, 2022, 49(10): 353-357.