Computer Science, 2022, Vol. 49, Issue (1): 298-305. doi: 10.11896/jsjkx.201100194
• Artificial Intelligence •
AO Tian-yu1, LIU Quan1,2,3,4