Computer Science ›› 2021, Vol. 48 ›› Issue (10): 30-36.doi: 10.11896/jsjkx.201000129
• Artificial Intelligence •
LIU Tian-xing, LI Wei, XU Zheng, ZHANG Li-hua, QI Xiao-ya, GAN Zhong-xue
[1]BROWNE C B,POWLEY E,WHITEHOUSE D,et al.A Survey of Monte Carlo Tree Search Methods[J].IEEE Transactions on Computational Intelligence and AI in Games,2012,4(1):1-43.
[2]SCHRITTWIESER J,ANTONOGLOU I,HUBERT T,et al.Mastering Atari,Go,Chess and Shogi by Planning with a Learned Model[J].arXiv:1911.08265.
[3]SILVER D,HUBERT T,SCHRITTWIESER J,et al.A general reinforcement learning algorithm that masters chess,shogi,and Go through self-play[J].Science,2018,362(6419):1140-1144.
[4]CHASLOT G,WINANDS M,VAN DEN HERIK H J,et al.Progressive strategies for Monte-Carlo tree search[J].New Mathematics and Natural Computation,2008,4(3):343-357.
[5]COUËTOUX A,MILONE M,et al.Continuous Rapid Action Value Estimates[C]//Asian Conference on Machine Learning.2011:19-31.
[6]YEE T,LISÝ V,BOWLING M.Monte Carlo Tree Search in Continuous Action Spaces with Execution Uncertainty[C]//IJCAI 2016.AAAI Press,2016:690-697.
[7]BROCKMAN G,CHEUNG V,PETTERSSON L,et al.OpenAI Gym[J].arXiv:1606.01540.
[8]COULOM R.Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search[C]//Proceedings of the International Conference on Computers and Games.2006:72-83.
[9]KOCSIS L,SZEPESVÁRI C.Bandit based Monte-Carlo planning[C]//17th European Conference on Machine Learning.2006:282-293.
[10]COUËTOUX A,HOOCK J B,et al.Continuous Upper Confidence Trees[C]//International Conference on Learning and Intelligent Optimization.2011:433-445.
[11]BUBECK S,MUNOS R,STOLTZ G,et al.Online Optimization in X-Armed Bandits[J].Advances in Neural Information Processing Systems,2009:201-208.
[12]MANSLEY C R,WEINSTEIN A,LITTMAN M L.Sample-Based Planning for Continuous Action Markov Decision Processes[C]//Proceedings of the International Conference on Automated Planning and Scheduling.AAAI Press,2011.
[13]WEINSTEIN A,LITTMAN M L.Bandit-based planning and learning in continuous-action Markov decision processes[C]//Proceedings of the Twenty-Second International Conference on Automated Planning and Scheduling.2012:306-314.
[14]KIM B,LEE K,LIM S,et al.Monte Carlo Tree Search in Continuous Spaces Using Voronoi Optimistic Optimization with Regret Bounds[C]//AAAI 2020.2020:9916-9924.
[15]CHASLOT G M B,WINANDS M H,VAN DEN HERIK H J.Parallel Monte-Carlo tree search[C]//International Conference on Computers and Games.2008:60-71.
[16]KURZER K,HÖRTNAGL C,ZÖLLNER J M.Parallelization of Monte Carlo Tree Search in Continuous Domains[J].arXiv:2003.13741.
[17]ANTHONY T,TIAN Z,BARBER D.Thinking Fast and Slow with Deep Learning and Tree Search[C]//Advances in Neural Information Processing Systems.2017:5360-5370.
[18]KARTAL B,HERNANDEZ-LEAL P,TAYLOR M E.Action Guidance with MCTS for Deep Reinforcement Learning[J].Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment,2019,15(1):153-159.
[19]SUBRAMANIAN S G,CROWLEY M.Combining MCTS and A3C for Prediction of Spatially Spreading Processes in Forest Wildfire Settings[C]//Canadian Conference on Artificial Intelligence.2018:285-291.
[20]ZHANG H,CHENG F,XU B,et al.RevCuT Tree Search Method in Complex Single-player Game with Continuous Search Space[C]//2019 International Joint Conference on Neural Networks (IJCNN).2019:1-8.
[21]LEE K,KIM S,CHOI J,et al.Deep Reinforcement Learning in Continuous Action Spaces:a Case Study in the Game of Simulated Curling[C]//Proceedings of the 35th International Conference on Machine Learning.2018:2937-2946.
[1] LIU Xing-guang, ZHOU Li, LIU Yan, ZHANG Xiao-ying, TAN Xiang, WEI Ji-bo. Construction and Distribution Method of REM Based on Edge Intelligence [J]. Computer Science, 2022, 49(9): 236-241.
[2] YUAN Wei-lin, LUO Jun-ren, LU Li-na, CHEN Jia-xing, ZHANG Wan-peng, CHEN Jing. Methods in Adversarial Intelligent Game: A Holistic Comparative Analysis from Perspective of Game Theory and Reinforcement Learning [J]. Computer Science, 2022, 49(8): 191-204.
[3] SHI Dian-xi, ZHAO Chen-ran, ZHANG Yao-wen, YANG Shao-wu, ZHANG Yong-jun. Adaptive Reward Method for End-to-End Cooperation Based on Multi-agent Reinforcement Learning [J]. Computer Science, 2022, 49(8): 247-256.
[4] YU Bin, LI Xue-hua, PAN Chun-yu, LI Na. Edge-Cloud Collaborative Resource Allocation Algorithm Based on Deep Reinforcement Learning [J]. Computer Science, 2022, 49(7): 248-253.
[5] LI Meng-fei, MAO Ying-chi, TU Zi-jian, WANG Xuan, XU Shu-fang. Server-reliability Task Offloading Strategy Based on Deep Deterministic Policy Gradient [J]. Computer Science, 2022, 49(7): 271-279.
[6] LUO Jun-ren, ZHANG Wan-peng, LU Li-na, CHEN Jing. Survey on Online Adversarial Planning for Real-time Strategy Game [J]. Computer Science, 2022, 49(6): 287-296.
[7] GUO Yu-xin, CHEN Xiu-hong. Automatic Summarization Model Combining BERT Word Embedding Representation and Topic Information Enhancement [J]. Computer Science, 2022, 49(6): 313-318.
[8] FAN Jing-yu, LIU Quan. Off-policy Maximum Entropy Deep Reinforcement Learning Algorithm Based on Randomly Weighted Triple Q-Learning [J]. Computer Science, 2022, 49(6): 335-341.
[9] WEI Hui, CHEN Ze-mao, ZHANG Li-qiang. Anomaly Detection Framework of System Call Trace Based on Sequence and Frequency Patterns [J]. Computer Science, 2022, 49(6): 350-355.
[10] XIE Wan-cheng, LI Bin, DAI Yue-yue. PPO Based Task Offloading Scheme in Aerial Reconfigurable Intelligent Surface-assisted Edge Computing [J]. Computer Science, 2022, 49(6): 3-11.
[11] HONG Zhi-li, LAI Jun, CAO Lei, CHEN Xi-liang, XU Zhi-xiong. Study on Intelligent Recommendation Method of Dueling Network Reinforcement Learning Based on Regret Exploration [J]. Computer Science, 2022, 49(6): 149-157.
[12] GAO Jie, LIU Sha, HUANG Ze-qiang, ZHENG Tian-yu, LIU Xin, QI Feng-bin. Deep Neural Network Operator Acceleration Library Optimization Based on Domestic Many-core Processor [J]. Computer Science, 2022, 49(5): 355-362.
[13] ZHANG Jia-neng, LI Hui, WU Hao-lin, WANG Zhuang. Exploration and Exploitation Balanced Experience Replay [J]. Computer Science, 2022, 49(5): 179-185.
[14] JIAO Xiang, WEI Xiang-lin, XUE Yu, WANG Chao, DUAN Qiang. Automatic Modulation Recognition Based on Deep Learning [J]. Computer Science, 2022, 49(5): 266-278.
[15] LI Peng, YI Xiu-wen, QI De-kang, DUAN Zhe-wen, LI Tian-rui. Heating Strategy Optimization Method Based on Deep Learning [J]. Computer Science, 2022, 49(4): 263-268.