Computer Science, 2023, Vol. 50, Issue (12): 314-321. doi: 10.11896/jsjkx.221100096
• Artificial Intelligence •
XU Yapeng1, LIU Quan1,2, LI Junwei1
[1]SUTTON R S,BARTO A G.Reinforcement learning:An introduction[M].MIT press,2018.
[2]GOODFELLOW I,BENGIO Y,COURVILLE A,et al.Deep learning[M].MIT press,2016.
[3]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[4]SILVER D,SCHRITTWIESER J,SIMONYAN K,et al.Mastering the game of go without human knowledge[J].Nature,2017,550(7676):354-359.
[5]SALLAB A E,ABDOU M,PEROT E,et al.Deep reinforcement learning framework for autonomous driving[J].Electronic Imaging,2017,2017(19):70-76.
[6]GOTTIPATI S K,SATTAROV B,NIU S,et al.Learning to navigate the synthetically accessible chemical space using reinforcement learning[C]//International Conference on Machine Learning.PMLR,2020:3668-3679.
[7]LIU Q,ZHAI J W,ZHANG Z Z,et al.A survey on deep reinforcement learning[J].Chinese Journal of Computers,2018,41(1):1-27.
[8]LILLICRAP T P,HUNT J J,PRITZEL A,et al.Continuous control with deep reinforcement learning[C]//ICLR.2016.
[9]MNIH V,BADIA A P,MIRZA M,et al.Asynchronous methods for deep reinforcement learning[C]//International Conference on Machine Learning.2016:1928-1937.
[10]BARTO A G,MAHADEVAN S.Recent advances in hierarchical reinforcement learning[J].Discrete Event Dynamic Systems,2003,13(4):341-379.
[11]RASHID T,SAMVELYAN M,SCHROEDER C,et al.Qmix:Monotonic value function factorisation for deep multi-agent reinforcement learning[C]//International Conference on Machine Learning.PMLR,2018:4295-4304.
[12]SCHULMAN J,WOLSKI F,DHARIWAL P,et al.Proximal policy optimization algorithms[J].arXiv:1707.06347,2017.
[13]HAARNOJA T,ZHOU A,ABBEEL P,et al.Soft Actor-Critic:Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor[C]//International Conference on Machine Learning.2018:1861-1870.
[14]SUTTON R S,PRECUP D,SINGH S.Between mdps and semi-mdps:A framework for temporal abstraction in reinforcement learning[J].Artificial Intelligence,1999,112(1/2):181-211.
[15]BACON P L,HARB J,PRECUP D.The option-critic architecture[C]//AAAI Conference on Artificial Intelligence.2017:1726-1734.
[16]ZHANG S,WHITESON S.Dac:The double actor-critic architecture for learning options[C]//Advances in Neural Information Processing Systems.2019:2012-2022.
[17]SMITH M,HOOF H,PINEAU J.An inference-based policy gradient method for learning options[C]//International Conference on Machine Learning.PMLR,2018:4703-4712.
[18]OSA T,TANGKARATT V,SUGIYAMA M.Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization[C]//International Conference on Learning Representations.2018.
[19]FUJIMOTO S,HOOF H,MEGER D.Addressing function approximation error in actor-critic methods[C]//International Conference on Machine Learning.2018:1587-1596.
[20]LI C,MA X,ZHANG C,et al.SOAC:The Soft Option Actor-Critic Architecture[J].arXiv:2006.14363,2020.
[21]LEVINE S.Reinforcement learning and control as probabilistic inference:Tutorial and review[J].arXiv:1805.00909,2018.
[22]BROCKMAN G,CHEUNG V,PETTERSSON L,et al.Openai gym[J].arXiv:1606.01540,2016.
[1] | LIU Xingguang, ZHOU Li, ZHANG Xiaoying, CHEN Haitao, ZHAO Haitao, WEI Jibo. Edge Intelligent Sensing Based UAV Space Trajectory Planning Method [J]. Computer Science, 2023, 50(9): 311-317. |
[2] | LIN Xinyu, YAO Zewei, HU Shengxi, CHEN Zheyi, CHEN Xing. Task Offloading Algorithm Based on Federated Deep Reinforcement Learning for Internet of Vehicles [J]. Computer Science, 2023, 50(9): 347-356. |
[3] | JIN Tiancheng, DOU Liang, ZHANG Wei, XIAO Chunyun, LIU Feng, ZHOU Aimin. OJ Exercise Recommendation Model Based on Deep Reinforcement Learning and Program Analysis [J]. Computer Science, 2023, 50(8): 58-67. |
[4] | XIONG Liqin, CAO Lei, CHEN Xiliang, LAI Jun. Value Factorization Method Based on State Estimation [J]. Computer Science, 2023, 50(8): 202-208. |
[5] | ZENG Qingwei, ZHANG Guomin, XING Changyou, SONG Lihua. Intelligent Attack Path Discovery Based on Hierarchical Reinforcement Learning [J]. Computer Science, 2023, 50(7): 308-316. |
[6] | WANG Hanmo, ZHENG Shijie, XU Ruonan, GUO Bin, WU Lei. Self Reconfiguration Algorithm of Modular Robot Based on Swarm Agent Deep Reinforcement Learning [J]. Computer Science, 2023, 50(6): 266-273. |
[7] | ZHANG Qiyang, CHEN Xiliang, CAO Lei, LAI Jun, SHENG Lei. Survey on Knowledge Transfer Method in Deep Reinforcement Learning [J]. Computer Science, 2023, 50(5): 201-216. |
[8] | YU Ze, NING Nianwen, ZHENG Yanliu, LYU Yining, LIU Fuqiang, ZHOU Yi. Review of Intelligent Traffic Signal Control Strategies Driven by Deep Reinforcement Learning [J]. Computer Science, 2023, 50(4): 159-171. |
[9] | XU Linling, ZHOU Yuan, HUANG Hongyun, LIU Yang. Real-time Trajectory Planning Algorithm Based on Collision Criticality and Deep Reinforcement Learning [J]. Computer Science, 2023, 50(3): 323-332. |
[10] | Cui ZHANG, En WANG, Funing YANG, Yongjian YANG, Nan JIANG. UAV Frequency-based Crowdsensing Using Grouping Multi-agent Deep Reinforcement Learning [J]. Computer Science, 2023, 50(2): 57-68. |
[11] | ZHOU Tianyu, GUAN Zheng. Study on Relay Decision in Wireless Heterogeneous Networks Based on Deep Reinforcement Learning [J]. Computer Science, 2023, 50(11A): 221000088-5. |
[12] | PENG Yingxuan, SHI Dianxi, YANG Huanhuan, HU Haomeng, YANG Shaowu. Intention-based Multi-agent Motion Planning Method with Deep Reinforcement Learning [J]. Computer Science, 2023, 50(10): 156-164. |
[13] | LIN Zeyang, LAI Jun, CHEN Xiliang, WANG Jun. UAV Anti-tank Policy Training Model Based on Curriculum Reinforcement Learning [J]. Computer Science, 2023, 50(10): 214-222. |
[14] | WEI Nan, WEI Xianglin, FAN Jianhua, XUE Yu, HU Yongyang. Backdoor Attack Against Deep Reinforcement Learning-based Spectrum Access Model [J]. Computer Science, 2023, 50(1): 351-361. |
[15] | HUANG Yuzhou, WANG Lisong, QIN Xiaolin. Bi-level Path Planning Method for Unmanned Vehicle Based on Deep Reinforcement Learning [J]. Computer Science, 2023, 50(1): 194-204. |