Computer Science, 2024, Vol. 51, Issue (2): 252-258. doi: 10.11896/jsjkx.221100019
• Artificial Intelligence •
LI Junwei, LIU Quan, XU Yapeng