Computer Science ›› 2025, Vol. 52 ›› Issue (1): 277-288. doi: 10.11896/jsjkx.240100221
• Artificial Intelligence •
WANG Qidi, SHEN Liwei, WU Tianyi
[1] ZHOU X, BAI T, GAO Y, et al. Vision-based robot navigation through combining unsupervised learning and hierarchical reinforcement learning [J]. Sensors, 2019, 19(7): 1576.
[2] JAIN D, ISCEN A, CALUWAERTS K. Hierarchical reinforcement learning for quadruped locomotion [C]//2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019: 7551-7557.
[3] YIN C, YANG R, ZHU W, et al. Survey on multi-agent hierarchical reinforcement learning [J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 646-655.
[4] SUTTON R S, PRECUP D, SINGH S. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning [J]. Artificial Intelligence, 1999, 112(1/2): 181-211.
[5] HOVLAND G E, SIKKA P, MCCARRAGHER B J. Skill acquisition from human demonstration using a hidden Markov model [C]//Proceedings of IEEE International Conference on Robotics and Automation: volume 3. IEEE, 1996: 2706-2711.
[6] SCHAAL S. Dynamic movement primitives: a framework for motor control in humans and humanoid robotics [M]//Adaptive Motion of Animals and Machines. Berlin: Springer, 2006: 261-280.
[7] KONIDARIS G, KUINDERSMA S, GRUPEN R, et al. Robot learning from demonstration by constructing skill trees [J]. The International Journal of Robotics Research, 2012, 31(3): 360-375.
[8] KIPF T, LI Y, DAI H, et al. CompILE: compositional imitation learning and execution [C]//International Conference on Machine Learning. PMLR, 2019: 3418-3428.
[9] SHANKAR T, TULSIANI S, PINTO L, et al. Discovering motor programs by recomposing demonstrations [C]//8th International Conference on Learning Representations (ICLR). 2020.
[10] CHEN Y, WANG C, BASTANI O, et al. Program synthesis using deduction-guided reinforcement learning [C]//Computer Aided Verification: 32nd International Conference (CAV 2020). Springer, 2020: 587-610.
[11] ICARTE R T, KLASSEN T, VALENZANO R, et al. Using reward machines for high-level task specification and decomposition in reinforcement learning [C]//International Conference on Machine Learning. PMLR, 2018: 2107-2116.
[12] ANDREAS J, KLEIN D, LEVINE S. Modular multitask reinforcement learning with policy sketches [C]//International Conference on Machine Learning. PMLR, 2017: 166-175.
[13] SHIARLIS K, WULFMEIER M, SALTER S, et al. TACO: learning task decomposition via temporal alignment for control [C]//International Conference on Machine Learning. PMLR, 2018: 4654-4663.
[14] ARGALL B D, CHERNOVA S, VELOSO M, et al. A survey of robot learning from demonstration [J]. Robotics and Autonomous Systems, 2009, 57(5): 469-483.
[15] ESMAILI N, SAMMUT C, SHIRAZI G. Behavioural cloning in control of a dynamic system [C]//1995 IEEE International Conference on Systems, Man and Cybernetics. Intelligent Systems for the 21st Century: volume 3. IEEE, 1995: 2904-2909.
[16] PETERS J, KOBER J, MÜLLING K, et al. Towards robot skill learning: from simple skills to table tennis [C]//Machine Learning and Knowledge Discovery in Databases: European Conference (ECML PKDD 2013). Springer, 2013: 627-631.
[17] NIEKUM S, OSENTOSKI S, KONIDARIS G, et al. Learning and generalization of complex tasks from unstructured demonstrations [C]//2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012: 5239-5246.
[18] NIEKUM S, OSENTOSKI S, KONIDARIS G, et al. Learning grounded finite-state representations from unstructured demonstrations [J]. The International Journal of Robotics Research, 2015, 34(2): 131-157.
[19] ZHU Y, STONE P, ZHU Y. Bottom-up skill discovery from unsegmented demonstrations for long-horizon robot manipulation [J]. IEEE Robotics and Automation Letters, 2022, 7(2): 4126-4133.
[20] BARTO A G, MAHADEVAN S. Recent advances in hierarchical reinforcement learning [J]. Discrete Event Dynamic Systems, 2003, 13(1/2): 41-77.
[21] PARR R, RUSSELL S. Reinforcement learning with hierarchies of machines [M]//Advances in Neural Information Processing Systems 10. The MIT Press, 1997: 1043-1049.
[22] DAYAN P, HINTON G E. Feudal reinforcement learning [M]//Advances in Neural Information Processing Systems 5. Morgan Kaufmann, 1992: 271-278.
[23] SUTTON R S, PRECUP D, SINGH S. Intra-option learning about temporally abstract actions [C]//ICML: volume 98. 1998: 556-564.
[24] FOX R, KRISHNAN S, STOICA I, et al. Multi-level discovery of deep options [J]. arXiv:1703.08294, 2017.
[25] KRISHNAN S, FOX R, STOICA I, et al. DDCO: discovery of deep continuous options for robot learning from demonstrations [C]//Conference on Robot Learning. PMLR, 2017: 418-437.
[26] SHANKAR T, GUPTA A. Learning robot skills with temporal variational inference [C]//International Conference on Machine Learning. PMLR, 2020: 8624-8633.
[27] XIE Y, ZHOU F, SOH H. Embedding symbolic temporal knowledge into deep sequential models [C]//2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021: 4267-4273.
[28] YANG F, LYU D, LIU B, et al. PEORL: integrating symbolic planning and hierarchical reinforcement learning for robust decision-making [C]//Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI). 2018: 4860-4866.
[1] WANG Xianwei, FENG Xiang, YU Huiqun. Multi-agent Cooperative Algorithm for Obstacle Clearance Based on Deep Deterministic Policy Gradient and Attention Critic [J]. Computer Science, 2024, 51(7): 319-326.
[2] LI Junwei, LIU Quan, XU Yapeng. Option-Critic Algorithm Based on Mutual Information Optimization [J]. Computer Science, 2024, 51(2): 252-258.
[3] ZENG Qingwei, ZHANG Guomin, XING Changyou, SONG Lihua. Intelligent Attack Path Discovery Based on Hierarchical Reinforcement Learning [J]. Computer Science, 2023, 50(7): 308-316.
[4] XU Yapeng, LIU Quan, LI Junwei. Hierarchical Reinforcement Learning Method Based on Trajectory Information [J]. Computer Science, 2023, 50(12): 314-321.
[5] ZHOU Qin, LUO Fei, DING Wei-chao, GU Chun-hua, ZHENG Shuai. Double Speedy Q-Learning Based on Successive Over Relaxation [J]. Computer Science, 2022, 49(3): 239-245.
[6] QIAN Jing, WU Ke-yu, CHEN Chao, HU Xing-chen. Optimal Order Acceptance Decision Based on After-state Reinforcement Learning [J]. Computer Science, 2022, 49(11A): 210800261-9.
[7] ZHANG Fan, GONG Ao-yu, DENG Lei, LIU Fang, LIN Yan, ZHANG Yi-jin. Wireless Downlink Scheduling with Deadline Constraint for Realistic Channel Observation Environment [J]. Computer Science, 2021, 48(9): 264-270.
[8] WANG Ying-kai, WANG Qing-shan. Reinforcement Learning Based Energy Allocation Strategy for Multi-access Wireless Communications with Energy Harvesting [J]. Computer Science, 2021, 48(7): 333-339.
[9] FANG Ting, GONG Ao-yu, ZHANG Fan, LIN Yan, JIA Lin-qiong, ZHANG Yi-jin. Dynamic Broadcasting Strategy in Cognitive Radio Networks Under Delivery Deadline [J]. Computer Science, 2021, 48(7): 340-346.
[10] YU Li, DU Qi-han, YUE Bo-yan, XIANG Jun-yao, XU Guan-yu, LENG You-fang. Survey of Reinforcement Learning Based Recommender Systems [J]. Computer Science, 2021, 48(10): 1-18.
[11] WANG Zheng-ning, ZHOU Yang, LV Xia, ZENG Fan-wei, ZHANG Xiang, ZHANG Feng-jun. Improved MDP Tracking Method by Combining 2D and 3D Information [J]. Computer Science, 2019, 46(3): 97-102.
[12] CHAI Ye-sheng, ZHU Xue-yang, YAN Rong-jie, ZHANG Guang-quan. MARTE Models Based System Reliability Prediction [J]. Computer Science, 2015, 42(12): 82-86.
[13] HUANG Zhen-jin, LU Yang, YANG Juan, FANG Huan. Property Patterns of Markov Decision Process Nondeterministic Choice Scheduler [J]. Computer Science, 2013, 40(4): 263-266.
[14] NIU Jun, ZENG Guo-sun, LU Xin-rong, XU Chang. Stochastic Model Checking Continuous Time Markov Process [J]. Computer Science, 2011, 38(9): 112-115.
[15] WANG Guan-jun, WANG Mao-li, ZHAO Ying. Research on Novel Test Vector Ordering Approach Based on Markov Decision Processes [J]. Computer Science, 2010, 37(5): 287-290.