Computer Science ›› 2024, Vol. 51 ›› Issue (5): 179-192. doi: 10.11896/jsjkx.230800099
• Artificial Intelligence •
XIN Yuanxia, HUA Daoyang, ZHANG Li
Related Articles

[1] SHI Dianxi, HU Haomeng, SONG Linna, YANG Huanhuan, OUYANG Qianying, TAN Jiefu, CHEN Ying. Multi-agent Reinforcement Learning Method Based on Observation Reconstruction[J]. Computer Science, 2024, 51(4): 280-290.
[2] LUO Ruiqing, ZENG Kun, ZHANG Xinjing. Curriculum Learning Framework Based on Reinforcement Learning in Sparse Heterogeneous Multi-agent Environments[J]. Computer Science, 2024, 51(1): 301-309.
[3] XIONG Liqin, CAO Lei, CHEN Xiliang, LAI Jun. Value Factorization Method Based on State Estimation[J]. Computer Science, 2023, 50(8): 202-208.
[4] LIN Xiangyang, XING Qinghua, XING Huaixi. Study on Intelligent Decision Making of Aerial Interception Combat of UAV Group Based on MADDPG[J]. Computer Science, 2023, 50(6A): 220700031-7.
[5] RONG Huan, QIAN Minfeng, MA Tinghuai, SUN Shengjie. Novel Class Reasoning Model Towards Covered Area in Given Image Based on Informed Knowledge Graph Reasoning and Multi-agent Collaboration[J]. Computer Science, 2023, 50(1): 243-252.
[6] SHI Dian-xi, ZHAO Chen-ran, ZHANG Yao-wen, YANG Shao-wu, ZHANG Yong-jun. Adaptive Reward Method for End-to-End Cooperation Based on Multi-agent Reinforcement Learning[J]. Computer Science, 2022, 49(8): 247-256.
[7] DU Wei, DING Shi-fei. Overview on Multi-agent Reinforcement Learning[J]. Computer Science, 2019, 46(8): 1-8.
[8] BIAN Rui, WU Xiang-jun, CHEN Ai-xiang. Decomposition Strategy for Knowledge Tree of Predicate Based on Static Preconditions[J]. Computer Science, 2017, 44(1): 235-242.
[9] Expressive Temporal Planning Algorithm under Dynamic Constraint Satisfaction Framework[J]. Computer Science, 2012, 39(6): 226-230.
[10] CHEN Yi-xiong, WU Zhong-fu, FENG Yong, ZHU Zheng-zhou. Learning-Task Scheduling Algorithm Based on CSP Model[J]. Computer Science, 2010, 37(12): 41-46.
[11] FANG Qi-qing, PENG Xiao-ming, LIU Qing-hua, HU Ya-hui. Study of Web Service Composition on Combining AI Planning with Workflow[J]. Computer Science, 2009, 36(9): 110-114.
[12] [J]. Computer Science, 2008, 35(1): 135-139.
[13] ZHANG Pei-Yun, SUN Ya-Min. [J]. Computer Science, 2007, 34(5): 4-7.