Computer Science ›› 2023, Vol. 50 ›› Issue (5): 201-216. doi: 10.11896/jsjkx.220400235
• Artificial Intelligence •
ZHANG Qiyang, CHEN Xiliang, CAO Lei, LAI Jun, SHENG Lei
[1] LIN Xiangyang, XING Qinghua, XING Huaixi. Study on Intelligent Decision Making of Aerial Interception Combat of UAV Group Based on MADDPG [J]. Computer Science, 2023, 50(6A): 220700031-7.
[2] WANG Tianran, WANG Qi, WANG Qingshan. Transfer Learning Based Cross-object Sign Language Gesture Recognition Method [J]. Computer Science, 2023, 50(6A): 220300232-5.
[3] WANG Dongli, YANG Shan, OUYANG Wanli, LI Baopu, ZHOU Yan. Explainability of Artificial Intelligence: Development and Application [J]. Computer Science, 2023, 50(6A): 220600212-7.
[4] HU Mingyang, GUO Yan, JIN Yangshuang. PSwin: Edge Detection Algorithm Based on Swin Transformer [J]. Computer Science, 2023, 50(6): 194-199.
[5] WANG Hanmo, ZHENG Shijie, XU Ruonan, GUO Bin, WU Lei. Self Reconfiguration Algorithm of Modular Robot Based on Swarm Agent Deep Reinforcement Learning [J]. Computer Science, 2023, 50(6): 266-273.
[6] MIAO Kuan, LI Chongshou. Optimization Algorithms for Job Shop Scheduling Problems Based on Correction Mechanisms and Reinforcement Learning [J]. Computer Science, 2023, 50(6): 274-282.
[7] WANG Zihan, TONG Xiangrong. Research Progress of Multi-agent Path Finding Based on Conflict-based Search Algorithms [J]. Computer Science, 2023, 50(6): 358-368.
[8] SHI Liang, WEN Liangming, LEI Sheng, LI Jianhui. Virtual Machine Consolidation Algorithm Based on Decision Tree and Improved Q-learning by Uniform Distribution [J]. Computer Science, 2023, 50(6): 36-44.
[9] XING Ying. Review of Software Engineering Techniques and Methods Based on Explainable Artificial Intelligence [J]. Computer Science, 2023, 50(5): 3-11.
[10] YU Ze, NING Nianwen, ZHENG Yanliu, LYU Yining, LIU Fuqiang, ZHOU Yi. Review of Intelligent Traffic Signal Control Strategies Driven by Deep Reinforcement Learning [J]. Computer Science, 2023, 50(4): 159-171.
[11] WANG Xiaofei, FAN Xueqiang, LI Zhangwei. Improving RNA Base Interactions Prediction Based on Transfer Learning and Multi-view Feature Fusion [J]. Computer Science, 2023, 50(3): 164-172.
[12] HU Zhongyuan, XUE Yu, ZHA Jiajie. Survey on Evolutionary Recurrent Neural Networks [J]. Computer Science, 2023, 50(3): 254-265.
[13] XU Linling, ZHOU Yuan, HUANG Hongyun, LIU Yang. Real-time Trajectory Planning Algorithm Based on Collision Criticality and Deep Reinforcement Learning [J]. Computer Science, 2023, 50(3): 323-332.
[14] Cui ZHANG, En WANG, Funing YANG, Yongjian YANG, Nan JIANG. UAV Frequency-based Crowdsensing Using Grouping Multi-agent Deep Reinforcement Learning [J]. Computer Science, 2023, 50(2): 57-68.
[15] LI Xiaoling, WU Haotian, ZHOU Tao, LU Hui. Password Guessing Model Based on Reinforcement Learning [J]. Computer Science, 2023, 50(1): 334-341.