Computer Science, 2023, Vol. 50, Issue (5): 201-216. doi: 10.11896/jsjkx.220400235

• Artificial Intelligence •

Survey on Knowledge Transfer Method in Deep Reinforcement Learning

ZHANG Qiyang, CHEN Xiliang, CAO Lei, LAI Jun, SHENG Lei   

  1. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
  • Received: 2022-04-24 Revised: 2022-07-12 Online: 2023-05-15 Published: 2023-05-06
  • About author: ZHANG Qiyang, born in 1998, postgraduate. His main research interests include deep reinforcement learning and knowledge transfer.
    CHEN Xiliang, born in 1985, Ph.D., associate professor. His main research interests include command information system engineering and deep reinforcement learning.
  • Supported by:
    National Natural Science Foundation of China (61806221).

Abstract: Deep reinforcement learning is an active topic in artificial intelligence research. As the research has deepened, several shortcomings have gradually been exposed, such as low data utilization, weak generalization ability, difficult exploration, and a lack of reasoning and representation ability. These problems greatly restrict the application of deep reinforcement learning methods to practical problems. Knowledge transfer is an effective way to address them. From the perspective of deep reinforcement learning, this study discusses how to use knowledge transfer to accelerate agent training and to support cross-domain transfer, analyzes the forms in which knowledge exists and the ways it takes effect in deep reinforcement learning, and classifies and summarizes the knowledge transfer methods in deep reinforcement learning according to the basic elements of reinforcement learning. Finally, the open problems and frontier development directions of knowledge transfer in deep reinforcement learning are reported in terms of algorithms, theory, and applications.

Key words: Artificial intelligence, Knowledge transfer, Reinforcement learning, Deep reinforcement learning, Transfer learning

CLC Number: 

  • TP181