Computer Science ›› 2024, Vol. 51 ›› Issue (3): 183-197. DOI: 10.11896/jsjkx.230400058
• Artificial Intelligence •
WANG Yao1,2, LUO Junren1, ZHOU Yanzhong1, GU Xueqiang1, ZHANG Wanpeng1
Related Articles:
[1] SHI Dianxi, HU Haomeng, SONG Linna, YANG Huanhuan, OUYANG Qianying, TAN Jiefu, CHEN Ying. Multi-agent Reinforcement Learning Method Based on Observation Reconstruction [J]. Computer Science, 2024, 51(4): 280-290.
[2] ZHAO Miao, XIE Liang, LIN Wenjing, XU Haijiao. Deep Reinforcement Learning Portfolio Model Based on Dynamic Selectors [J]. Computer Science, 2024, 51(4): 344-352.
[3] WANG Yan, WANG Tianjing, SHEN Hang, BAI Guangwei. Optimal Penetration Path Generation Based on Maximum Entropy Reinforcement Learning [J]. Computer Science, 2024, 51(3): 360-367.
[4] LI Junwei, LIU Quan, XU Yapeng. Option-Critic Algorithm Based on Mutual Information Optimization [J]. Computer Science, 2024, 51(2): 252-258.
[5] SHI Dianxi, PENG Yingxuan, YANG Huanhuan, OUYANG Qianying, ZHANG Yuhui, HAO Feng. DQN-based Multi-agent Motion Planning Method with Deep Reinforcement Learning [J]. Computer Science, 2024, 51(2): 268-277.
[6] WANG Yangmin, HU Chengyu, YAN Xuesong, ZENG Deze. Study on Deep Reinforcement Learning for Energy-aware Virtual Machine Scheduling [J]. Computer Science, 2024, 51(2): 293-299.
[7] ZHAO Xiaoyan, ZHAO Bin, ZHANG Junna, YUAN Peiyan. Study on Cache-oriented Dynamic Collaborative Task Migration Technology [J]. Computer Science, 2024, 51(2): 300-310.
[8] WANG Yuzhen, ZONG Guoxiao, WEI Qiang. SGPot: A Reinforcement Learning-based Honeypot Framework for Smart Grid [J]. Computer Science, 2024, 51(2): 359-370.
[9] LUO Ruiqing, ZENG Kun, ZHANG Xinjing. Curriculum Learning Framework Based on Reinforcement Learning in Sparse Heterogeneous Multi-agent Environments [J]. Computer Science, 2024, 51(1): 301-309.
[10] LIU Xingguang, ZHOU Li, ZHANG Xiaoying, CHEN Haitao, ZHAO Haitao, WEI Jibo. Edge Intelligent Sensing Based UAV Space Trajectory Planning Method [J]. Computer Science, 2023, 50(9): 311-317.
[11] LIN Xinyu, YAO Zewei, HU Shengxi, CHEN Zheyi, CHEN Xing. Task Offloading Algorithm Based on Federated Deep Reinforcement Learning for Internet of Vehicles [J]. Computer Science, 2023, 50(9): 347-356.
[12] JIN Tiancheng, DOU Liang, ZHANG Wei, XIAO Chunyun, LIU Feng, ZHOU Aimin. OJ Exercise Recommendation Model Based on Deep Reinforcement Learning and Program Analysis [J]. Computer Science, 2023, 50(8): 58-67.
[13] XIONG Liqin, CAO Lei, CHEN Xiliang, LAI Jun. Value Factorization Method Based on State Estimation [J]. Computer Science, 2023, 50(8): 202-208.
[14] ZHANG Naixin, CHEN Xiaorui, LI An, YANG Leyao, WU Huaming. Edge Offloading Framework for D2D-MEC Networks Based on Deep Reinforcement Learning and Wireless Charging Technology [J]. Computer Science, 2023, 50(8): 233-242.
[15] XING Linquan, XIAO Yingmin, YANG Zhibin, WEI Zhengmin, ZHOU Yong, GAO Saijun. Spacecraft Rendezvous Guidance Method Based on Safe Reinforcement Learning [J]. Computer Science, 2023, 50(8): 271-279.