Computer Science, 2023, Vol. 50, Issue (1): 253-261. doi: 10.11896/jsjkx.211100167
• Artificial Intelligence •
XU Ping'an, LIU Quan