Computer Science ›› 2020, Vol. 47 ›› Issue (3): 182-191. doi: 10.11896/jsjkx.190200352
• Artificial Intelligence •
ANG Wei-yi1, BAI Chen-jia2, CAI Chao1, ZHAO Ying-nan2, LIU Peng2
[1] SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. US: MIT Press, 2018.
[2] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436.
[3] LI Y. Deep reinforcement learning: An overview[J]. arXiv:1701.07274, 2017.
[4] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484.
[5] SILVER D, SCHRITTWIESER J, SIMONYAN K, et al. Mastering the game of Go without human knowledge[J]. Nature, 2017, 550(7676): 354.
[6] SILVER D, HUBERT T, SCHRITTWIESER J, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play[J]. Science, 2018, 362(6419): 1140-1144.
[7] PLAPPERT M, ANDRYCHOWICZ M, RAY A, et al. Multi-goal reinforcement learning: Challenging robotics environments and request for research[J]. arXiv:1802.09464, 2018.
[8] SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized experience replay[J]. arXiv:1511.05952, 2015.
[9] LEVINE S, PASTOR P, KRIZHEVSKY A, et al. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection[J]. The International Journal of Robotics Research, 2018, 37(4/5): 421-436.
[10] ISELE D, RAHIMI R, COSGUN A, et al. Navigating occluded intersections with autonomous vehicles using deep reinforcement learning[C]∥2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018: 2034-2039.
[11] BELLMAN R. A Markovian decision process[J]. Journal of Mathematics and Mechanics, 1957, 6(5): 679-684.
[12] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[J]. arXiv:1312.5602, 2013.
[13] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529.
[14] HASSELT H V. Double Q-learning[C]∥Advances in Neural Information Processing Systems. 2010: 2613-2621.
[15] HASSELT H V, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]∥Thirtieth AAAI Conference on Artificial Intelligence. 2016.
[16] HORGAN D, QUAN J, BUDDEN D, et al. Distributed prioritized experience replay[C]∥International Conference on Learning Representations. 2018.
[17] WANG Z, SCHAUL T, HESSEL M, et al. Dueling Network Architectures for Deep Reinforcement Learning[C]∥International Conference on Machine Learning. 2016: 1995-2003.
[18] BELLEMARE M G, DABNEY W, MUNOS R. A distributional perspective on reinforcement learning[C]∥International Conference on Machine Learning. 2017: 449-458.
[19] HESSEL M, MODAYIL J, VAN HASSELT H, et al. Rainbow: Combining improvements in deep reinforcement learning[C]∥Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
[20] DE ASIS K, HERNANDEZ-GARCIA J F, HOLLAND G Z, et al. Multi-step reinforcement learning: A unifying algorithm[C]∥Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
[21] FORTUNATO M, AZAR M G, PIOT B, et al. Noisy networks for exploration[C]∥International Conference on Learning Representations. 2018.
[22] PRECUP D, SUTTON R S, DASGUPTA S. Off-policy temporal-difference learning with function approximation[C]∥International Conference on Machine Learning. 2001: 417-424.
[23] BROWNE C B, POWLEY E, WHITEHOUSE D, et al. A survey of Monte Carlo tree search methods[J]. IEEE Transactions on Computational Intelligence and AI in Games, 2012, 4(1): 1-43.
[24] SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]∥International Conference on Machine Learning. 2014.
[25] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]∥International Conference on Machine Learning. 2016: 1928-1937.
[26] WYMANN B, ESPIÉ E, GUIONNEAU C, et al. TORCS, the open racing car simulator[J]. Software, 2000, 4(6).
[27] TODOROV E, EREZ T, TASSA Y. MuJoCo: A physics engine for model-based control[C]∥2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012: 5026-5033.
[28] KEMPKA M, WYDMUCH M, RUNC G, et al. ViZDoom: A Doom-based AI research platform for visual reinforcement learning[C]∥2016 IEEE Conference on Computational Intelligence and Games (CIG). IEEE, 2016: 1-8.
[29] BEATTIE C, LEIBO J Z, TEPLYASHIN D, et al. DeepMind Lab[J]. arXiv:1612.03801, 2016.
[30] BABAEIZADEH M, FROSIO I, TYREE S, et al. Reinforcement learning through asynchronous advantage actor-critic on a GPU[C]∥International Conference on Learning Representations. 2017.
[31] ESPEHOLT L, SOYER H, MUNOS R, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures[C]∥International Conference on Machine Learning. 2018: 1406-1415.
[32] SCHULMAN J, LEVINE S, ABBEEL P, et al. Trust Region Policy Optimization[C]∥International Conference on Machine Learning. 2015, 37: 1889-1897.
[33] WU Y, MANSIMOV E, GROSSE R B, et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation[C]∥Advances in Neural Information Processing Systems. 2017: 5279-5288.
[34] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[J]. arXiv:1707.06347, 2017.
[35] SCHULMAN J, MORITZ P, LEVINE S, et al. High-dimensional continuous control using generalized advantage estimation[C]∥International Conference on Learning Representations. 2016.
[36] NACHUM O, NOROUZI M, XU K, et al. Bridging the gap between value and policy based reinforcement learning[C]∥Advances in Neural Information Processing Systems. 2017: 2775-2785.
[37] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[C]∥International Conference on Learning Representations. 2016.
[38] FUJIMOTO S, HOOF H, MEGER D. Addressing Function Approximation Error in Actor-Critic Methods[C]∥International Conference on Machine Learning. 2018: 1582-1591.
[39] HAUSKNECHT M, STONE P. Deep reinforcement learning in parameterized action space[C]∥International Conference on Learning Representations. 2016.
[40] STONE P. What's hot at RoboCup[C]∥Thirtieth AAAI Conference on Artificial Intelligence. 2016.
[41] HAARNOJA T, TANG H, ABBEEL P, et al. Reinforcement learning with deep energy-based policies[C]∥International Conference on Machine Learning. 2017: 1352-1361.
[42] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor[C]∥International Conference on Machine Learning. 2018: 1856-1865.
[43] SCHULMAN J, CHEN X, ABBEEL P. Equivalence between policy gradients and soft Q-learning[J]. arXiv:1704.06440, 2017.
[44] GU S, LILLICRAP T, GHAHRAMANI Z, et al. Q-Prop: Sample-efficient policy gradient with an off-policy critic[C]∥International Conference on Learning Representations. 2017.
[45] O'DONOGHUE B, MUNOS R, KAVUKCUOGLU K, et al. Combining policy gradient and Q-learning[C]∥International Conference on Learning Representations. 2017.
[46] WANG Z, BAPST V, HEESS N, et al. Sample efficient actor-critic with experience replay[C]∥International Conference on Learning Representations. 2017.
[47] ZHAO X Y, DING S F. Research on Deep Reinforcement Learning[J]. Computer Science, 2018, 45(7): 1-6.
[48] OPENAI. Faulty Reward Functions in the Wild[EB/OL]. https://blog.openai.com/faulty-reward-functions, 2017.
[49] RUSSELL S, NORVIG P. Artificial Intelligence: A Modern Approach (3rd Edition)[M]. Hong Kong: Pearson Education Asia, 2011.
[50] AMODEI D, OLAH C, STEINHARDT J, et al. Concrete Problems in AI Safety[J]. arXiv:1606.06565, 2016.
[51] NG A Y, RUSSELL S J. Algorithms for inverse reinforcement learning[C]∥ICML. 2000, 1: 2.
[52] ZIEBART B D, MAAS A L, BAGNELL J A, et al. Maximum entropy inverse reinforcement learning[C]∥AAAI Conference on Artificial Intelligence. 2008: 1433-1438.
[53] AGHASADEGHI N, BRETL T. Maximum entropy inverse reinforcement learning in continuous state spaces with path integrals[C]∥2011 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2011: 1561-1566.
[54] FINN C, LEVINE S, ABBEEL P. Guided cost learning: Deep inverse optimal control via policy optimization[C]∥International Conference on Machine Learning. 2016: 49-58.
[55] HADFIELD-MENELL D, MILLI S, ABBEEL P, et al. Inverse reward design[C]∥Advances in Neural Information Processing Systems. 2017: 6765-6774.
[56] CHRISTIANO P F, LEIKE J, BROWN T, et al. Deep reinforcement learning from human preferences[C]∥Advances in Neural Information Processing Systems. 2017: 4299-4307.
[57] ZHANG K F, YU Y. Methodologies for Imitation Learning via Inverse Reinforcement Learning: A Review[J]. Journal of Computer Research and Development, 2019, 56(2): 254-261.
[58] HOU Y, LIU L, WEI Q, et al. A novel DDPG method with prioritized experience replay[C]∥2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2017: 316-321.
[59] TAVAKOLI A, PARDO F, KORMUSHEV P. Action branching architectures for deep reinforcement learning[C]∥Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
[60] HORGAN D, QUAN J, BUDDEN D, et al. Distributed prioritized experience replay[C]∥International Conference on Learning Representations. 2018.
[61] DE BRUIN T, KOBER J, TUYLS K, et al. Experience selection in deep reinforcement learning for control[J]. The Journal of Machine Learning Research, 2018, 19(1): 347-402.
[62] BAI C J, LIU P, ZHAO W, et al. Active Sampling for Deep Q-Learning Based on TD-error Adaptive Correction[J]. Journal of Computer Research and Development, 2019, 56(2): 262-280.
[63] CHAPELLE O, LI L. An empirical evaluation of Thompson sampling[C]∥Advances in Neural Information Processing Systems. 2011: 2249-2257.
[64] KOLTER J Z, NG A Y. Near-Bayesian exploration in polynomial time[C]∥Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009: 513-520.
[65] OSBAND I, BLUNDELL C, PRITZEL A, et al. Deep exploration via bootstrapped DQN[C]∥Advances in Neural Information Processing Systems. 2016: 4026-4034.
[66] BELLEMARE M, SRINIVASAN S, OSTROVSKI G, et al. Unifying count-based exploration and intrinsic motivation[C]∥Advances in Neural Information Processing Systems. 2016: 1471-1479.
[67] OSTROVSKI G, BELLEMARE M G, VAN DEN OORD A, et al. Count-based exploration with neural density models[C]∥Proceedings of the 34th International Conference on Machine Learning. 2017: 2721-2730.
[68] VAN OORD A, KALCHBRENNER N, KAVUKCUOGLU K. Pixel Recurrent Neural Networks[C]∥International Conference on Machine Learning. 2016: 1747-1756.
[69] SALIMANS T, KARPATHY A, CHEN X, et al. PixelCNN++: A PixelCNN implementation with discretized logistic mixture likelihood and other modifications[C]∥International Conference on Learning Representations (ICLR). 2017.
[70] TANG H, HOUTHOOFT R, FOOTE D, et al. #Exploration: A study of count-based exploration for deep reinforcement learning[C]∥Advances in Neural Information Processing Systems. 2017: 2753-2762.
[71] HOUTHOOFT R, CHEN X, DUAN Y, et al. VIME: Variational information maximizing exploration[C]∥Advances in Neural Information Processing Systems. 2016: 1109-1117.
[72] STADIE B C, LEVINE S, ABBEEL P. Incentivizing exploration in reinforcement learning with deep predictive models[J]. arXiv:1507.00814, 2015.
[73] PATHAK D, AGRAWAL P, EFROS A A, et al. Curiosity-driven Exploration by Self-supervised Prediction[C]∥International Conference on Machine Learning. 2017: 2778-2787.
[74] BURDA Y, EDWARDS H, PATHAK D, et al. Large-scale study of curiosity-driven learning[C]∥International Conference on Learning Representations (ICLR). 2019.
[75] BURDA Y, EDWARDS H, STORKEY A, et al. Exploration by random network distillation[C]∥International Conference on Learning Representations (ICLR). 2019.
[76] FU J, CO-REYES J, LEVINE S. EX2: Exploration with exemplar models for deep reinforcement learning[C]∥Advances in Neural Information Processing Systems. 2017: 2577-2587.
[77] OSBAND I, ASLANIDES J, CASSIRER A. Randomized prior functions for deep reinforcement learning[C]∥Advances in Neural Information Processing Systems. 2018: 8626-8638.
[78] CONTI E, MADHAVAN V, SUCH F P, et al. Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents[C]∥Advances in Neural Information Processing Systems. 2018: 5032-5043.
[79] GUPTA A, MENDONCA R, LIU Y X, et al. Meta-reinforcement learning of structured exploration strategies[C]∥Advances in Neural Information Processing Systems. 2018: 5307-5316.
[80] ANDRYCHOWICZ M, WOLSKI F, RAY A, et al. Hindsight experience replay[C]∥Advances in Neural Information Processing Systems. 2017: 5048-5058.
[81] SUTTON R S, MODAYIL J, DELP M, et al. Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction[C]∥The 10th International Conference on Autonomous Agents and Multiagent Systems. 2011: 761-768.
[82] SCHAUL T, HORGAN D, GREGOR K, et al. Universal value function approximators[C]∥International Conference on Machine Learning. 2015: 1312-1320.
[83] RAUBER P, UMMADISINGU A, MUTZ F, et al. Hindsight policy gradients[C]∥International Conference on Learning Representations (ICLR). 2019.
[84] FANG M, ZHOU C, SHI B, et al. DHER: Hindsight Experience Replay for Dynamic Goals[C]∥International Conference on Learning Representations (ICLR). 2019.
[85] LANKA S, WU T. ARCHER: Aggressive Rewards to Counter bias in Hindsight Experience Replay[J]. arXiv:1809.02070, 2018.
[86] NAIR A V, PONG V, DALAL M, et al. Visual reinforcement learning with imagined goals[C]∥Advances in Neural Information Processing Systems. 2018: 9209-9220.
[87] KINGMA D P, WELLING M. Auto-encoding variational Bayes[J]. arXiv:1312.6114, 2013.
[88] SCHMIDHUBER J. PowerPlay: Training an increasingly general problem solver by continually searching for the simplest still unsolvable problem[J]. Frontiers in Psychology, 2013, 4: 313.
[89] FLORENSA C, HELD D, WULFMEIER M, et al. Reverse curriculum generation for reinforcement learning[C]∥Conference on Robot Learning. 2017.
[90] FLORENSA C, HELD D, GENG X, et al. Automatic goal generation for reinforcement learning agents[C]∥International Conference on Machine Learning. 2018: 1514-1523.
[91] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]∥Advances in Neural Information Processing Systems. 2014: 2672-2680.
[92] SUKHBAATAR S, LIN Z, KOSTRIKOV I, et al. Intrinsic motivation and automatic curricula via asymmetric self-play[C]∥International Conference on Learning Representations (ICLR). 2018.
[93] JADERBERG M, MNIH V, CZARNECKI W M, et al. Reinforcement learning with unsupervised auxiliary tasks[C]∥International Conference on Learning Representations (ICLR). 2017.
[94] MIROWSKI P, PASCANU R, VIOLA F, et al. Learning to navigate in complex environments[C]∥International Conference on Learning Representations (ICLR). 2017.
[95] MIROWSKI P, GRIMES M, MALINOWSKI M, et al. Learning to navigate in cities without a map[C]∥Advances in Neural Information Processing Systems. 2018: 2424-2435.
[96] PARISOTTO E, SALAKHUTDINOV R. Neural map: Structured memory for deep reinforcement learning[C]∥International Conference on Learning Representations. 2018.
[97] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[98] GU S, LILLICRAP T, SUTSKEVER I, et al. Continuous deep Q-learning with model-based acceleration[C]∥International Conference on Machine Learning. 2016: 2829-2838.
[99] XU Z, VAN HASSELT H P, SILVER D. Meta-gradient reinforcement learning[C]∥Advances in Neural Information Processing Systems. 2018: 2402-2413.
[100] NACHUM O, GU S S, LEE H, et al. Data-efficient hierarchical reinforcement learning[C]∥Advances in Neural Information Processing Systems. 2018: 3307-3317.
[101] TENENBAUM J. Building machines that learn and think like people[C]∥Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. 2018: 5-5.