Computer Science, 2021, Vol. 48, Issue (2): 238-244. doi: 10.11896/jsjkx.191100107
• Artificial Intelligence •
JIANG Chong1, ZHANG Zong-zhang2, CHEN Zi-xuan1, ZHU Jia-cheng1, JIANG Jun-peng1
[1] LIU Q,ZHAI J W,ZHANG Z Z,et al.A Survey of Deep Reinforcement Learning[J].Chinese Journal of Computers,2018,41(1):1-27.
[2] SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of Go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489.
[3] SILVER D,SCHRITTWIESER J,SIMONYAN K,et al.Mastering the game of Go without human knowledge[J].Nature,2017,550(7676):354-359.
[4] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Playing Atari with deep reinforcement learning[C]//Proceedings of the Workshops at the 27th Neural Information Processing Systems (NIPS).2013:201-220.
[5] SUTTON R S,BARTO A G.Reinforcement learning:An introduction (2nd edition)[M].MIT Press,2018.
[6] SCHAAL S.Is imitation learning the route to humanoid robots?[J].Trends in Cognitive Sciences,1999,3(6):233-242.
[7] OSA T,PAJARINEN J,NEUMANN G,et al.An algorithmic perspective on imitation learning[J].Foundations and Trends in Robotics,2018,7(1/2):1-179.
[8] ABBEEL P,NG A Y.Apprenticeship learning via inverse reinforcement learning[C]//Proceedings of the 21st International Conference on Machine Learning (ICML).2004:1-8.
[9] NG A Y,RUSSELL S J.Algorithms for inverse reinforcement learning[C]//Proceedings of the 17th International Conference on Machine Learning (ICML).2000:663-670.
[10] HO J,ERMON S.Generative adversarial imitation learning[C]//Proceedings of the 30th Neural Information Processing Systems (NIPS).2016:4565-4573.
[11] STADIE B C,ABBEEL P,SUTSKEVER I.Third-person imitation learning[C]//Proceedings of the 5th International Conference on Learning Representations (ICLR).2017.
[12] SHARMA P,PATHAK D,GUPTA A.Third-person visual imitation learning via decoupled hierarchical controller[C]//Proceedings of the 33rd Neural Information Processing Systems (NIPS).2019:2593-2603.
[13] JIANG C,ZHANG Z Z,CHEN Z X,et al.Third-person imitation learning via image difference and variational discriminator bottleneck (student abstract version)[C]//Proceedings of the 44th AAAI Conference on Artificial Intelligence (AAAI).2020.
[14] TODOROV E,EREZ T,TASSA Y.MuJoCo:A physics engine for model-based control[C]//2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.2012:5026-5033.
[15] LIN J H,ZHANG Z Z,JIANG C,et al.A Survey of imitation learning based on Generative Adversarial Nets[J].Chinese Journal of Computers,2020,43(2):326-351.
[16] GOODFELLOW I J,POUGET-ABADIE J,MIRZA M,et al.Generative adversarial nets[C]//Proceedings of the 28th Neural Information Processing Systems (NIPS).2014:2672-2680.
[17] MEREL J,TASSA Y,TB D,et al.Learning human behaviors from motion capture by adversarial imitation[J].arXiv:1707.02201,2017.
[18] TORABI F,WARNELL G,STONE P.Generative adversarial imitation from observation[J].arXiv:1807.06158,2018.
[19] TZENG E,HOFFMAN J,ZHANG N,et al.Deep domain confusion:maximizing for domain invariance[J].arXiv:1412.3474,2014.
[20] GANIN Y,LEMPITSKY V.Unsupervised domain adaptation by backpropagation[J].arXiv:1409.7495,2014.
[21] PENG X B,KANAZAWA A,TOYER S,et al.Variational discriminator bottleneck:improving imitation learning,inverse RL,and GANs by constraining information flow[J].arXiv:1810.00821,2018.
[22] ALEMI A A,FISCHER I,DILLON J V,et al.Deep variational information bottleneck[J].arXiv:1612.00410,2016.
[23] KINGMA D P,BA J L.Adam:a method for stochastic optimization[C]//Proceedings of the 4th International Conference on Learning Representations (ICLR).2015.
[24] SCHULMAN J,LEVINE S,MORITZ P,et al.Trust region policy optimization[C]//Proceedings of the 32nd International Conference on Machine Learning (ICML).2015:1889-1897.