Computer Science, 2022, Vol. 49, Issue (6): 335-341. doi: 10.11896/jsjkx.210300081
• Artificial Intelligence •
FAN Jing-yu1, LIU Quan1,2,3,4
[1] SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction[M]. Massachusetts: MIT Press, 2018.
[2] HUA J, ZENG L, LI G, et al. Learning for a Robot: Deep Reinforcement Learning, Imitation Learning, Transfer Learning[J]. Sensors, 2021, 21(4): 1278.
[3] SILVER D, SCHRITTWIESER J, SIMONYAN K, et al. Mastering the Game of Go without Human Knowledge[J]. Nature, 2017, 550(7676): 354-359.
[4] SAMUEL A L. Some Studies in Machine Learning Using the Game of Checkers[J]. IBM Journal of Research and Development, 2000, 44(1/2): 206-226.
[5] CHEN J P, ZOU F, LIU Q, et al. A Reinforcement Learning Algorithm Based on Generative Adversarial Networks[J]. Theoretical Computer Science, 2019, 46(10): 265-272.
[6] WATKINS C, DAYAN P. Technical Note Q-Learning[J]. Machine Learning, 1992, 8: 279-292.
[7] SUTTON R S. Learning to Predict by the Method of Temporal Differences[J]. Machine Learning, 1988, 3(1): 9-44.
[8] GOODFELLOW I, BENGIO Y, COURVILLE A. Deep Learning[M]. Massachusetts: MIT Press, 2016.
[9] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with Deep Reinforcement Learning[J]. arXiv:1312.5602, 2013.
[10] HASSELT H V, GUEZ A, SILVER D. Deep Reinforcement Learning with Double Q-Learning[C]//Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press, 2016: 2094-2100.
[11] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous Control with Deep Reinforcement Learning[C]//Proceedings of the 4th International Conference on Learning Representations. ICLR, 2016.
[12] FUJIMOTO S, HOOF H V, MEGER D. Addressing Function Approximation Error in Actor-Critic Methods[C]//International Conference on Machine Learning. PMLR, 2018: 1587-1596.
[13] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor[C]//Proceedings of the 35th International Conference on Machine Learning. PMLR, 2018: 1856-1865.
[14] SCHULMAN J, LEVINE S, ABBEEL P, et al. Trust Region Policy Optimization[C]//International Conference on Machine Learning. PMLR, 2015: 1889-1897.
[15] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal Policy Optimization Algorithms[J]. arXiv:1707.06347, 2017.
[16] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous Methods for Deep Reinforcement Learning[C]//International Conference on Machine Learning. PMLR, 2016: 1928-1937.
[17] HASSELT H. Double Q-learning[J]. Advances in Neural Information Processing Systems, 2010, 23: 2613-2621.
[18] RUDER S. An Overview of Gradient Descent Optimization Algorithms[J]. arXiv:1609.04747, 2016.
[19] KINGMA D P, BA J. Adam: A Method for Stochastic Optimization[J]. arXiv:1412.6980, 2014.
[20] THRUN S, SCHWARTZ A. Issues in Using Function Approximation for Reinforcement Learning[C]//Proceedings of the Fourth Connectionist Models Summer School. Erlbaum, 1993: 255-263.
[21] BROCKMAN G, CHEUNG V, PETTERSSON L, et al. OpenAI Gym[J]. arXiv:1606.01540, 2016.
[22] TODOROV E, EREZ T, TASSA Y. MuJoCo: A Physics Engine for Model-based Control[C]//Intelligent Robots and Systems. IEEE, 2012: 5026-5033.
[1] RAO Zhi-shuang, JIA Zhen, ZHANG Fan, LI Tian-rui. Key-Value Relational Memory Networks for Question Answering over Knowledge Graph [J]. Computer Science, 2022, 49(9): 202-207.
[2] TANG Ling-tao, WANG Di, ZHANG Lu-fei, LIU Sheng-yun. Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy [J]. Computer Science, 2022, 49(9): 297-305.
[3] XU Yong-xin, ZHAO Jun-feng, WANG Ya-sha, XIE Bing, YANG Kai. Temporal Knowledge Graph Representation Learning [J]. Computer Science, 2022, 49(9): 162-171.
[4] WANG Jian, PENG Yu-qi, ZHAO Yu-fei, YANG Jian. Survey of Social Network Public Opinion Information Extraction Based on Deep Learning [J]. Computer Science, 2022, 49(8): 279-293.
[5] HAO Zhi-rong, CHEN Long, HUANG Jia-cheng. Class Discriminative Universal Adversarial Attack for Text Classification [J]. Computer Science, 2022, 49(8): 323-329.
[6] JIANG Meng-han, LI Shao-mei, ZHENG Hong-hao, ZHANG Jian-peng. Rumor Detection Model Based on Improved Position Embedding [J]. Computer Science, 2022, 49(8): 330-335.
[7] SUN Qi, JI Gen-lin, ZHANG Jie. Non-local Attention Based Generative Adversarial Network for Video Abnormal Event Detection [J]. Computer Science, 2022, 49(8): 172-177.
[8] HU Yan-yu, ZHAO Long, DONG Xiang-jun. Two-stage Deep Feature Selection Extraction Algorithm for Cancer Classification [J]. Computer Science, 2022, 49(7): 73-78.
[9] CHENG Cheng, JIANG Ai-lian. Real-time Semantic Segmentation Method Based on Multi-path Feature Extraction [J]. Computer Science, 2022, 49(7): 120-126.
[10] HOU Yu-tao, ABULIZI Abudukelimu, ABUDUKELIMU Halidanmu. Advances in Chinese Pre-training Models [J]. Computer Science, 2022, 49(7): 148-163.
[11] ZHOU Hui, SHI Hao-chen, TU Yao-feng, HUANG Sheng-jun. Robust Deep Neural Network Learning Based on Active Sampling [J]. Computer Science, 2022, 49(7): 164-169.
[12] SU Dan-ning, CAO Gui-tao, WANG Yan-nan, WANG Hong, REN He. Survey of Deep Learning for Radar Emitter Identification Based on Small Sample [J]. Computer Science, 2022, 49(7): 226-235.
[13] LIU Wei-ye, LU Hui-min, LI Yu-peng, MA Ning. Survey on Finger Vein Recognition Research [J]. Computer Science, 2022, 49(6A): 1-11.
[14] SUN Fu-quan, CUI Zhi-qing, ZOU Peng, ZHANG Kun. Brain Tumor Segmentation Algorithm Based on Multi-scale Features [J]. Computer Science, 2022, 49(6A): 12-16.
[15] KANG Yan, XU Yu-long, KOU Yong-qi, XIE Si-yu, YANG Xue-kun, LI Hao. Drug-Drug Interaction Prediction Based on Transformer and LSTM [J]. Computer Science, 2022, 49(6A): 17-21.