Computer Science (计算机科学), 2019, Vol. 46, Issue 10: 265-272. doi: 10.11896/jsjkx.180901655
陈建平1,2,3, 邹锋1,2,3, 刘全4, 吴宏杰1,2,3, 胡伏原1,2,3, 傅启明1,2,3
CHEN Jian-ping1,2,3, ZOU Feng1,2,3, LIU Quan4, WU Hong-jie1,2,3, HU Fu-yuan1,2,3, FU Qi-ming1,2,3
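Abstract: Reinforcement learning methods learn slowly in the early stage of training because few experience samples are available. To address this, a reinforcement learning algorithm based on generative adversarial networks (GANs) is proposed. In the early stage of training, the algorithm collects experience samples with a random policy to build a real sample pool and uses these samples to train a GAN. The GAN then generates new samples that form a virtual sample pool, and training batches are selected from the real and virtual pools together to speed up learning. The algorithm also introduces a relation correction unit: a deep neural network learns the internal relationship between the state and action of each sample in the real pool and its subsequent state and reward, and this relationship is combined with relative entropy to optimize the GAN and improve the quality of the generated samples. Finally, the proposed algorithm and the DQN algorithm are applied to the CartPole and MountainCar problems in OpenAI Gym. Experimental results show that, compared with DQN, the proposed algorithm effectively accelerates learning in the early stage of training and shortens convergence time by 15%.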
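The batch-selection step described in the abstract (drawing training samples from both the real and the virtual pool) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the transition layout, the function name `sample_mixed_batch`, and the `virtual_fraction` parameter are all assumptions introduced here, since the abstract does not say how the two pools are weighted.

```python
import random
from typing import List, Optional, Tuple

# Assumed transition layout: (state, action, reward, next_state).
Transition = Tuple[Tuple[float, ...], int, float, Tuple[float, ...]]


def sample_mixed_batch(
    real_pool: List[Transition],
    virtual_pool: List[Transition],
    batch_size: int,
    virtual_fraction: float = 0.5,  # hypothetical knob, not from the paper
    rng: Optional[random.Random] = None,
) -> List[Transition]:
    """Draw one training batch from the real and virtual sample pools."""
    rng = rng or random.Random()
    # Never request more generated samples than the virtual pool holds.
    n_virtual = min(int(batch_size * virtual_fraction), len(virtual_pool))
    # Fill the remainder from the real pool, capped by its size.
    n_real = min(batch_size - n_virtual, len(real_pool))
    batch = rng.sample(virtual_pool, n_virtual) + rng.sample(real_pool, n_real)
    # Shuffle so real and generated samples are interleaved.
    rng.shuffle(batch)
    return batch
```

With an empty virtual pool the function falls back to real samples only, which matches the situation before the GAN has produced anything.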