Reinforcement Learning Algorithm Based on Generative Adversarial Networks

doi:10.11896/jsjkx.180901655

Abstract

Most traditional reinforcement learning algorithms learn slowly in the early stage because few experience samples are available. To address this, this paper proposes a novel reinforcement learning algorithm based on generative adversarial networks (GAN). In the early stage, the algorithm follows a stochastic policy to collect a small number of experience samples. These samples form a real sample set and are used to train the GAN. The trained GAN then generates samples that form a virtual sample set. The algorithm draws batches of samples from the combined real and virtual sets to train the value function network, which speeds up learning to some extent. In addition, the algorithm uses a deep neural network to build a new model, the rectified relationship unit, which learns the internal relationship between the state and action on one side and the next state and reward on the other. This unit feeds relative entropy back to the GAN, improving the quality of the samples the GAN generates. Finally, the proposed algorithm and DQN are applied to the classic CartPole and MountainCar problems on the OpenAI Gym platform. The experimental results show that, compared with DQN, the proposed method effectively speeds up learning and reduces convergence time by 15%.
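The batch-selection step described in the abstract (drawing training batches from the combined real and virtual sample sets) and the relative-entropy signal can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation. Transitions are plain (s, a, r, s') tuples. The helper names sample_mixed_batch and kl_divergence and the real_fraction parameter are hypothetical. The abstract does not say how the two sets are weighted in a batch, or which distributions the rectified relationship unit compares when it computes relative entropy.

```python
import math
import random
from typing import List, Tuple

# A transition (s, a, r, s'); states are tuples of floats in this sketch.
Transition = Tuple[tuple, int, float, tuple]


def sample_mixed_batch(real: List[Transition], virtual: List[Transition],
                       batch_size: int, real_fraction: float,
                       rng: random.Random) -> List[Transition]:
    """Draw one minibatch mixing real and GAN-generated (virtual) transitions.

    real_fraction is a hypothetical knob: the paper's abstract does not state
    how the real and virtual sample sets are weighted.
    """
    # Take the requested share of real samples, capped by what is available.
    n_real = min(len(real), round(batch_size * real_fraction))
    # Fill the rest of the batch from the virtual set (also capped).
    n_virtual = min(len(virtual), batch_size - n_real)
    batch = rng.sample(real, n_real) + rng.sample(virtual, n_virtual)
    rng.shuffle(batch)  # avoid ordering real before virtual within the batch
    return batch


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """Relative entropy D_KL(p || q) between two discrete distributions.

    Terms with p_i == 0 contribute 0 by convention; eps guards against log(0).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)
```

For example, with batch_size=8 and real_fraction=0.25, each batch holds 2 real and 6 generated transitions when both sets are large enough. If the real set runs short early in training, the virtual set fills the remainder, which is the situation the GAN-generated samples are meant to help with.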

Key words: Deep learning, Experience samples, Generative adversarial networks, Reinforcement learning

CLC Number: TP391

CHEN Jian-ping, ZOU Feng, LIU Quan, WU Hong-jie, HU Fu-yuan, FU Qi-ming. Reinforcement Learning Algorithm Based on Generative Adversarial Networks[J].Computer Science, 2019, 46(10): 265-272.

