Computer Science ›› 2019, Vol. 46 ›› Issue (10): 265-272.doi: 10.11896/jsjkx.180901655

• Artificial Intelligence •

Reinforcement Learning Algorithm Based on Generative Adversarial Networks

CHEN Jian-ping1,2,3, ZOU Feng1,2,3, LIU Quan4, WU Hong-jie1,2,3, HU Fu-yuan1,2,3, FU Qi-ming1,2,3   

  1. (Institute of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China)
    2. (Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China)
    3. (Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China)
    4. (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215009, China)
  • Received:2018-09-05 Revised:2018-11-24 Online:2019-10-15 Published:2019-10-21

Abstract: To address the slow learning rate caused by the shortage of experience samples in the early stage of most traditional reinforcement learning algorithms, this paper proposes a novel reinforcement learning algorithm based on generative adversarial networks (GAN). In the early stage, the algorithm collects a small number of experience samples with a stochastic policy to construct a real sample set, and uses these samples to train the GAN. The GAN then generates samples to construct a virtual sample set. By combining the two sample sets, the algorithm selects batches of samples to train the value function network, improving the learning rate to some extent. Moreover, combined with a deep neural network, the algorithm introduces a new model, the rectified relationship unit, to learn the internal relationship between the state, the action, and the next state and reward; it feeds the resulting relative entropy back to the GAN to improve the quality of the generated samples. Finally, the proposed algorithm and the DQN algorithm were applied to the classic CartPole and MountainCar problems on the OpenAI Gym platform. The experimental results show that, compared with DQN, the proposed method effectively accelerates learning and reduces convergence time by 15%.
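The central mechanism in the abstract, mixing a small real sample set with GAN-generated virtual samples when drawing training batches for the value network, can be sketched as a replay buffer. This is a minimal illustration, not the authors' implementation: the `virtual_sampler` callable stands in for the trained GAN's generator, and the `real_ratio` parameter (the fraction of each batch drawn from real experience) is a hypothetical knob the paper does not specify.

```python
import random


class MixedReplayBuffer:
    """Replay buffer mixing real transitions with GAN-generated
    (virtual) ones. The GAN is abstracted as `virtual_sampler`,
    any callable returning a synthetic (s, a, r, s') tuple."""

    def __init__(self, virtual_sampler, real_ratio=0.5):
        self.real = []                    # transitions observed from the environment
        self.virtual_sampler = virtual_sampler
        self.real_ratio = real_ratio      # fraction of each batch from real data

    def add_real(self, transition):
        self.real.append(transition)

    def sample(self, batch_size):
        # Draw as many real transitions as the ratio allows,
        # capped by how many have actually been collected.
        n_real = min(int(batch_size * self.real_ratio), len(self.real))
        batch = random.sample(self.real, n_real)
        # Top up the batch with GAN-generated virtual transitions.
        batch += [self.virtual_sampler() for _ in range(batch_size - n_real)]
        random.shuffle(batch)
        return batch
```

In early training, when the real set is nearly empty, most of each batch comes from the generator, which is exactly the sample-scarcity regime the paper targets; as real experience accumulates, the buffer naturally shifts toward real data up to `real_ratio`.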

Key words: Reinforcement learning, Deep learning, Experience samples, Generative adversarial networks

CLC Number: 

  • TP391
[1]SUTTON R S,BARTO A G.Reinforcement learning:An introduction[M].Cambridge:MIT Press,1998.
[2]PUTERMAN M.Markov decision process [J].Statistica Neerlandica,1985,39(2):219-233.
[3]WU Y,SHEN T.Policy Iteration algorithm for optimal control of stochastic logical dynamical systems [J].IEEE Transactions on Neural Networks & Learning Systems,2017,28(99):1-6.
[4]WEI Q,LIU D,LIN H.Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems [J].IEEE Transactions on Cybernetics,2016,46(3):840-853.
[5]BRADTKE S J,BARTO A G.Linear least-squares algorithms for temporal difference learning [J].Machine Learning,1996,22(1/2/3):33-57.
[6]HACHIYA H,AKIYAMA T,SUGIYAMA M,et al.Adaptive importance sampling for value function approximation in off-policy reinforcement learning [J].Neural Networks,2009,22(10):1399-1410.
[7]MAHMOOD A R,SUTTON R S.Off-policy learning based on weighted importance sampling with linear computational complexity[C]//Proceedings of the 31st International Conference on Uncertainty in Artificial Intelligence.Amsterdam:AUAI,2015:552-561.
[8]CHEN X L,CAO L,LI C X,et al.Deep reinforcement learning via good choice resampling experience replay memory [J].Control and Decision,2018,33(4):129-134.
[9]LEDIG C,THEIS L,HUSZÁR F,et al.Photo-realistic single image super-resolution using a generative adversarial network[C]//Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition.Hawaii:IEEE,2017:105-114.
[10]CAO Z Y,NIU S Z,ZHANG J W.Masked image inpainting algorithm based on generative adversarial networks [J].Journal of Beijing University of Posts and Telecommunications,2018,41(3):81-86.(in Chinese)
[11]ZHENG W B,WANG K F,WANG F Y.Background subtraction algorithm with bayesian generative adversarial networks [J].Acta Automatica Sinica,2018,44(5):878-890.(in Chinese)
[12]ZHANG Y Z,GAN Z,CARIN L.Generating text via adversarial training[C]//Proceedings of the 30th Conference on Neural Information Processing Systems.Barcelona:MIT Press,2016:1543-1551.
[13]REED S,AKATA Z,YAN X C,et al.Generative adversarial text to image synthesis[C]//Proceedings of the 33rd International Conference on Machine Learning.New York:ACM,2016:1060-1069.
[14]WANG K F,GOU C,DUAN Y J,et al.Generative adversarial networks:the state of the art and beyond[J].Acta Automatica Sinica,2017,43(3):321-332.(in Chinese)
[15]ARJOVSKY M,CHINTALA S,BOTTOU L.Wasserstein generative adversarial networks[C]//Proceedings of the 34th International Conference on Machine Learning.Sydney:ACM,2017:214-223.
[16]MIRZA M,OSINDERO S.Conditional generative adversarial nets [J].Computer Science,2014,8(13):2672-2680.
[17]LECUN Y,BENGIO Y,HINTON G.Deep learning[J].Nature,2015,521(7553):436-444.
[18]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning [J].Nature,2015,518(7540):529-533.