Computer Science ›› 2019, Vol. 46 ›› Issue (5): 169-174. doi: 10.11896/j.issn.1002-137X.2019.05.026


Asynchronous Advantage Actor-Critic Algorithm with Visual Attention Mechanism

LI Jie1,2, LING Xing-hong1,2, FU Yu-chen1,2, LIU Quan1,2,3,4   

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  2. Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, Jiangsu 215006, China
  3. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
  4. Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
  • Received:2018-05-10 Revised:2018-08-11 Published:2019-05-15

Abstract: Asynchronous deep reinforcement learning (ADRL) can greatly reduce the training time of learning models by using multithreading. However, asynchronous advantage actor-critic (A3C), a representative ADRL algorithm, fails to make full use of valuable regional information in the input image, which leads to unsatisfactory training performance. To address this problem, this paper proposes an asynchronous advantage actor-critic model with a visual attention mechanism (VAM-A3C), which integrates a visual attention mechanism into the traditional A3C algorithm. By calculating the visual importance value of each area point in the whole image, compared with the traditional Cofi algorithm, and obtaining the context vector of the attention mechanism via a regression function and a weighting function, the agent can focus on smaller but more valuable image areas, which accelerates network model decoding and lets it learn an approximately optimal policy more efficiently. Experimental results show that VAM-A3C outperforms the traditional asynchronous deep reinforcement learning algorithm on some decision-making tasks based on visual perception.

Key words: Asynchronous deep reinforcement learning, Visual attention mechanism, Actor-critic, Asynchronous advantage actor-critic

CLC Number: TP181
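
The attention step described in the abstract (an importance value per image region, normalized and then used to weight the regions into a context vector) can be illustrated with a minimal soft-attention sketch. This is not the authors' implementation. It assumes that the "regression function" is a softmax and that the "weighting function" is a weighted sum of region features. The dot-product scoring and the names `regions`, `query`, and `attention_context` are hypothetical choices made only for illustration.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(regions, query):
    """Return (weights, context) for a list of region feature vectors.

    regions: list of equal-length feature vectors, one per image region.
    query:   hypothetical vector standing in for the agent's current state.
    """
    # Importance value of each region (assumed here: dot product with the query).
    scores = [sum(r * q for r, q in zip(region, query)) for region in regions]
    # Softmax turns importance values into attention weights.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the region features.
    dim = len(regions[0])
    context = [
        sum(w * region[k] for w, region in zip(weights, regions))
        for k in range(dim)
    ]
    return weights, context


# Three toy regions with two features each; the query favors the first feature.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attention_context(regions, query)
```

In this toy input the first and third regions receive equal, larger weights than the second, so the context vector is dominated by the regions that match the query. In an A3C-style model, such a context vector would replace the unweighted image features fed to the actor and critic heads.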