基于视觉注意力机制的异步优势行动者-评论家算法

doi:10.11896/j.issn.1002-137X.2019.05.026

Computer Science ›› 2019, Vol. 46 ›› Issue (5): 169-174.doi: 10.11896/j.issn.1002-137X.2019.05.026

Previous Articles Next Articles

Asynchronous Advantage Actor-Critic Algorithm with Visual Attention Mechanism

LI Jie^1,2, LING Xing-hong^1,2, FU Yu-chen^1,2, LIU Quan^1,2,3,4

(School of Computer Science and Technology,Soochow University,Suzhou,Jiangsu 215006,China)¹
(Provincial Key Laboratory for Computer Information Processing Technology,Soochow University,Suzhou,Jiangsu 215006,China)²
(Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education,Jilin University,Changchun 130012,China)³
(Collaborative Innovation Center of Novel Software Technology and Industrialization,Nanjing 210000,China)⁴

Received:2018-05-10 Revised:2018-08-11 Published:2019-05-15

Abstract

Abstract: Asynchronous deep reinforcement learning (ADRL) can greatly reduce the training time required for learning models by adopting the multiple threading techniques.However,as an exemplary algorithm of ADRL,asynchronous advantage actor-critic (A3C) algorithm fails to completely utilize some valuable regional information,leading to unsatisfactory performance for model training.Aiming at the above problem,this paper proposed an asynchronous advantage actor-critic model with visual attention mechanism (VAM-A3C).AM-A3C integrates visual attention mechanism with traditional asynchronous advantage actor-critic algorithms.By calculating the visual importance value of each area point in the whole image compared with the traditional Cofi algorithm,and obtaining the context vector of the attention mechanism via regression function and weighting function,Agent can focus on smaller but more valuable image areas to accelerate network model decoding and to learn the approximate optimal strategy more efficiently.Experimental results show the superior performance of VAM-A3C in some decision-making tasks based on visual perception compared with the traditional asynchronous deep reinforcement learning algorithm.

Key words: Actor-critic, Asynchronous advantage actor-critic, Asynchronous deep reinforcement learning, Visual attention mechanism

CLC Number:

TP181

LI Jie, LING Xing-hong, FU Yu-chen, LIU Quan. Asynchronous Advantage Actor-Critic Algorithm with Visual Attention Mechanism[J].Computer Science, 2019, 46(5): 169-174.

References

[1]YU K,JIA L,CHEN Y Q,et al.Deep learning:yesterday,today,and tomorrow[J].Journal of computer Research and Deve-lopment,2013,50(9):1799-1804.(in Chinese)余凯,贾磊,陈雨强,等.深度学习的昨天、今天和明天[J].计算机研究与发展,2013,50(9):1799-1804.
[2]SUTTON R S,BARTO A G.Reinforcement learning:An introduction[M].Cambridge:MIT Press,1998.
[3]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Playing atari with deep reinforcement learning∥Proceedings of Workshops at the 26th Neural Information Processing Systems 2013.Lake Tahoe,USA,2013:201-220.
[4]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[5]WATKINS C J C H.Learning from Delayed Rewards[J].Robotics & Autonomous Systems,1989,15(4):233-235.
[6]VAN HASSELT H,GUEZ A,SILVER D.Deep Reinforcement Learning with Double Q-Learning[C]∥Association for the Advance of Artificial Intelligence.2016:2094-2100.
[7]SCHAUL T,QUAN J,ANTONOGLOU I,et al.Prioritized experience replay∥Proceedings of the 4th International Conference on Learning Representations.San Juan,Puerto Rico,2016:322-355.
[8]RUMMERY G A,NIRANJAN M.On-line Q-learning usingconnectionist systems[D].Cambridge:University of Cambridge,1994.
[9]SUTTON R S.Generalization in reinforcement learning:suc-cessful examples using sparse coarse coding[C]∥International Conference on Neural Information Processing Systems.MIT Press,1995:1038-1044.
[10]MNIH V,BADIA A P,MIRZA M,et al.Asynchronous methods for deep reinforcement learning[C]∥International Conference on Machine Learning.2016:1928-1937.
[11]BAHDANAU D,CHO K,BENGIO Y.Neural machine translation by jointly learning to align and translate.arXiv:1409.0473,2014.
[12]XU K,BA J,KIROS R,et al.Show,attend and tell:Neural ima-ge caption generation with visual attention[C]∥International Conference on Machine Learning.2015:2048-2057.
[13]BUSONIU L,BABUSKA R,DE SCHUTTER B,et al.Rein-forcement learning and dynamic programming using function approximators[M].CRC Press,2010.
[14]WIERING M,OTTERLO M V.Reinforcement Learning:State-of-the-Art[M].Springer Publishing Company,Incorporated,2012.
[15]SUTTON R S,MCALLESTER D A,SINGH S P,et al.Policy gradient methods for reinforcement learning with function approximation[C]∥Advances in neural information processing systems.2000:1057-1063.
[16]KAKADE S.A natural policy gradient[C]∥International Conference on Neural Information Processing Systems:Natural and Synthetic.MIT Press,2001:1531-1538.
[17]SILVER D,LEVER G,HEESS N,et al.Deterministic policygradient algorithms[C]∥International Conference on International Conference on Machine Learning.2014:387-395.
[18]KONDA V R,TSITSIKLIS J N.Actor-critic algorithms[C]∥Advances in Neural Information Processing Systems.2000:1008-1014.
[19]BHATNAGAR S,GHAVAMZADEH M,LEE M,et al.Incremental natural actor-critic algorithms[C]∥Advances in Neural Information Processing Systems.2008:105-112.
[20]KONDA V R,TSITSIKLIS J N.Actor-critic algorithms[C]∥Advances in Neural Information Processing Systems.2000:1008-1014.

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

Comments

Recommended 0

No Suggested Reading articles found!

Asynchronous Advantage Actor-Critic Algorithm with Visual Attention Mechanism

PDF (PC)

Abstract

Cite this article

share this article

References

Related Articles 5

Metrics

Comments

Recommended 0

[1]	ZHANG Jia-neng, LI Hui, WU Hao-lin, WANG Zhuang. Exploration and Exploitation Balanced Experience Replay [J]. Computer Science, 2022, 49(5): 179-185.
[2]	DAI Shan-shan, LIU Quan. Action Constrained Deep Reinforcement Learning Based Safe Automatic Driving Method [J]. Computer Science, 2021, 48(9): 235-243.
[3]	YUAN Ye, HE Xiao-ge, ZHU Ding-kun, WANG Fu-lee, XIE Hao-ran, WANG Jun, WEI Ming-qiang, GUO Yan-wen. Survey of Visual Image Saliency Detection [J]. Computer Science, 2020, 47(7): 84-91.
[4]	LI Li,ZHENG Jia-li,WANG Zhe,YUAN Yuan,SHI Jing. RFID Indoor Positioning Algorithm Based on Asynchronous Advantage Actor-Critic [J]. Computer Science, 2020, 47(2): 233-238.
[5]	JIN Yu-jing,ZHU Wen-wen,FU Yu-chen and LIU Quan. Actor-Critic Algorithm Based on Tile Coding and Model Learning [J]. Computer Science, 2014, 41(6): 239-242.