Computer Science ›› 2022, Vol. 49 ›› Issue (8): 247-256.doi: 10.11896/jsjkx.210700100

• Artificial Intelligence •

Adaptive Reward Method for End-to-End Cooperation Based on Multi-agent Reinforcement Learning

SHI Dian-xi1,2,4, ZHAO Chen-ran1, ZHANG Yao-wen3, YANG Shao-wu1, ZHANG Yong-jun2   

    1 School of Computer Science,National University of Defense Technology,Changsha 410073,China
    2 National Innovation Institute of Defense Technology,Academy of Military Sciences,Beijing 100166,China
    3 Unit 32282 of People’s Liberation Army of China,Jinan 250000,China
    4 Tianjin Artificial Intelligence Innovation Center,Tianjin 300457,China
  • Received:2021-07-09 Revised:2022-01-05 Published:2022-08-02
  • About author:SHI Dian-xi,born in 1966,Ph.D,professor,Ph.D supervisor.His main research interests include distributed object middleware technology,adaptive software technology,artificial intelligence and robot operating systems.
    ZHANG Yong-jun,born in 1966,Ph.D,professor.His main research interests include artificial intelligence,multi-agent cooperation,machine learning and feature recognition.
  • Supported by:
    National Natural Science Foundation of China(91948303).

Abstract: Most current multi-agent reinforcement learning (MARL) algorithms that adopt the centralized training with decentralized execution (CTDE) architecture achieve good results in homogeneous multi-agent systems. In heterogeneous multi-agent systems composed of agents with different roles, however, the credit-assignment problem persists, which makes it difficult for agents to learn effective cooperative strategies. To tackle this problem, an adaptive reward method for end-to-end cooperation based on multi-agent reinforcement learning is proposed to promote cooperation between agents. First, a batch-normalization network is proposed. It models the cooperative relationships of heterogeneous agents with a graph neural network, computes the weights of key information with an attention mechanism, and generates feature vectors via batch normalization, guiding the algorithm to learn in the right direction and thereby effectively improving the generation of heterogeneous multi-agent cooperative strategies. Second, an adaptive intrinsic reward network based on the actor-critic method is proposed. It converts sparse rewards into dense rewards, which guide agents to generate cooperative strategies according to the situation on the field. Experiments show that, compared with current mainstream MARL algorithms, the proposed method achieves significantly better results in the "cooperative-game" scenario. In addition, a visual analysis of the strategy-reward-behavior correlation further verifies its effectiveness.
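The representation pipeline described in the abstract — attention-weighted aggregation over the agents' cooperation graph, followed by batch normalization of the resulting feature vectors — can be illustrated with a minimal NumPy sketch. All names, weight matrices, and dimensions below are illustrative assumptions for a single attention pass, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention_layer(h, adj, Wq, Wk, Wv):
    """One attention pass over the agent graph: each agent aggregates its
    neighbours' features, weighted by scaled dot-product attention scores."""
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = np.where(adj > 0, scores, -1e9)   # mask out non-neighbours
    alpha = softmax(scores, axis=-1)           # attention weights per agent
    return alpha @ v

def batch_norm(x, eps=1e-5):
    """Normalize each feature dimension over the batch of agents
    (inference-style, without running statistics)."""
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
n_agents, d = 4, 8
h = rng.normal(size=(n_agents, d))        # per-agent observation embeddings
adj = np.ones((n_agents, n_agents))       # fully connected cooperation graph
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

z = batch_norm(graph_attention_layer(h, adj, Wq, Wk, Wv))
print(z.shape)  # (4, 8)
```

In the full method these normalized feature vectors would feed the policy network, while a separate actor-critic intrinsic-reward head densifies the sparse environment reward; that second component is omitted here for brevity.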

Key words: Adaptive intrinsic reward, Graph attention network, Multi-agent reinforcement learning

CLC Number: TP391
[1]WIERING M A.Multi-agent reinforcement learning for traffic light control[C]//Machine Learning:Proceedings of the Seventeenth International Conference(ICML’2000).2000:1151-1158.
[2]SALLAB A E L,ABDOU M,PEROT E,et al.Deep reinforcement learning framework for autonomous driving[J].Electronic Imaging,2017,2017(19):70-76.
[3]ZHAI Y Y.Multi-agent reinforcement learning-driven dynamic channel allocation for unmanned aerial vehicles [J/OL].Telecommunications Technology.
[4]DENG Q T,HU DAN E,CAI T T,et al.Reactive Power Optimization Strategy of Distribution Network Based on Multi-Agent Deep Reinforcement Learning [J].New Technology of Electrical Engineering,2022,41(2):10-20.
[5]WU Y,ZHANG B,YANG S,et al.Energy-efficient joint communication-motion planning for relay-assisted wireless robot surveillance[C]//IEEE INFOCOM 2017-IEEE Conference on Computer Communications.IEEE,2017:1-9.
[6]WANG T,WANG J,ZHENG C,et al.Learning nearly decomposable value functions via communication minimization[J].arXiv:1910.05366,2019.
[7]LOWE R,WU Y I,TAMAR A,et al.Multi-agent actor-critic for mixed cooperative-competitive environments[C]//Advances in Neural Information Processing Systems.2017:6379-6390.
[8]FOERSTER J,FARQUHAR G,AFOURAS T,et al.Counterfactual multi-agent policy gradients[J].arXiv:1705.08926,2017.
[9]RASHID T,SAMVELYAN M,DE WITT C S,et al.QMIX:Monotonic value function factorisation for deep multi-agent reinforcement learning[J].arXiv:1803.11485,2018.
[10]YANG Y,LUO R,LI M,et al.Mean field multi-agent reinforcement learning[J].arXiv:1802.05438,2018.
[11]JAQUES N,LAZARIDOU A,HUGHES E,et al.Social influence as intrinsic motivation for multi-agent deep reinforcement learning[C]//International Conference on Machine Learning.PMLR,2019:3040-3049.
[12]SUKHBAATAR S,FERGUS R.Learning multiagent communication with backpropagation[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems.2016:2252-2260.
[13]LIU Y,WANG W,HU Y,et al.Multi-Agent Game Abstraction via Graph Attention Neural Network[C]//AAAI.2020:7211-7218.
[14]YOU J,LIU B,YING Z,et al.Graph convolutional policy network for goal-directed molecular graph generation[C]//Advances in Neural Information Processing Systems.2018:6410-6421.
[15]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Advances in Neural Information Processing Systems.2017:5998-6008.
[16]KAPETANAKIS S,KUDENKO D.Reinforcement learning of coordination in heterogeneous cooperative multi-agent systems[M]//Adaptive Agents and Multi-Agent Systems II.Berlin:Springer,2004:119-131.
[17]IOFFE S,SZEGEDY C.Batch normalization:Accelerating deep network training by reducing internal covariate shift[C]//International Conference on Machine Learning.PMLR,2015:448-456.
[18]WANG W,YANG T,LIU Y,et al.From Few to More:Large-Scale Dynamic Multiagent Curriculum Learning[C]//AAAI.2020:7293-7300.
[19]ZAMBALDI V,RAPOSO D,SANTORO A,et al.Relational deep reinforcement learning[J].arXiv:1806.01830,2018.
[20]TACCHETTI A,SONG H F,MEDIANO P A M,et al.Relational forward models for multi-agent learning[J].arXiv:1809.11044,2018.
[21]MALYSHEVA A,SUNG T T,SOHN C B,et al.Deep multi-agent reinforcement learning with relevance graphs[J].arXiv:1811.12557,2018.
[22]ZHANG T,XU H,WANG X,et al.Multi-Agent Collaboration via Reward Attribution Decomposition[J].arXiv:2010.08531,2020.
[23]SCHULMAN J,LEVINE S,ABBEEL P,et al.Trust region policy optimization[C]//International Conference on Machine Learning.PMLR,2015:1889-1897.
[24]WANG Q,XIONG J,HAN L,et al.Exponentially Weighted Imitation Learning for Batched Historical Data[C]//NeurIPS.2018:6291-6300.
[25]MORDATCH I,ABBEEL P.Emergence of Grounded Compositional Language in Multi-Agent Populations[J].arXiv:1703.04908,2017.
[26]VINYALS O,EWALDS T,BARTUNOV S,et al.StarCraft II:A new challenge for reinforcement learning[J].arXiv:1708.04782,2017.
[27]SAMVELYAN M,RASHID T,DE WITT C S,et al.The starcraft multi-agent challenge[J].arXiv:1902.04043,2019.
[28]TAN M.Multi-agent reinforcement learning:Independent vs.cooperative agents[C]//Proceedings of the Tenth International Conference on Machine Learning.1993:330-337.
[29]DU Y,HAN L,FANG M,et al.LIIR:Learning individual intrinsic reward in multi-agent reinforcement learning[C]//33rd Conference on Neural Information Processing Systems(NeurIPS 2019).Vancouver,Canada,2019.