Value Factorization Method Based on State Estimation

doi:10.11896/jsjkx.220500270

Abstract

Abstract: Value factorization is a popular approach to cooperative multi-agent deep reinforcement learning. It factorizes the joint value function into individual value functions according to the individual-global-max (IGM) principle. Under this approach, each agent selects actions using only its individual value function, which is conditioned on local observations, so agents cannot make effective use of global information when learning their policies. Many value factorization algorithms extract features of the global state to weight the individual value functions, using techniques such as attention mechanisms and hypernetworks, and thereby use global information indirectly when training agents, but this use is quite limited. In complex environments, agents struggle to learn effective strategies and their learning efficiency is poor. To improve agents' policy learning ability, an optimized value factorization method based on state estimation (SE-VF) is proposed. It introduces a state network that extracts features of the global state to produce a state value, and then takes the state loss as part of the loss function used to update the agent network parameters, thereby optimizing the agents' strategy selection process. Experimental results show that SE-VF outperforms QMIX and other baselines in multiple scenarios of the StarCraft II micromanagement test platform.
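The two ideas named in the abstract, the IGM principle and a loss function that includes a state loss term, can be illustrated with a minimal sketch. The abstract does not give SE-VF's actual formulas, so everything below is an assumption made for illustration: the additive (VDN-style) factorization stands in for any factorization that satisfies IGM, and the names `combined_loss`, `lam`, `v_state`, and `v_target` are hypothetical, as is the choice of mean squared error and a weighted sum.

```python
from itertools import product


def joint_q_additive(individual_qs):
    """Return {joint_action: sum of per-agent Q-values}.

    Assumption: a simple additive (VDN-style) factorization, used here
    only because it is the easiest factorization that satisfies IGM.
    """
    action_ranges = [range(len(q)) for q in individual_qs]
    return {
        joint_action: sum(q[a] for q, a in zip(individual_qs, joint_action))
        for joint_action in product(*action_ranges)
    }


def satisfies_igm(joint_q, individual_qs):
    """Check IGM: the greedy joint action equals the tuple of greedy local actions."""
    greedy_joint = max(joint_q, key=joint_q.get)
    greedy_local = tuple(
        max(range(len(q)), key=q.__getitem__) for q in individual_qs
    )
    return greedy_joint == greedy_local


def mse(pred, target):
    """Mean squared error over two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(q_tot, q_target, v_state, v_target, lam=0.5):
    """Hypothetical total loss: TD loss on Q_tot plus a weighted state loss.

    Assumption: the weight `lam` and the form of both terms are
    illustrative and are not taken from the paper.
    """
    return mse(q_tot, q_target) + lam * mse(v_state, v_target)


# Two agents with 3 and 2 actions respectively.
qs = [[0.1, 0.9, 0.3], [0.7, 0.2]]
joint = joint_q_additive(qs)
igm_ok = satisfies_igm(joint, qs)

# Toy batch of two transitions.
loss = combined_loss([1.0, 2.0], [1.5, 2.0], [0.5, 0.5], [1.0, 0.0], lam=0.5)
```

In this sketch, the state term only adds a second training signal alongside the usual TD error on the joint value; how SE-VF actually computes and weights the state loss is described in the full paper, not here.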

Key words: State estimation, Value factorization, Multi-agent reinforcement learning, Deep reinforcement learning

CLC Number: TP181

XIONG Liqin, CAO Lei, CHEN Xiliang, LAI Jun. Value Factorization Method Based on State Estimation[J]. Computer Science, 2023, 50(8): 202-208.

