Computer Science, 2019, Vol. 46, Issue (8): 1-8. doi: 10.11896/j.issn.1002-137X.2019.08.001
• Big Data & Data Science •
DU Wei1, DING Shi-fei1,2