Computer Science ›› 2022, Vol. 49 ›› Issue (8): 191-204. doi: 10.11896/jsjkx.220200174

• Artificial Intelligence •

Methods in Adversarial Intelligent Game: A Holistic Comparative Analysis from the Perspective of Game Theory and Reinforcement Learning

YUAN Wei-lin, LUO Jun-ren, LU Li-na, CHEN Jia-xing, ZHANG Wan-peng, CHEN Jing   

  1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
  • Received: 2022-02-27 Revised: 2022-03-22 Published: 2022-08-02
  • Corresponding author: LU Li-na (lulina16@nudt.edu.cn)
  • About author: YUAN Wei-lin (yuanweilin12@nudt.edu.cn)
  • Supported by:
    National Natural Science Foundation of China (61702528, 61806212, 62173336)

Methods in Adversarial Intelligent Game:A Holistic Comparative Analysis from the Perspective of Game Theory and Reinforcement Learning

YUAN Wei-lin, LUO Jun-ren, LU Li-na, CHEN Jia-xing, ZHANG Wan-peng, CHEN Jing   

  1. College of Intelligence Science and Technology,National University of Defense Technology,Changsha 410073,China
  • Received:2022-02-27 Revised:2022-03-22 Published:2022-08-02
  • About author:YUAN Wei-lin,born in 1994,Ph.D. candidate.His main research interests include agent modelling,adversarial team game and multi-agent reinforcement learning.
    LU Li-na,born in 1984,Ph.D.Her main research interests include hierarchical multi-agent system,reinforcement learning and complex network.
  • Supported by:
    National Natural Science Foundation of China(61702528,61806212,62173336).

Abstract: Adversarial intelligent game is a pressing frontier problem in the field of cognitive decision-making in artificial intelligence. Game-theoretic methods represented by counterfactual regret minimization and reinforcement learning methods represented by fictitious self-play, supported by large-scale computing power, have stood out in solving intelligent game strategies, yet the connection between the two paradigms has not been explored in depth. Focusing on the adversarial intelligent game problem, this paper defines the connotation and extension of adversarial intelligent game, reviews its development history, and summarizes the key challenges. From the perspectives of game theory and reinforcement learning, the models and algorithms of adversarial intelligent game are introduced. The strengths and limitations of game theory and reinforcement learning are compared from multiple angles, and adversarial intelligent game methods and strategy-solving frameworks under a unified view of the two paradigms are summarized, aiming to provide directions for combining the two paradigms, to advance intelligent game technology, and to build momentum toward general artificial intelligence.
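To make the game-theoretic side of the comparison concrete, the following is a minimal illustrative sketch (not taken from the paper) of regret matching, the per-information-set update at the heart of counterfactual regret minimization, run in self-play on rock-paper-scissors as a toy two-player zero-sum game; the payoff matrix, iteration count, and all variable names are assumptions made only for this example.

```python
import numpy as np

def regret_matching(regrets):
    # Map cumulative regrets to a strategy: normalise the positive part,
    # or play uniformly at random if no action has positive regret.
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(regrets), 1.0 / len(regrets))

# Row player's payoffs in rock-paper-scissors (zero-sum toy example).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

regrets_p1 = np.zeros(3)
regrets_p2 = np.zeros(3)
strategy_sum_p1 = np.zeros(3)

for _ in range(10000):
    s1 = regret_matching(regrets_p1)
    s2 = regret_matching(regrets_p2)
    strategy_sum_p1 += s1
    u1 = A @ s2          # value of each of player 1's actions vs. player 2's mix
    u2 = -(A.T @ s1)     # value of each of player 2's actions (zero-sum)
    regrets_p1 += u1 - s1 @ u1   # instantaneous regret: action value minus strategy value
    regrets_p2 += u2 - s2 @ u2

print("average strategy of player 1:", strategy_sum_p1 / strategy_sum_p1.sum())
```

In zero-sum games the average strategy produced by this self-play loop approaches a Nash equilibrium (here, roughly (1/3, 1/3, 1/3)); CFR applies the same regret-matching update at every information set of an extensive-form game, using counterfactual values in place of the matrix payoffs.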

Key words: Counterfactual regret minimization, Nash equilibrium, Reinforcement learning, Fictitious self-play, Adversarial intelligent game

Abstract: Adversarial intelligent game is a frontier problem in cognitive decision-making research in artificial intelligence. Supported by large-scale computing power, game theory and reinforcement learning, represented by counterfactual regret minimization and fictitious self-play respectively, have become state-of-the-art approaches to computing game strategies, yet the relationship between the two paradigms has not been fully explored. For adversarial intelligent game problems, this paper defines the connotation and extension of adversarial intelligent game, reviews its development history, and summarizes the key challenges. From the perspectives of game theory and reinforcement learning, the models and algorithms of intelligent game are introduced. The strengths and limitations of game theory and reinforcement learning are then compared from multiple angles, and the methods and strategy-solving frameworks under a unified view of the two paradigms are summarized, with the aim of promoting the advance of intelligent game technology and laying a foundation for general artificial intelligence.
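For the reinforcement-learning side, the sketch below (again illustrative, not the paper's implementation) shows classical fictitious play on matching pennies: each player repeatedly best-responds to the opponent's empirical average strategy. The game, the uniform prior counts, and the iteration budget are assumptions chosen only for this example.

```python
import numpy as np

# Row player's payoffs in matching pennies (zero-sum toy example).
A = np.array([[ 1., -1.],
              [-1.,  1.]])

counts_p1 = np.ones(2)   # empirical action counts, initialised with a uniform prior
counts_p2 = np.ones(2)

for _ in range(10000):
    avg_p1 = counts_p1 / counts_p1.sum()
    avg_p2 = counts_p2 / counts_p2.sum()
    br_p1 = np.argmax(A @ avg_p2)        # best response to the opponent's average play
    br_p2 = np.argmax(-(A.T @ avg_p1))
    counts_p1[br_p1] += 1
    counts_p2[br_p2] += 1

print("empirical strategies:",
      counts_p1 / counts_p1.sum(),
      counts_p2 / counts_p2.sum())
```

Both empirical strategies approach the Nash equilibrium (1/2, 1/2). Fictitious self-play and its neural variants scale this idea to large imperfect-information games by approximating the best response with reinforcement learning and tracking the average strategy with supervised learning, which is exactly where the two paradigms compared in this paper meet.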

Key words: Adversarial intelligent game, Counterfactual regret minimization, Fictitious self-play, Nash equilibrium, Reinforcement learning

CLC Number: 

  • TP181
[1]HUANG K Q,XING J L,ZHANG J G,et al.Intelligent technologies of human-computer gaming[J].Scientia Sinica Informationis,2020,50(4):540-550.
[2]ANDREW J K.Operational decision making under uncertainty:Inferential,sequential,and adversarial approaches[R].Technical report,Air Force Institute of Technology Wright-Patterson AFB OH,2019.
[3]ARUNESH S,FEI F,BO A,et al.Stackelberg security games:Looking beyond a decade of success[C]//International Joint Conferences on Artificial Intelligence Organization.2018:5494-5501.
[4]WANG Z,YUAN Y,AN B,et al.An Overview of Security Games[J].Journal of Command and Control,2015,1(2):121-149.
[5]LI X,LI Q.Technical analysis of typical intelligent game system and development prospect of intelligent command and control system[J].Chinese Journal of Intelligent Science and Technology,2020,2(1):36-42.
[6]HU X,RONG M.Where Do Operation Decision Support Systems Go:Inspiration and Thought on Deep Green Plan[J].Journal of Command and Control,2016,2(1):22-25.
[7]LI T.Introduction and Inspiration to Military Intelligent of America[C]//The 5th Chinese Conference on Command and Control.Beijing,2017:94-98.
[8]SILVER D,SCHRITTWIESER J,SIMONYAN K,et al.Mastering the game of Go without human knowledge[J].Nature,2017,550(7676):354-359.
[9]VINYALS O,BABUSCHKIN I,CZARNECKI W M,et al.Grandmaster level in StarCraft II using multi-agent reinforcement learning[J].Nature,2019,575(7782):350-354.
[10]ALAN B,ABDALLAH S.AI surpasses humans at six-player poker[J].Science,2019,365(6456):864-865.
[11]LI J,KOYAMADA S,YE Q,et al.Suphx:Mastering Mahjong with Deep Reinforcement Learning[EB/OL].(2020-04-01) [2022-03-21].https://arxiv.org/pdf/2003.13590.pdf.
[12]YE D,LIU Z,SUN M,et al.Mastering complex control in moba games with deep reinforcement learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:6672-6679.
[13]YANG Y,WANG J.An overview of multi-agent reinforcement learning from game theoretical perspective[EB/OL].(2021-03-18) [2022-03-21].https://arxiv.org/pdf/2011.00583.pdf.
[14]LIU Q,ZHAI J,ZHANG Z,et al.A Survey on Deep Reinforcement Learning[J].Chinese Journal of Computers,2018,41(1):1-27.
[15]ZHAO D,SHAO K,ZHU Y,et al.Review of deep reinforcement learning and discussions on the development of computer Go[J].Control Theory and Applications,2016,33(6):701-717.
[16]ZINKEVICH M,JOHANSON M,BOWLING M,et al.Regret minimization in games with incomplete information[J].Advances in Neural Information Processing Systems,2007(20):1729-1736.
[17]HEINRICH J,LANCTOT M,SILVER D.Fictitious self-play in extensive-form games[C]//International Conference on Machine Learning.PMLR,2015:805-813.
[18]JADERBERG M,CZARNECKI W M,DUNNING I,et al.Human-level performance in 3D multiplayer games with population-based reinforcement learning[J].Science,2019,364(6443):859-865.
[19]SAMVELYAN M,RASHID T,DE W C S,et al.The starcraft multi-agent challenge[EB/OL].(2019-11-09) [2022-03-21].https://arxiv.org/pdf/1902.04043.pdf.
[20]ZHAO E,YAN R,LI J,et al.High-performance artificial intelligence for heads-up no-limit poker via end-to-end reinforcement learning[EB/OL].(2022-05-17) [2022-05-17].https://www.aaai.org/AAAI22Papers/AAAI-2268.ZhaoE.pdf.
[21]CHEN S,SU J,XIANG F.Artificial Intelligence and Game Confrontation[M].Beijing:Science Press,2021.
[22]REILLY M B,LISA V W A.Beyond video games:New artificial intelligence beats tactical experts in combat simulation[EB/OL].(2016-01-27) [2022-03-21].https://magazine.uc.edu/editors_picks/recent_features/alpha.html.
[23]SILVER D,HUBERT T,SCHRITTWIESER J,et al.Mastering chess and shogi by self-play with a general reinforcement learning algorithm[EB/OL].(2017-11-05)[2022-03-21].https://arxiv.org/pdf/1712.01815.pdf.
[24]MORAVČÍK M,SCHMID M,BURCH N,et al.Deepstack:Expert-level artificial intelligence in heads-up no-limit poker[J].Science,2017,356(6337):508-513.
[25]BROWN N,SANDHOLM T.Superhuman AI for heads-up no-limit poker:Libratus beats top professionals[J].Science,2018,359(6374):418-424.
[26]BAKER B,KANITSCHEIDER I,MARKOV T,et al.Emergent tool use from multi-agent autocurricula[EB/OL].(2020-02-11)[2022-03-21].https://arxiv.org/pdf/1909.07528.pdf.
[27]ZHA D,XIE J,MA W,et al.DouZero:Mastering DouDizhu with Self-Play Deep Reinforcement Learning[EB/OL].(2021-01-11)[2022-03-21].https://arxiv.org/pdf/2106.06135.pdf.
[28]THERESA H.Darpa’s alphadogfight tests AI pilot’s combat chops[EB/OL].(2020-08-18)[2022-03-21].https://breakingdefense.com/2020/08/darpas-alphadogfight-tests-ai-pilots-combat-chops/.
[29]MASTERS P,SARDINA S.Deceptive Path-Planning[C]//International Joint Conferences on Artificial Intelligence Organization.2017:4368-4375.
[30]BELL J B.Toward a theory of deception[J].International Journal of Intelligence and Counterintelligence,2003,16(2):244-279.
[31]WHALEY B.Toward a general theory of deception[J].The Journal of Strategic Studies,1982,5(1):178-192.
[32]BURCH N,SCHMID M,MORAVCIK M,et al.AIVAT:A new variance reduction technique for agent evaluation in imperfect information games[EB/OL].(2017-01-19) [2022-03-21].https://arxiv.org/pdf/1612.06915.pdf.
[33]JOHANSON M.Measuring the size of large no-limit poker games[EB/OL].(2013-03-07)[2022-03-21].https://arxiv.org/pdf/1302.7008.pdf.
[34]SAM G,TUOMAS S.Potential-aware imperfect-recall abstraction with earth mover’s distance in imperfect-information games[C]//Twenty-Eighth AAAI Conference on Artificial Intelligence.2014:682-691.
[35]TUOMAS S.Abstraction for solving large incomplete-information games[C]//Twenty-Ninth AAAI Conference on Artificial Intelligence.2015:4127-4131.
[36]NEIL B.Time and space:Why imperfect information games are hard[D].Edmonton:University of Alberta,2018.
[37]YU N.Excessive gap technique in nonsmooth convex minimization[J].SIAM Journal on Optimization,2005,16(1):235-249.
[38]LI L J.Research on human-computer game decision-makingtechnology of wargame deduction[D].Beijing:Institute of Automation,Chinese Academy of Sciences,2020.
[39]KORZHYK D,YIN Z,KIEKINTVELD C,et al.Stackelberg vs.Nash in security games:An extended investigation of interchangeability,equivalence,and uniqueness[J].Journal of Artificial Intelligence Research,2011(41):297-327.
[40]CHEN X,DENG X T.Settling the complexity of 2-player Nash equilibrium[C]//Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science.2006:261-272.
[41]BROWN N.Equilibrium Finding for Large Adversarial Imperfect-Information Games[D].Pittsburgh:Carnegie Mellon University,2020.
[42]YUAN W,LIAO Z,GAO W,et al.A Survey on Intelligent Game of Computer Poker[J].Chinese Journal of Network and Information Security,2021,7(5):57-76.
[43]BOWLING M,VELOSO M.Rational and convergent learning in stochastic games[C]//International Joint Conference on Artificial Intelligence.Lawrence Erlbaum Associates Ltd,2001:1021-1026.
[44]EVERETT R,ROBERTS S.Learning against non-stationary agents with opponent modelling and deep reinforcement learning[C]//2018 AAAI Spring Symposium Series.2018.
[45]PAPOUDAKIS G,CHRISTIANOS F,RAHMAN A,et al.Dealing with non-stationarity in multi-agent deep reinforcement learning[EB/OL].(2019-01-11) [2022-03-21].https://arxiv.org/pdf/1906.04737.pdf.
[46]MAZUMDAR E V,JORDAN M I,SASTRY S S.On finding local nash equilibria(and only local nash equilibria) in zero-sum games[EB/OL].(2019-01-25) [2022-03-21].https://arxiv.org/pdf/1901.00838.pdf.
[47]JOHANSON M,BURCH N,VALENZANO R,et al.Evaluating state-space abstractions in extensive-form games[C]//Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems.2013:271-278.
[48]KROER C,SANDHOLM T.A unified framework for extensive-form game abstraction with bounds[C]//AI3 workshop at International Joint Conference on Artificial Intelligence.2018.
[49]SANDHOLM T,SINGH S.Lossy stochastic game abstraction with bounds[C]//Proceedings of the 13th ACM Conference on Electronic Commerce.2012:880-897.
[50]LU Y,YAN K.Algorithms in multi-agent systems:A holistic perspective from reinforcement learning and game theory[EB/OL].(2020-01-31)[2022-03-21].https://arxiv.org/pdf/2001.06487.pdf.
[51]NEYMAN A.Correlated equilibrium and potential games[J].International Journal of Game Theory,1997,26(2):223-227.
[52]MOULIN H,RAY I,GUPTA S S.Coarse correlated equilibria in an abatement game[R].Cardiff Economics Working Papers,2014.
[53]BRÜCKNER M,SCHEFFER T.Stackelberg games for adversarial prediction problems[C]//Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2011:547-555.
[54]ZHANG Y,AN B.Computing team-maximin equilibria in zero-sum multiplayer extensive-form games[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:2318-2325.
[55]OMIDSHAFIEI S,PAPADIMITRIOU C,PILIOURAS G,et al.α-rank:Multi-agent evaluation by evolution[J].Scientific Reports,2019,9(1):1-29.
[56]LU L,GU X,ZHANG W,et al.Research on Learning Method Based on Hierarchical Decomposition[C]//2019 Chinese Automation Congress(CAC).IEEE,2019:5413-5418.
[57]CAO L.Key Technologies of Intelligent Game Confrontation Based on Deep Reinforcement Learning[J].Command Information System and Technology,2019,10(5):1-7.
[58]KOVAŘÍK V,SCHMID M,BURCH N,et al.Rethinking formal models of partially observable multiagent decision making[EB/OL].(2020-10-26)[2022-05-17].https://arxiv.org/pdf/1906.11110.pdf.
[59]MARTIN S,MATEJ M,NEIL B,et al.Player of games[EB/OL].(2021-11-06)[2022-03-21].https://arxiv.org/pdf/2112.03178v1.pdf.
[60]JOHANSON M,WAUGH K,BOWLING M,et al.Accelerating best response calculation in large extensive games[C]//Twenty-second International Joint Conference on Artificial Intelligence.2011:258-265.
[61]JOHANSON M B.Robust strategies and counter-strategies:from superhuman to optimal play[D].Edmonton:University of Alberta,2016.
[62]PAPP D R.Dealing with imperfect information in poker[EB/OL].(1998-11-30) [2022-03-21].https://webdocs.cs.ualberta.ca/~jonathan/PREVIOUS/Grad/papp/thesis.html.
[63]BILLINGS D,BURCH N,DAVIDSON A,et al.Approximating game-theoretic optimal strategies for full-scale poker[C]//IJCAI.2003:661.
[64]SCHNIZLEIN D.State translation in no-limit poker[D].Edmonton:University of Alberta,2009.
[65]BROWN N,GANZFRIED S,SANDHOLM T.Tartanian7:a champion two-player no-limit texas hold’em poker-playing program[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2015:4270-4271.
[66]BROWN N,SANDHOLM T.Baby Tartanian8:Winning Agent from the 2016 Annual Computer Poker Competition[C]//IJCAI.2016:4238-4239.
[67]FARINA G,KROER C,SANDHOLM T.Regret circuits:Composability of regret minimizers[C]//International Conference on Machine Learning.PMLR,2019:1863-1872.
[68]LANCTOT M,WAUGH K,ZINKEVICH M,et al.Monte Carlo Sampling for Regret Minimization in Extensive Games[C]//NIPS.2009:1078-1086.
[69]SCHMID M,BURCH N,LANCTOT M,et al.Variance reduction in monte carlo counterfactual regret minimization(VR-MCCFR) for extensive form games using baselines[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2019:2157-2164.
[70]LI H,HU K,GE Z,et al.Double neural counterfactual regret minimization[EB/OL].(2021-11-06) [2018-11-27].https://arxiv.org/pdf/1812.10607.pdf.
[71]JACKSON E G.Targeted CFR[C]//Workshops at the Thirty-first AAAI Conference on Artificial Intelligence.2017.
[72]BOWLING M,BURCH N,JOHANSON M,et al.Heads-up limit hold’em poker is solved[J].Science,2015,347(6218):145-149.
[73]BROWN N,SANDHOLM T.Solving Imperfect-Information Games via Discounted Regret Minimization[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2019:1829-1836.
[74]BROWN N,LERER A,GROSS S,et al.Deep counterfactual regret minimization[C]//International Conference on Machine Learning.PMLR,2019:793-802.
[75]ZHOU Y,REN T,LI J,et al.Lazy-CFR:fast and near optimal regret minimization for extensive games with imperfect information[EB/OL].(2018-11-25) [2022-03-21].https://arxiv.org/pdf/1810.04433.pdf.
[76]LIU W,LI B,TOGELIUS J.Model-free Neural Counterfactual Regret Minimization with Bootstrap Learning[EB/OL].(2020-01-02) [2022-03-21].https://arxiv.org/pdf/2012.01870.pdf.
[77]LI H,WANG X,QI S,et al.Solving imperfect-information games via exponential counterfactual regret minimization[EB/OL].(2020-11-04)[2022-03-21].https://arxiv.org/pdf/2008.02679.pdf.
[78]STEINBERGER E.Single deep counterfactual regret minimization[EB/OL].(2019-10-04) [2022-03-21].https://arxiv.org/pdf/1901.07621.pdf.
[79]LANCTOT M.Monte Carlo sampling and regret minimization for equilibrium computation and decision-making in large extensive form games[D].Edmonton:University of Alberta,2013.
[80]REIS J.A GPU implementation of Counterfactual Regret Minimization[D].Porto:University of Porto,2015.
[81]CELLI A,MARCHESI A,BIANCHI T,et al.Learning to correlate in multi-player general-sum sequential games[J].Advances in Neural Information Processing Systems,2019(32):13076-13086.
[82]TESAURO G.TD-Gammon,a self-teaching backgammon program,achieves master-level play[J].Neural Computation,1994,6(2):215-219.
[83]BERNER C,BROCKMAN G,CHAN B,et al.Dota 2 with large scale deep reinforcement learning[EB/OL].(2019-11-13) [2022-03-21].https://arxiv.org/pdf/1912.06680.pdf.
[84]BROWN G W.Iterative solution of games by fictitious play[J].Activity Analysis of Production and Allocation,1951,13(1):374-376.
[85]VAN D G B.A weakened form of fictitious play in two-person zero-sum games[J].International Game Theory Review,2000,2(4):307-328.
[86]LESLIE D S,COLLINS E J.Generalized weakened fictitious play[J].Games and Economic Behavior,2006,56(2):285-298.
[87]CHEN Y,ZHANG L,LI S,et al.Optimize Neural Fictitious Self-Play in Regret Minimization Thinking[EB/OL].(2021-04-22) [2022-03-21].https://arxiv.org/pdf/2104.10845.pdf.
[88]HEINRICH J,SILVER D.Deep reinforcement learning from self-play in imperfect-information games[EB/OL].(2016-01-28) [2022-03-21].https://arxiv.org/pdf/1603.01121.pdf.
[89]ZHANG L,WANG W,LI S,et al.Monte Carlo neural fictitious self-play:Approach to approximate Nash equilibrium of imperfect-information games[EB/OL].(2019-04-06) [2022-03-21].https://arxiv.org/pdf/1903.09569.pdf.
[90]HEINRICH J,SILVER D.Self-play monte-carlo tree search in computer poker[C]//Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence.2014:19-25.
[91]JIANG Q,LI K,DU B,et al.DeltaDou:Expert-level Doudizhu AI through Self-play[C]//IJCAI.2019:1265-1271.
[92]KASH I A,SULLINS M,HOFMANN K.Combining No-regret and Q-learning[C]//Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems.2020:593-601.
[93]LANCTOT M,LOCKHART E,LESPIAU J B,et al.Open-Spiel:A Framework for Reinforcement Learning in Games[EB/OL].(2020-09-26) [2022-03-21].https://arxiv.org/pdf/1908.09453.pdf.
[94]HENDERSON P,ISLAM R,BACHMAN P,et al.Deep reinforcement learning that matters[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2018:3207-3214.
[95]HERNANDEZ-LEAL P,KARTAL B,TAYLOR M E.A survey and critique of multiagent deep reinforcement learning[J].Autonomous Agents and Multi-Agent Systems,2019,33(6):750-797.
[96]CZARNECKI W M,GIDEL G,TRACEY B,et al.Real world games look like spinning tops[EB/OL].(2020-01-17) [2022-03-21].https://arxiv.org/pdf/2004.09468.pdf.
[97]LUO J,ZHANG W,YUAN W,et al.Research on Opponent Modeling Framework for Multi-agent Game Confrontation[EB/OL].(2019-11-13) [2022-02-16].http://kns.cnki.net/kcms/detail/11.3092.V.20210818.1041.007.html.
[98]FENG X,SLUMBERS O,YANG Y,et al.Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games[EB/OL].(2021-11-01) [2022-03-21].https://arxiv.org/pdf/2106.02745.pdf.
[99]PRETORIUS A,TESSERA K,SMIT A P,et al.Mava:a research framework for distributed multi-agent reinforcement learning[EB/OL].(2021-01-03) [2022-03-21].https://arxiv.org/pdf/2107.01460.pdf.
[100]LIANG E,LIAW R,NISHIHARA R,et al.RLlib:Abstractions for distributed reinforcement learning[C]//International Conference on Machine Learning.PMLR,2018:3053-3062.
[101]ZHOU M,WAN Z,WANG H,et al.MALib:A Parallel Framework for Population-based Multi-agent Reinforcement Learning[EB/OL].(2021-01-05) [2022-03-21].https://arxiv.org/pdf/2106.07551.pdf.
[102]NEU G,JONSSON A,GÓMEZ V.A unified view of entropy-regularized markov decision processes[EB/OL].(2017-05-22) [2022-03-21].https://arxiv.org/pdf/1705.07798.pdf.
[103]JIN C,ALLEN-ZHU Z,BUBECK S,et al.Is Q-learning provably efficient?[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems.2018:4868-4878.
[104]GRUSLYS A,LANCTOT M,MUNOS R,et al.The advantage regret-matching actor-critic[EB/OL].(2020-08-27) [2022-03-21].https://arxiv.org/pdf/2008.12234.pdf.
[105]LI H,WANG X,JIA F,et al.RLCFR:Minimize counterfactual regret by deep reinforcement learning[EB/OL].(2020-08-27) [2022-03-21].https://arxiv.org/pdf/2009.06373.pdf.
[106]STEINBERGER E,LERER A,BROWN N.DREAM:Deep regret minimization with advantage baselines and model-free learning[EB/OL].(2020-11-29) [2022-03-21].https://arxiv.org/pdf/2006.10410.pdf.
[107]SRINIVASAN S,LANCTOT M,ZAMBALDI V F,et al.Actor-Critic Policy Optimization in Partially Observable Multiagent Environments[C]//NeurIPS.2018:3426-3439.
[108]LANCTOT M,ZAMBALDI V,GRUSLYS A,et al.A unified game-theoretic approach to multiagent reinforcement learning[EB/OL].(2017-11-07) [2022-03-21].https://arxiv.org/pdf/1711.00832.pdf.
[109]JAIN M,KORZHYK D,VANĚK O,et al.A double oracle algorithm for zero-sum security games on graphs[C]//The 10th International Conference on Autonomous Agents and Multiagent Systems.2011:327-334.
[110]BALDUZZI D,GARNELO M,BACHRACH Y,et al.Open-ended learning in symmetric zero-sum games[C]//International Conference on Machine Learning.PMLR,2019:434-443.
[111]MCALEER S,LANIER J,FOX R,et al.Pipeline psro:A scalable approach for finding approximate nash equilibria in large games[EB/OL].(2021-02-18) [2022-03-21].https://arxiv.org/pdf/2006.08555.pdf.
[112]PEREZ-NIEVES N,YANG Y,SLUMBERS O,et al.Modelling behavioural diversity for learning in open-ended games[C]//International Conference on Machine Learning.PMLR,2021:8514-8524.
[113]DINH L C,YANG Y,TIAN Z,et al.Online Double Oracle[EB/OL].(2021-03-16) [2022-03-21].https://arxiv.org/pdf/2103.07780.pdf.
[114]NGUYEN T H,SINHA A,HE H.Partial Adversarial Behavior Deception in Security Games[C]//Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence(IJCAI-PRICAI-20).2020:283-289.
[115]WU B,CUBUKTEPE M,BHARADWAJ S,et al.Reward-Based Deception with Cognitive Bias[C]//2019 IEEE 58th Conference on Decision and Control(CDC).IEEE,2019:2265-2270.
[116]WEN Y,YANG Y,LUO R,et al.Probabilistic recursive reasoning for multi-agent reinforcement learning[EB/OL].(2019-01-26) [2022-03-21].https://arxiv.org/pdf/1901.09207.pdf.
[117]DAI Z,CHEN Y,LOW B K H,et al.R2-B2:Recursive reasoning-based Bayesian optimization for no-regret learning in games[C]//International Conference on Machine Learning.PMLR,2020:2291-2301.
[118]ZINKEVICH M,GREENWALD A,LITTMAN M.Cyclic equilibria in Markov games[J].Advances in Neural Information Processing Systems,2006(18):1641-1649.
[119]ALBRECHT S V,STONE P.Autonomous agents modelling other agents:A comprehensive survey and open problems[J].Artificial Intelligence,2018(258):66-95.
[120]FOERSTER J N,CHEN R Y,AL-SHEDIVAT M,et al.Learning with Opponent-Learning Awareness[C]//Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems(AAMAS’18).2018:122-130.
[121]SHEN M,HOW J P.Active perception in adversarial scenarios using maximum entropy deep reinforcement learning[C]//2019 International Conference on Robotics and Automation(ICRA).IEEE,2019:3384-3390.
[122]KEREN S,GAL A,KARPAS E.Goal recognition design-Survey [C]//Twenty-Ninth International Joint Conference on Artificial Intelligence.2020:4847-4853.
[123]LE G N,MOUADDIB A I,LEROUVREUR X,et al.A generative game-theoretic framework for adversarial plan recognition[C]//ICAPS Workshop on Distributed and Multi-Agent Planning.2015.
[124]FINN C,ABBEEL P,LEVINE S.Model-agnostic meta-learning for fast adaptation of deep networks[C]//International Conference on Machine Learning.PMLR,2017:1126-1135.
[125]NAGABANDI A,CLAVERA I,LIU S,et al.Learning to adapt in dynamic,real-world environments through meta-reinforcement learning[EB/OL].(2019-02-27) [2022-03-21].https://arxiv.org/pdf/1803.11347.pdf.
[126]PAINE T L,COLMENAREJO S G,WANG Z,et al.One-shot high-fidelity imitation:Training large-scale deep nets with RL[EB/OL].(2018-10-11) [2022-03-21].https://arxiv.org/pdf/1810.05017.pdf.
[127]FINN C,RAJESWARAN A,KAKADE S,et al.Online meta-learning[C]//International Conference on Machine Learning.PMLR,2019:1920-1930.
[128]HSU K,LEVINE S,FINN C.Unsupervised learning via meta-learning[EB/OL].(2019-03-21)[2022-03-21].https://arxiv.org/pdf/1810.02334.pdf.
[129]PENG H M.Meta Learning:Methods and Applications[M].Beijing:Electronic Industry Press,2021:229-261.