Computer Science ›› 2022, Vol. 49 ›› Issue (8): 191-204. doi: 10.11896/jsjkx.220200174

• Artificial Intelligence •

Methods in Adversarial Intelligent Game: A Holistic Comparative Analysis from Perspective of Game Theory and Reinforcement Learning

YUAN Wei-lin, LUO Jun-ren, LU Li-na, CHEN Jia-xing, ZHANG Wan-peng, CHEN Jing   

  1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
  • Received: 2022-02-27 Revised: 2022-03-22 Published: 2022-08-02
  • About author: YUAN Wei-lin, born in 1994, Ph.D. candidate. His main research interests include agent modelling, adversarial team games and multi-agent reinforcement learning.
    LU Li-na, born in 1984, Ph.D. Her main research interests include hierarchical multi-agent systems, reinforcement learning and complex networks.
  • Supported by:
    National Natural Science Foundation of China (61702528, 61806212, 62173336).

Abstract: Adversarial intelligent game is a frontier research topic in intelligent cognitive decision-making. Supported by large-scale computing power, game theory and reinforcement learning, represented respectively by counterfactual regret minimization and fictitious self-play, are the state-of-the-art approaches to strategy search. However, the relationship between these two paradigms has not been fully explored. For adversarial intelligent game problems, this paper defines the connotation and extension of adversarial intelligent game, reviews its development history, and summarizes the key challenges. The models and algorithms of intelligent game are then introduced from the perspectives of game theory and reinforcement learning, and a comparative analysis of the two paradigms, covering both methods and frameworks, is conducted. The main purpose is to promote the advance of intelligent game and to lay a foundation for the development of general artificial intelligence.
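The abstract names counterfactual regret minimization (CFR) and fictitious self-play as the representative strategy-search methods of the game-theoretic and reinforcement-learning paradigms. As a purely illustrative sketch, not taken from this paper, the snippet below shows the regret-matching rule that tabular CFR applies at each information set; the function name and the toy regret vector are assumptions chosen for demonstration.

    import numpy as np

    def regret_matching(cumulative_regret):
        """Map cumulative counterfactual regrets to a mixed strategy.

        Actions with positive cumulative regret are played with probability
        proportional to that regret; if no action has positive regret, the
        strategy falls back to uniform random play.
        """
        positive = np.maximum(cumulative_regret, 0.0)
        total = positive.sum()
        if total > 0:
            return positive / total
        return np.full_like(positive, 1.0 / len(positive))

    # Toy two-action example with hypothetical accumulated regrets.
    print(regret_matching(np.array([3.0, -1.0])))  # -> [1. 0.]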

Key words: Adversarial intelligent game, Counterfactual regret minimization, Fictitious self-play, Nash equilibrium, Reinforcement learning

CLC Number: TP181
[1]HUANG K Q,XING J L,ZHANG J G,et al.Intelligent technologies of human-computer gaming[J].Scientia Sinica Informationis,2020,50(4):540-550.
[2]KEITH A J.Operational decision making under uncertainty:Inferential,sequential,and adversarial approaches[R].Technical report,Air Force Institute of Technology Wright-Patterson AFB OH,2019.
[3]SINHA A,FANG F,AN B,et al.Stackelberg security games:Looking beyond a decade of success[C]//International Joint Conferences on Artificial Intelligence Organization.2018:5494-5501.
[4]WANG Z,YUAN Y,AN B,et al.An Overview of Security Games[J].Journal of Command and Control,2015,1(2):121-149.
[5]LI X,LI Q.Technical analysis of typical intelligent game system and development prospect of intelligent command and control system[J].Chinese Journal of Intelligent Science and Technology,2020,2(1):36-42.
[6]HU X,RONG M.Where Do Operation Decision Support Systems Go:Inspiration and Thought on Deep Green Plan[J].Journal of Command and Control,2016,2(1):22-25.
[7]LI T.Introduction and Inspiration to Military Intelligence of America[C]//The 5th Chinese Conference on Command and Control.Beijing,2017:94-98.
[8]SILVER D,SCHRITTWIESER J,SIMONYAN K,et al.Mastering the game of Go without human knowledge[J].Nature,2017,550(7676):354-359.
[9]VINYALS O,BABUSCHKIN I,CZARNECKI W M,et al.Grandmaster level in StarCraft II using multi-agent reinforcement learning[J].Nature,2019,575(7782):350-354.
[10]BLAIR A,SAFFIDINE A.AI surpasses humans at six-player poker[J].Science,2019,365(6456):864-865.
[11]LI J,KOYAMADA S,YE Q,et al.Suphx:Mastering Mahjong with Deep Reinforcement Learning[EB/OL].(2020-04-01) [2022-03-21].https://arxiv.org/pdf/2003.13590.pdf.
[12]YE D,LIU Z,SUN M,et al.Mastering complex control in MOBA games with deep reinforcement learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:6672-6679.
[13]YANG Y,WANG J.An overview of multi-agent reinforcement learning from game theoretical perspective[EB/OL].(2021-03-18) [2022-03-21].https://arxiv.org/pdf/2011.00583.pdf.
[14]LIU Q,ZHAI J,ZHANG Z,et al.A Survey on Deep Reinforcement Learning[J].Chinese Journal of Computers,2018,41(1):1-27.
[15]ZHAO D,SHAO K,ZHU Y,et al.Review of deep reinforcement learning and discussions on the development of computer Go[J].Control Theory and Applications,2016,33(6):701-717.
[16]ZINKEVICH M,JOHANSON M,BOWLING M,et al.Regret minimization in games with incomplete information[J].Advances in Neural Information Processing Systems,2007(20):1729-1736.
[17]HEINRICH J,LANCTOT M,SILVER D.Fictitious self-play in extensive-form games[C]//International Conference on Machine Learning.PMLR,2015:805-813.
[18]JADERBERG M,CZARNECKI W M,DUNNING I,et al.Human-level performance in 3D multiplayer games with population-based reinforcement learning[J].Science,2019,364(6443):859-865.
[19]SAMVELYAN M,RASHID T,DE W C S,et al.The starcraft multi-agent challenge[EB/OL].(2019-11-09) [2022-03-21].https://arxiv.org/pdf/1902.04043.pdf.
[20]ZHAO E,YAN R,LI J,et al.High-performance artificial intelligence for heads-up no-limit poker via end-to-end reinforcement learning[EB/OL].(2022-05-17) [2022-05-17].https://www.aaai.org/AAAI22Papers/AAAI-2268.ZhaoE.pdf.
[21]CHEN S,SU J,XIANG F.Artificial Intelligence and Game Confrontation[M].Beijing:Science Press,2021.
[22]REILLY M B,LISA V W A.Beyond video games:New artificial intelligence beats tactical experts in combat simulation[EB/OL].(2016-01-27) [2022-03-21].https://magazine.uc.edu/editors_picks/recent_features/alpha.html.
[23]SILVER D,HUBERT T,SCHRITTWIESER J,et al.Mastering chess and shogi by self-play with a general reinforcement learning algorithm[EB/OL].(2017-11-05)[2022-03-21].https://arxiv.org/pdf/1712.01815.pdf.
[24]MORAVČÍK M,SCHMID M,BURCH N,et al.Deepstack:Expert-level artificial intelligence in heads-up no-limit poker[J].Science,2017,356(6337):508-513.
[25]BROWN N,SANDHOLM T.Superhuman AI for heads-up no-limit poker:Libratus beats top professionals[J].Science,2018,359(6374):418-424.
[26]BAKER B,KANITSCHEIDER I,MARKOV T,et al.Emergent tool use from multi-agent autocurricula[EB/OL].(2020-02-11)[2022-03-21].https://arxiv.org/pdf/1909.07528.pdf.
[27]ZHA D,XIE J,MA W,et al.DouZero:Mastering DouDizhu with Self-Play Deep Reinforcement Learning[EB/OL].(2021-01-11)[2022-03-21].https://arxiv.org/pdf/2106.06135.pdf.
[28]HITCHENS T.DARPA’s AlphaDogfight tests AI pilot’s combat chops[EB/OL].(2020-08-18)[2022-03-21].https://breakingdefense.com/2020/08/darpas-alphadogfight-tests-ai-pilots-combat-chops/.
[29]MASTERS P,SARDINA S.Deceptive Path-Planning[C]//International Joint Conferences on Artificial Intelligence Organization.2017:4368-4375.
[30]BELL J B.Toward a theory of deception[J].International Journal of Intelligence and Counterintelligence,2003,16(2):244-279.
[31]WHALEY B.Toward a general theory of deception[J].The Journal of Strategic Studies,1982,5(1):178-192.
[32]BURCH N,SCHMID M,MORAVCIK M,et al.AIVAT:A new variance reduction technique for agent evaluation in imperfect information games[EB/OL].(2017-01-19) [2022-03-21].https://arxiv.org/pdf/1612.06915.pdf.
[33]JOHANSON M.Measuring the size of large no-limit poker games[EB/OL].(2013-03-07)[2022-03-21].https://arxiv.org/pdf/1302.7008.pdf.
[34]GANZFRIED S,SANDHOLM T.Potential-aware imperfect-recall abstraction with earth mover’s distance in imperfect-information games[C]//Twenty-Eighth AAAI Conference on Artificial Intelligence.2014:682-691.
[35]SANDHOLM T.Abstraction for solving large incomplete-information games[C]//Twenty-Ninth AAAI Conference on Artificial Intelligence.2015:4127-4131.
[36]BURCH N.Time and space:Why imperfect information games are hard[D].Edmonton:University of Alberta,2018.
[37]NESTEROV Y.Excessive gap technique in nonsmooth convex minimization[J].SIAM Journal on Optimization,2005,16(1):235-249.
[38]LI L J.Research on human-computer game decision-makingtechnology of wargame deduction[D].Beijing:Institute of Automation,Chinese Academy of Sciences,2020.
[39]KORZHYK D,YIN Z,KIEKINTVELD C,et al.Stackelberg vs.Nash in security games:An extended investigation of interchangeability,equivalence,and uniqueness[J].Journal of Artificial Intelligence Research,2011(41):297-327.
[40]CHEN X,DENG X T.Settling the complexity of 2-player Nash equilibrium[C]//Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science.2006:261-272.
[41]BROWN N.Equilibrium Finding for Large Adversarial Imperfect-Information Games[D].Pittsburgh:Carnegie Mellon University,2020.
[42]YUAN W,LIAO Z,GAO W,et al.A Survey on Intelligent Game of Computer Poker[J].Chinese Journal of Network and Information Security,2021,7(5):57-76.
[43]BOWLING M,VELOSO M.Rational and convergent learning in stochastic games[C]//International Joint Conference on Artificial Intelligence.Lawrence Erlbaum Associates Ltd,2001:1021-1026.
[44]EVERETT R,ROBERTS S.Learning against non-stationary agents with opponent modelling and deep reinforcement learning[C]//2018 AAAI Spring Symposium Series.2018.
[45]PAPOUDAKIS G,CHRISTIANOS F,RAHMAN A,et al.Dealing with non-stationarity in multi-agent deep reinforcement learning[EB/OL].(2019-01-11) [2022-03-21].https://arxiv.org/pdf/1906.04737.pdf.
[46]MAZUMDAR E V,JORDAN M I,SASTRY S S.On finding local Nash equilibria(and only local Nash equilibria) in zero-sum games[EB/OL].(2019-01-25) [2022-03-21].https://arxiv.org/pdf/1901.00838.pdf.
[47]JOHANSON M,BURCH N,VALENZANO R,et al.Evaluating state-space abstractions in extensive-form games[C]//Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems.2013:271-278.
[48]KROER C,SANDHOLM T.A unified framework for extensive-form game abstraction with bounds[C]//AI3 workshop at International Joint Conference on Artificial Intelligence.2018.
[49]SANDHOLM T,SINGH S.Lossy stochastic game abstraction with bounds[C]//Proceedings of the 13th ACM Conference on Electronic Commerce.2012:880-897.
[50]LU Y,YAN K.Algorithms in multi-agent systems:A holistic perspective from reinforcement learning and game theory[EB/OL].(2020-01-31)[2022-03-21].https://arxiv.org/pdf/2001.06487.pdf.
[51]NEYMAN A.Correlated equilibrium and potential games[J].International Journal of Game Theory,1997,26(2):223-227.
[52]MOULIN H,RAY I,GUPTA S S.Coarse correlated equilibria in an abatement game[R].Cardiff Economics Working Papers,2014.
[53]BRÜCKNER M,SCHEFFER T.Stackelberg games for adversarial prediction problems[C]//Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2011:547-555.
[54]ZHANG Y,AN B.Computing team-maximin equilibria in zero-sum multiplayer extensive-form games[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:2318-2325.
[55]OMIDSHAFIEI S,PAPADIMITRIOU C,PILIOURAS G,et al.α-rank:Multi-agent evaluation by evolution[J].Scientific Reports,2019,9(1):1-29.
[56]LU L,GU X,ZHANG W,et al.Research on Learning Method Based on Hierarchical Decomposition[C]//2019 Chinese Automation Congress(CAC).IEEE,2019:5413-5418.
[57]CAO L.Key Technologies of Intelligent Game Confrontation Based on Deep Reinforcement Learning[J].Command Information System and Technology,2019,10(5):1-7.
[58]KOVAŘÍK V,SCHMID M,BURCH N,et al.Rethinking formal models of partially observable multiagent decision making[EB/OL].(2020-10-26)[2022-05-17].https://arxiv.org/pdf/1906.11110.pdf.
[59]SCHMID M,MORAVČÍK M,BURCH N,et al.Player of games[EB/OL].(2021-11-06)[2022-03-21].https://arxiv.org/pdf/2112.03178v1.pdf.
[60]JOHANSON M,WAUGH K,BOWLING M,et al.Accelerating best response calculation in large extensive games[C]//Twenty-second International Joint Conference on Artificial Intelligence.2011:258-265.
[61]JOHANSON M B.Robust strategies and counter-strategies:from superhuman to optimal play[D].Edmonton:University of Alberta,2016.
[62]PAPP D R.Dealing with imperfect information in poker[EB/OL].(1998-11-30) [2022-03-21].https://webdocs.cs.ualberta.ca/~jonathan/PREVIOUS/Grad/papp/thesis.html.
[63]BILLINGS D,BURCH N,DAVIDSON A,et al.Approximating game-theoretic optimal strategies for full-scale poker[C]//IJCAI.2003:661.
[64]SCHNIZLEIN D.State translation in no-limit poker[D].Edmonton:University of Alberta,2009.
[65]BROWN N,GANZFRIED S,SANDHOLM T.Tartanian7:a champion two-player no-limit Texas hold’em poker-playing program[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2015:4270-4271.
[66]BROWN N,SANDHOLM T.Baby Tartanian8:Winning Agent from the 2016 Annual Computer Poker Competition[C]//IJCAI.2016:4238-4239.
[67]FARINA G,KROER C,SANDHOLM T.Regret circuits:Composability of regret minimizers[C]//International Conference on Machine Learning.PMLR,2019:1863-1872.
[68]LANCTOT M,WAUGH K,ZINKEVICH M,et al.Monte Carlo Sampling for Regret Minimization in Extensive Games[C]//NIPS.2009:1078-1086.
[69]SCHMID M,BURCH N,LANCTOT M,et al.Variance reduction in monte carlo counterfactual regret minimization(VR-MCCFR) for extensive form games using baselines[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2019:2157-2164.
[70]LI H,HU K,GE Z,et al.Double neural counterfactual regret minimization[EB/OL].(2021-11-06) [2018-11-27].https://arxiv.org/pdf/1812.10607.pdf.
[71]JACKSON E G.Targeted CFR[C]//Workshops at the Thirty-first AAAI Conference on Artificial Intelligence.2017.
[72]BOWLING M,BURCH N,JOHANSON M,et al.Heads-up limit hold’em poker is solved[J].Science,2015,347(6218):145-149.
[73]BROWN N,SANDHOLM T.Solving Imperfect-Information Games via Discounted Regret Minimization[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2019:1829-1836.
[74]BROWN N,LERER A,GROSS S,et al.Deep counterfactual regret minimization[C]//International Conference on Machine Learning.PMLR,2019:793-802.
[75]ZHOU Y,REN T,LI J,et al.Lazy-CFR:fast and near optimal regret minimization for extensive games with imperfect information[EB/OL].(2018-11-25) [2022-03-21].https://arxiv.org/pdf/1810.04433.pdf.
[76]LIU W,LI B,TOGELIUS J.Model-free Neural Counterfactual Regret Minimization with Bootstrap Learning[EB/OL].(2020-01-02) [2022-03-21].https://arxiv.org/pdf/2012.01870.pdf.
[77]LI H,WANG X,QI S,et al.Solving imperfect-information games via exponential counterfactual regret minimization[EB/OL].(2020-11-04)[2022-03-21].https://arxiv.org/pdf/2008.02679.pdf.
[78]STEINBERGER E.Single deep counterfactual regret minimization[EB/OL].(2019-10-04) [2022-03-21].https://arxiv.org/pdf/1901.07621.pdf.
[79]LANCTOT M.Monte Carlo sampling and regret minimization for equilibrium computation and decision-making in large extensive form games[D].Edmonton:University of Alberta,2013.
[80]REIS J.A GPU implementation of Counterfactual Regret Minimization[D].Porto:University of Porto,2015.
[81]CELLI A,MARCHESI A,BIANCHI T,et al.Learning to correlate in multi-player general-sum sequential games[J].Advances in Neural Information Processing Systems,2019(32):13076-13086.
[82]TESAURO G.TD-Gammon,a self-teaching backgammon program,achieves master-level play[J].Neural Computation,1994,6(2):215-219.
[83]BERNER C,BROCKMAN G,CHAN B,et al.Dota 2 with large scale deep reinforcement learning[EB/OL].(2019-11-13) [2022-03-21].https://arxiv.org/pdf/1912.06680.pdf.
[84]BROWN G W.Iterative solution of games by fictitious play[J].Activity Analysis of Production and Allocation,1951,13(1):374-376.
[85]VAN DER GENUGTEN B.A weakened form of fictitious play in two-person zero-sum games[J].International Game Theory Review,2000,2(4):307-328.
[86]LESLIE D S,COLLINS E J.Generalized weakened fictitious play[J].Games and Economic Behavior,2006,56(2):285-298.
[87]CHEN Y,ZHANG L,LI S,et al.Optimize Neural Fictitious Self-Play in Regret Minimization Thinking[EB/OL].(2021-04-22) [2022-03-21].https://arxiv.org/pdf/2104.10845.pdf.
[88]HEINRICH J,SILVER D.Deep reinforcement learning from self-play in imperfect-information games[EB/OL].(2016-01-28) [2022-03-21].https://arxiv.org/pdf/1603.01121.pdf.
[89]ZHANG L,WANG W,LI S,et al.Monte Carlo neural fictitious self-play:Approach to approximate Nash equilibrium of imperfect-information games[EB/OL].(2019-04-06) [2022-03-21].https://arxiv.org/pdf/1903.09569.pdf.
[90]HEINRICH J,SILVER D.Self-play monte-carlo tree search in computer poker[C]//Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence.2014:19-25.
[91]JIANG Q,LI K,DU B,et al.DeltaDou:Expert-level Doudizhu AI through Self-play[C]//IJCAI.2019:1265-1271.
[92]KASH I A,SULLINS M,HOFMANN K.Combining No-regret and Q-learning[C]//Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems.2020:593-601.
[93]LANCTOT M,LOCKHART E,LESPIAU J B,et al.Open-Spiel:A Framework for Reinforcement Learning in Games[EB/OL].(2020-09-26) [2022-03-21].https://arxiv.org/pdf/1908.09453.pdf.
[94]HENDERSON P,ISLAM R,BACHMAN P,et al.Deep reinforcement learning that matters[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2018:3207-3214.
[95]HERNANDEZ-LEAL P,KARTAL B,TAYLOR M E.A survey and critique of multiagent deep reinforcement learning[J].Autonomous Agents and Multi-Agent Systems,2019,33(6):750-797.
[96]CZARNECKI W M,GIDEL G,TRACEY B,et al.Real world games look like spinning tops[EB/OL].(2020-01-17) [2022-03-21].https://arxiv.org/pdf/2004.09468.pdf.
[97]LUO J,ZHANG W,YUAN W,et al.Research on Opponent Modeling Framework for Multi-agent Game Confrontation[EB/OL].(2019-11-13) [2022-02-16].http://kns.cnki.net/kcms/detail/11.3092.V.20210818.1041.007.html.
[98]FENG X,SLUMBERS O,YANG Y,et al.Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games[EB/OL].(2021-11-01) [2022-03-21].https://arxiv.org/pdf/2106.02745.pdf.
[99]PRETORIUS A,TESSERA K,SMIT A P,et al.Mava:a research framework for distributed multi-agent reinforcement learning[EB/OL].(2021-01-03) [2022-03-21].https://arxiv.org/pdf/2107.01460.pdf.
[100]LIANG E,LIAW R,NISHIHARA R,et al.RLlib:Abstractions for distributed reinforcement learning[C]//International Conference on Machine Learning.PMLR,2018:3053-3062.
[101]ZHOU M,WAN Z,WANG H,et al.MALib:A Parallel Framework for Population-based Multi-agent Reinforcement Learning[EB/OL].(2021-01-05) [2022-03-21].https://arxiv.org/pdf/2106.07551.pdf.
[102]NEU G,JONSSON A,GÓMEZ V.A unified view of entropy-regularized Markov decision processes[EB/OL].(2017-05-22) [2022-03-21].https://arxiv.org/pdf/1705.07798.pdf.
[103]JIN C,ALLEN-ZHU Z,BUBECK S,et al.Is Q-learning provably efficient?[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems.2018:4868-4878.
[104]GRUSLYS A,LANCTOT M,MUNOS R,et al.The advantage regret-matching actor-critic[EB/OL].(2020-08-27) [2022-03-21].https://arxiv.org/pdf/2008.12234.pdf.
[105]LI H,WANG X,JIA F,et al.RLCFR:Minimize counterfactual regret by deep reinforcement learning[EB/OL].(2020-08-27) [2022-03-21].https://arxiv.org/pdf/2009.06373.pdf.
[106]STEINBERGER E,LERER A,BROWN N.DREAM:Deep regret minimization with advantage baselines and model-free learning[EB/OL].(2020-11-29) [2022-03-21].https://arxiv.org/pdf/2006.10410.pdf.
[107]SRINIVASAN S,LANCTOT M,ZAMBALDI V F,et al.Actor-Critic Policy Optimization in Partially Observable Multiagent Environments[C]//NeurIPS.2018:3426-3439.
[108]LANCTOT M,ZAMBALDI V,GRUSLYS A,et al.A unified game-theoretic approach to multiagent reinforcement learning[EB/OL].(2017-11-07) [2022-03-21].https://arxiv.org/pdf/1711.00832.pdf.
[109]JAIN M,KORZHYK D,VANĚK O,et al.A double oracle algorithm for zero-sum security games on graphs[C]//The 10th International Conference on Autonomous Agents and Multiagent Systems.2011:327-334.
[110]BALDUZZI D,GARNELO M,BACHRACH Y,et al.Open-ended learning in symmetric zero-sum games[C]//International Conference on Machine Learning.PMLR,2019:434-443.
[111]MCALEER S,LANIER J,FOX R,et al.Pipeline PSRO:A scalable approach for finding approximate Nash equilibria in large games[EB/OL].(2021-02-18) [2022-03-21].https://arxiv.org/pdf/2006.08555.pdf.
[112]PEREZ-NIEVES N,YANG Y,SLUMBERS O,et al.Modelling behavioural diversity for learning in open-ended games[C]//International Conference on Machine Learning.PMLR,2021:8514-8524.
[113]DINH L C,YANG Y,TIAN Z,et al.Online Double Oracle[EB/OL].(2021-03-16) [2022-03-21].https://arxiv.org/pdf/2103.07780.pdf.
[114]NGUYEN T H,SINHA A,HE H.Partial Adversarial Behavior Deception in Security Games[C]//Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence(IJCAI-PRICAI-20).2020:283-289.
[115]WU B,CUBUKTEPE M,BHARADWAJ S,et al.Reward-Based Deception with Cognitive Bias[C]//2019 IEEE 58th Conference on Decision and Control(CDC).IEEE,2019:2265-2270.
[116]WEN Y,YANG Y,LUO R,et al.Probabilistic recursive reasoning for multi-agent reinforcement learning[EB/OL].(2019-01-26) [2022-03-21].https://arxiv.org/pdf/1901.09207.pdf.
[117]DAI Z,CHEN Y,LOW B K H,et al.R2-B2:Recursive reasoning-based Bayesian optimization for no-regret learning in games[C]//International Conference on Machine Learning.PMLR,2020:2291-2301.
[118]ZINKEVICH M,GREENWALD A,LITTMAN M.Cyclic equilibria in Markov games[J].Advances in Neural Information Processing Systems,2006(18):1641-1649.
[119]ALBRECHT S V,STONE P.Autonomous agents modelling other agents:A comprehensive survey and open problems[J].Artificial Intelligence,2018(258):66-95.
[120]FOERSTER J N,CHEN R Y,AL-SHEDIVAT M,et al.Learning with Opponent-Learning Awareness[C]//Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems(AAMAS’18).2018:122-130.
[121]SHEN M,HOW J P.Active perception in adversarial scenarios using maximum entropy deep reinforcement learning[C]//2019 International Conference on Robotics and Automation(ICRA).IEEE,2019:3384-3390.
[122]KEREN S,GAL A,KARPAS E.Goal recognition design-Survey [C]//Twenty-Ninth International Joint Conference on Artificial Intelligence.2020:4847-4853.
[123]LE G N,MOUADDIB A I,LEROUVREUR X,et al.A generative game-theoretic framework for adversarial plan recognition[C]//ICAPS Workshop on Distributed and Multi-Agent Planning.2015.
[124]FINN C,ABBEEL P,LEVINE S.Model-agnostic meta-learning for fast adaptation of deep networks[C]//International Conference on Machine Learning.PMLR,2017:1126-1135.
[125]NAGABANDI A,CLAVERA I,LIU S,et al.Learning to adapt in dynamic,real-world environments through meta-reinforcement learning[EB/OL].(2019-02-27) [2022-03-21].https://arxiv.org/pdf/1803.11347.pdf.
[126]PAINE T L,COLMENAREJO S G,WANG Z,et al.One-shot high-fidelity imitation:Training large-scale deep nets with RL[EB/OL].(2018-10-11) [2022-03-21].https://arxiv.org/pdf/1810.05017.pdf.
[127]FINN C,RAJESWARAN A,KAKADE S,et al.Online meta-learning[C]//International Conference on Machine Learning.PMLR,2019:1920-1930.
[128]HSU K,LEVINE S,FINN C.Unsupervised learning via meta-learning[EB/OL].(2019-03-21)[2022-03-21].https://arxiv.org/pdf/1810.02334.pdf.
[129]PENG H M.Meta Learning:Methods and Applications[M].Beijing:Electronic Industry Press,2021:229-261.