Computer Science, 2022, Vol. 49, Issue (8): 191-204. doi: 10.11896/jsjkx.220200174
• Artificial Intelligence •
YUAN Wei-lin, LUO Jun-ren, LU Li-na, CHEN Jia-xing, ZHANG Wan-peng, CHEN Jing