计算机科学 ›› 2024, Vol. 51 ›› Issue (1): 84-98.doi: 10.11896/jsjkx.230600011
罗俊仁, 邹明我, 陈少飞, 张万鹏, 陈璟
LUO Junren, ZOU Mingwo, CHEN Shaofei, ZHANG Wanpeng, CHEN Jing
摘要: 对抗条件下的资源分配是大多数博弈决策问题的核心。从拟合最优解到博弈均衡解,基于博弈论的资源分配策略求解是认知决策领域的前沿课题。文中围绕对抗条件下资源分配的布洛托上校博弈模型和求解方法展开综述分析。首先,简要介绍了离线与在线策略学习的区别,策略博弈与相关解概念,在线优化与遗憾值;其次,梳理了6类布洛托上校博弈典型模型(连续布洛托上校博弈、离散布洛托上校博弈、广义布洛托上校博弈、广义乐透布洛托博弈、广义规则布洛托上校博弈与在线离散布洛托上校博弈);然后,区分2个阶段(离线与在线)3类博弈场景(单次、重复、多阶段),分析了多类布洛托上校博弈求解方法;最后,从典型应用探索、广义博弈模型、博弈求解方法、未来研究展望共4方面进行了未来研究前沿分析及展望。通过对当前布洛托上校博弈进行概述,期望能为对抗条件下资源分配与博弈论相关领域的研究带来启发。
中图分类号:
[1]ZHAO J,YANG C.Graph Reinforcement Learning for Predictive Power Allocation to Mobile Users[EB/OL].(2022-03-08)(2022-11-01).https://arxiv.org/abs/2203.03906. [2]PANIGRAHY N K,BASU P,NAIN P,et al.Resource Allocation in One-Dimensional Distributed Service Networks with Applications [J].Performance Evaluation.2020,142:102110. [3]ABDALLAH M.Effects of Behavioral Decision-Making inGame-Theoretic Frameworks for Security Resource Allocation in Networked Systems [D].West Lafayette:Purdue University Graduate School,2022. [4]JI X,ZHANG W,XIANG F,et al.A Swarm ConfrontationMethod Based on Lanchester Law and Nash Equilibrium [J].Electronics,2022,11(6):896. [5]ZHANG X X,GE B F,TAN Y J.Multi-Attribute Game Theoretic Model for Resource Allocation in Military Attack Defense Application[J].Journal of National University of Defense Technology,2018,40(5):153-160. [6]LIU B Y,YE X B,ZHOU C F,et al.Composite Mode On-Orbit Service Resource Allocation Based on Improved DQN[J].Acta Aeronautica et Astronautica Sinica,2020,41(5):9. [7]YAN W,JINKUAN W,JINGHAO S.A Game-Theoretic Based Resource Allocation Strategy for Cloud Computing Services[J].Scientific Programming,2016,2016(Pt.2):1629893.1. [8]MYERSON R B.Incentives to Cultivate Favored Minorities Under Alternative Electoral Systems [J].American Political Science Review,1993,87(4):856-869. [9]JIANG Y J,KUANG K,WU F.Big Data Intelligence:From the Optimal Solution of Data Fitting to The Equilibrium Solution of Game Theory[J].CAAI Transactions on Intelligent Systems,2020,15(1):175-182. [10]BOREL E.La Théorie Du Jeu Et Les Equations IntégralesaNoyau Symétrique [J].Comptes rendus de l’Académie des Sciences,1921,173(1304/1305/1306/1307/1308):58. [11]NAKAYAMA M.The Dawn of Modern Theory of Games[J].Advances in Mathematical Economics,2006,9:73-97. [12]GROSS O,WAGNER R.A Continuous Colonel Blotto Game [R].Rand Project Air Force Santa Monica Ca,1950. [13]KOVENOCK D,ROBERSON B.Coalitional Colonel BlottoGames with Application to The Economics of Alliances [J].Journal of Public Economic Theory,2012,14(4):653-676. [14]GRANA J,LAMB J,O’DONOUGHUE N A.Findings on Mosaic Warfare from a Colonel Blotto Game [R].Rand National Defense Research Inst Santa Monica Ca,2021. [15]MORGENSTERN O,VON NEUMANN J.Theory of Gamesand Economic Behavior [M].Princeton University Press,1953. [16]NASH J.Non-Cooperative Games [D].Annals of mathematics,1951:286-295. [17]KALAI A,VEMPALA S.Efficient Algorithms for Online Decision Problems [J].Journal of Computer and System Sciences,2005,71(3):291-307. [18]HANNAN J.Approximation to Bayes Risk in Repeated Play[J].Contributions to the Theory of Games,1957,3(2):97-139. [19]ROBBINS H.Some Aspects of The Sequential Design of Experiments [J].Bulletin of the American Mathematical Society,1952,58(5):527-535. [20]GYÖRGY A,LINDER T,LUGOSI G,et al.The On-Line Shortest Path Problem Under Partial Monitoring [J].Journal of Machine Learning Research,2007,8(10):2369-2403. [21]KOCÁK T,NEU G,VALKO M,et al.Efficient Learning by Implicit Exploration in Bandit Problems with Side Observations [C]//Proceedings of the 27th International Conference on Neural Information Processing Systems.2014:613-621. [22]BARTLETT P,DANI V,HAYES T,et al.High-ProbabilityRegret Bounds for Bandit Online Linear Optimization [C]//Proceedings of the 21st Annual Conference on Learning Theory-COLT.2008:335-342. [23]LAI T L,ROBBINS H.Asymptotically Efficient Adaptive Allocation Rules [J].Advances in Applied Mathematics,1985,6(1):4-22. [24]AUER P,CESA-BIANCHI N,FISCHER P.Finite-Time Analysis of The Multiarmed Bandit Problem [J].Machine Learning,2002,47(2):235-256. [25]BUBECK S,CESA-BIANCHI N.Regret Analysis of Stochastic and Non-Stochastic Multi-Armed Bandit Problems [J].Foundations and Trends© in Machine Learning,2012,5(1):1-122. [26]GARIVIER A,CAPPÉ O.The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond [C]//Proceedings of the 24th Annual Conference on Learning Theory.2011:359-376. [27]AUER P,CESA-BIANCHI N,FREUND Y,et al.The Non-Stochastic Multiarmed Bandit Problem [J].SIAM Journal on Computing,2002,32(1):48-77. [28]FREUND Y,SCHAPIRE R E.A Decision-Theoretic Generalization of On-Line Learning and An Application to Boosting [J].Journal of Computer and System Sciences,1997,55(1):119-139. [29]BARD N D C.Online Agent Modelling in Human-Scale Problems [D].Edmonton:University of Alberta,2016. [30]CESA-BIANCHI N,LUGOSI G.Prediction,Learning,andGames [M].Cambridge:Cambridge university press,2006. [31]ARORA R,DEKEL O,TEWARI A.Online Bandit LearningAgainst an Adaptive Adversary:From Regret to Policy Regret.[EB/OL].(2012-06-27) [2022-12-01].https://arxiv.org/abs/1206.6400. [32]LITTLESTONE N,WARMUTH M K.The Weighted Majority Algorithm [J].Information and Computation,1994,108(2):212-261. [33]WARMUTH M K,JAGOTA A K.Continuous and Discrete-Time Nonlinear Gradient Descent:Relative Loss Bounds and Convergence [C]//Electronic Proceedings of the 5th International Symposium on Artificial Intelligence and Mathematics.1997. [34]GORDON G J.Regret Bounds for Prediction Problems [C]//Proceedings of the Twelfth Annual Conference on Computa-tional Learning Theory.1999:29-40. [35]MORRILL D.Hindsight Rational Learning for Sequential Decision-Making:Foundations and Experimental Applications [D].Alberta:University of Alberta,2022. [36]NEU G.Explore No More:Improved High-Probability Regret Bounds for Non-Stochastic Bandits [C]//Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 2.2015:3168-3176. [37]ABERNETHY J,RAKHLIN A.Beating the Adaptive Bandit with High Probability [C]//Proceedings of the Information Theory and Applications Workshop.2009:280-289. [38]ABERNETHY J,HAZAN E E,RAKHLIN A.Competing inthe Dark:An Efficient Algorithm for Bandit Linear Optimization [C]//Proceedings of the 21st Annual Conference on Learning Theory(COLT 2008).2008:263-273. [39]NEU G,BARTÓK G.An Efficient Algorithm for Learning with Semi-Bandit Feedback [C]//Proceedings of the International Conference on Algorithmic Learning Theory.2013:234-248. [40]DANI V,KAKADE S M,HAYES T.The Price of Bandit Information for Online Optimization [C]//Proceedings of the 20th International Conference on Neural Information Processing Systems.2007:345-352. [41]ADAM L,HORCIK R,KASL T,et al.Double Oracle Algorithm for Computing Equilibria in Continuous Games [C]//Procee-dings of the AAAI Conference on Artificial Intelligence.2021:5070-5077. [42]GANZFRIED S.Algorithm for Computing Approximate NashEquilibrium in Continuous Games with Application to Continuous Blotto [J].Games,2021,12(2):47. [43]BEHNEZHAD S,BLUM A,DERAKHSHAN M,et al.FromBattlefields to Elections:Winning Strategies of Blotto and Auditing Games [C]//Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms.2018:2291-2310. [44]BEHNEZHAD S,DEHGHANI S,DERAKHSHAN M,et al.Faster and Simpler Algorithm for Optimal Strategies of Blotto Game [C]//Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence.2017:369-375. [45]KOVENOCK D,ROBERSON B.Generalizations of the General Lotto and Colonel Blotto Games [J].Economic Theory,2021,71(3):997-1032. [46]VU D Q,LOISEAU P,SILVA A.Approximate Equilibria inNon-constant-sum Colonel Blotto and Lottery Blotto Games with Large Numbers of Battlefields [EB/OL].(2019-10-15) [2022-12-01].https://arxiv.org/abs/1910.06559v1. [47]ROBERSON B.The Colonel Blotto Game [J].Economic Theory,2006,29(1):1-24. [48]SCHWARTZ G,LOISEAU P,SASTRY S S.The Heteroge-neous Colonel Blotto Game [C]//2014 7th International Confe-rence on Network Games,Control and Optimization(NetGCoop).2014:232-238. [49]VU D Q,LOISEAU P,SILVA A.Approximate Equilibria inGeneralized Colonel Blotto and Generalized Lottery Blotto Games [EB/OL].(2019-10-15) [2022-12-01].https://arxiv.org/abs/1910.06559v2. [50]LI X,ZHENG J.Pure strategy Nash equilibrium in 2-Contestant Generalized Lottery Colonel Blotto Games [J].Journal of Mathematical Economics,2022,103:102771. [51]TULLOCK G.Efficient rent-seeking revisited[M]//The Political Economy of Rent-seeking.Boston,MA:Springer US,1988:91-94. [52]VU D Q,LOISEAU P.Colonel Blotto Games with Favoritism:Competitions with Pre-Allocations and Asymmetric Effectiveness [C]//Proceedings of the 22nd ACM Conference on Economics and Computation.2021:862-863. [53]ASPECT L,EWERHART C.Colonel Blotto Games with AHead Start [R].Department of Economics-University of Zu-rich,2022. [54]GUPTA A,SCHWARTZ G,LANGBORT C,et al.A Three-Stage Colonel Blotto Game with Applications to Cyber Physical Security [C]//Proceedings of the American Control Confe-rence.2014:3820-3825. [55]PAARPORN K,CHANDAN R,KOVENOCK D,et al.Strategically Revealing Intentions in General Lotto Games [EB/OL].(2021-10-23) [2022-12-01].https://arxiv.org/abs/2110.12099. [56]VU D Q,LOISEAU P,SILVA A,et al.Path Planning Problems with Side Observations-When Colonels Play Hide And-Seek [C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:2252-2259. [57]TAKIMOTO E,WARMUTH M K.Path Kernels and Multipli-cative Updates [J].The Journal of Machine Learning Research,2003,4:773-818. [58]CZARNECKI W M,GIDEL G,TRACEY B,et al.Real World Games Look Like Spinning Tops [C]//In Proceedings of the 34th International Conference on Neural Information Processing Systems.2020:17443-17454. [59]OMIDSHAFIEI S,TUYLS K,CZARNECKI W M,et al.Navigating the Landscape of Multiplayer Games [J].Nature Communications.2020,11(1):1-17. [60]HORTALA-VALLVE R,LLORENTE-SAGUER A.PureStrategy Nash Equilibria in Non-Zero Sum Colonel Blotto Games [J].International Journal of Game Theory,2012,41(2):331-343. [61]AHMADINEJAD A,DEHGHANI S,HAJIAGHAYI M,et al.From Duels to Battlefields:Computing Equilibria of Blotto and Other Games [J].Mathematics of Operations Research,2019,44(4):1304-1325. [62]BAYE M R,KOVENOCK D,DE VRIES C G.The Solution to The Tullock Rent-Seeking Game When R> 2:Mixed-Strategy Equilibria and Mean Dissipation Rates [J].Public Choice,1994,81(3):363-380. [63]HILLMAN A L,RILEY J G.Politically Contestable Rents and Transfers [J].Economics & Politics,1989,1(1):17-39. [64]MCMAHAN H B,GORDON G J,BLUM A.Planning in The Presence of Cost Functions Controlled by An Adversary [C]//Proceedings of the 20th International Conference on Machine Learning(ICML-03).2003:536-543. [65]LANCTOT M,ZAMBALDI V,GRUSLYS A,et al.A Unified Game-Theoretic Approach to Multiagent Reinforcement Lear-ning [C]//Proceedings of the 31th International Conference in Neural Information Processing Systems.2017:4193-4206. [66]ZOU M,CHEN J,LUO J,et al.Equilibrium Approximating andOnline Learning for Anti-Jamming Game of Satellite Communication Power Allocation [J].Electronics,2022,11(21):3526. [67]GEMP I,SAVANI R,LANCTOT M,et al.Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent[C]//Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems.2022:507-515. [68]ANTHONY T,ECCLES T,TACCHETTI A,et al.Learning To Play No-Press Diplomacy with Best Response Policy Iteration[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems.2020:17987-18003. [69]JACOB A P,WU D J,FARINA G,et al.Modeling StrongandHuman-Like Gameplay With KL-Regularized Search[C]//International Conference on Machine Learning.PMLR,2022:9695-9728. [70]STRANG A,SEWELL D,KIM A,et al.Principal Trade-offAnalysis [EB/OL].(2022-06-09) [2022-12-01].https://arxiv.org/abs/2206.07520. [71]BERTRAND Q,CZARNECKI W M,GIDEL G.On the Limitations of Elo:Real-World Games,are Transitive,not Additive [EB/OL].(2022-06-21) [2022-12-01].https://arxiv.org/abs/2206.12301. [72]CHEN W,WANG Y,YUAN Y,et al.Combinatorial Multi-Armed Bandit and Its Extension to Probabilistically Triggered Arms [J].The Journal of Machine Learning Research,2016,17(1):1746-1778. [73]KOOLEN W M,WARMUTH M K,KIVINEN J,et al.Hedging Structured Concepts [C]//Proceedings of the COLT.2010:93-105. [74]ALON N,CESA-BIANCHI N,DEKEL O,et al.Online Learning with Feedback Graphs:Beyond Bandits [C]//Proceedings of the Conference on Learning Theory.2015:23-35. [75]CESA-BIANCHI N,LUGOSI G.Combinatorial Bandits [J].Journal of Computer and System Sciences,2012,78(5):1404-1422. [76]VU D Q,LOISEAU P,SILVA A.Combinatorial Bandits for Sequential Learning in Colonel Blotto Games [C]//Proceedings of the IEEE 58th Conference on Decision and Control(CDC).2019:867-872. [77]BADANIDIYURU A,KLEINBERG R,SLIVKINS A.Banditswith Knapsacks [J].Journal of the ACM(JACM),2018,65(3):1-55. [78]IMMORLICA N,SANKARARAMAN K A,SCHAPIRE R,et al.Adversarial Bandits with Knapsacks [C]//Proceedings of the IEEE 60th Annual Symposium on Foundations of Computer Science(FOCS).2019:202-219. [79]AGRAWAL S,DEVANUR N.Linear Contextual Bandits with Knapsacks [C]//Proceedings of the 30th International Confe-rence on Neural Information Processing Systems.2016:3458-3467. [80]LIU S,JIANG J,LI X.Non-stationary Bandits with Knapsacks [EB/OL].(2022-05-25) [2022-12-01].https://arxiv.org/abs/2205.12427. [81]CASTIGLIONI M,CELLI A,KROER C.Online Learning with Knapsacks:The Best of Both Worlds [EB/OL].(2022-02-28).[2022-12-01] https://arxiv.org/abs/2202.13710. [82]GAITONDE J,LI Y,LIGHT B,et al.Budget Pacing in Repeated Auctions:Regret and Efficiency without Convergence [EB/OL].(2022-05-18) [2022-12-01].https://arxiv.org/abs/2205.08674. [83]ASHWINKUMAR B,ROBERT K,ALEKSANDRS S.Bandits with Knapsacks [C]//Proceedings of the 54th IEEE Symposium on Foundations of Computer Science.ACM,2013. [84]AGRAWAL S,DEVANUR N R.Bandits with Concave Rewards and Convex Knapsacks [C]//Proceedings of the Fifteenth ACM Conference on Economics and Computation.2014:989-1006. [85]AGRAWAL S,DEVANUR N R,LI L.An Efficient Algorithm for Contextual Bandits with Knapsacks,and An Extension to Concave Objectives [C]//Proceedings of the Conference on Learning Theory.2016:4-18. [86]BADANIDIYURU A,LANGFORD J,SLIVKINS A.Resourceful Contextual Bandits [C]//Proceedings of the Conference on Learning Theory.2014:1109-1134. [87]AGARWAL A,HSU D,KALE S,et al.Taming the Monster:A Fast and Simple Algorithm for Contextual Bandits [C]//Proceedings of the International Conference on Machine Learning.2014:1638-1646. [88]LI X,SUN C,YE Y.The Symmetry between Arms and Knap-sacks:A Primal-Dual Approach for Bandits with Knapsacks [C]//Proceedings of the International Conference on Machine Learning.2021:6483-6492. [89]LEON V,ETESAMI S R.Bandit Learning for Dynamic Colonel Blotto Game with A Budget Constraint [C]//Proceedings of the 60th IEEE Conference on Decision and Control(CDC).2021:3818-3823. [90]SANKARARAMAN K A,SLIVKINS A.Combinatorial Semi-Bandits with Knapsacks [C]//Proceedings of the International Conference on Artificial Intelligence and Statistics.2018:1760-1770. [91]WONG R T.Combinatorial Optimization:Algorithms and Complexity [J].SIAM Review,1983,25(3):424. [92]CLEMPNER J B.Reveling Misleading Information for Defenders and Attackers in Repeated Stackelberg Security Games [J].Engineering Applications of Artificial Intelligence,2022,110:104703. [93]CHANDAN R,PAARPORN K,MARDEN J R.When Showing Your Hand Pays Off:Announcing Strategic Intentions in Colonel Blotto Games [C]//Proceedings of the American Control Conference(ACC).2020:4632-4637. [94]SEDDIGHIN S.Campaigning via LP’s:Solving Blotto and Beyond [D].Baltimore:University of Maryland,2019. [95]FERGUSON B L,SHISHIKA D,MARDEN J R.Ensuring the Defense of Paths and Perimeters in Dynamic Defender-Attacker Blotto Games(dDAB) on Graphs [C]//Proceedings of the 58th Annual Allerton Conference on Communication,Control,and Computing(Allerton).2022:1-7. [96]CHEN A K,FERGUSON B L,SHISHIKA D,et al.Path Defense in Dynamic Defender-Attacker Blotto Games(dDAB) with Limited Information [EB/OL].(2022-04-08) [2022-12-01].https://arxiv.org/abs/2204.04176. [97]PAARPORN K,CHANDAN R,ALIZADEH M,et al.A General Lotto Game with Asymmetric Budget Uncertainty [EB/OL].(2021-06-23) [2022-12-01].https://arxiv.org/abs/2106.12133. [98]SHAH V,MARDEN J R.Battlefield transfers in coalitionalBlotto games [EB/OL].(2023-04-04) [2023-06-30].https://arxiv.org/abs/2304.02068. [99]EWERHART C,KOVENOCK D.A Class of N-Player Colonel Blotto Games with Multidimensional Private Information[J].Operations Research Letters,2021,49(3):418-425. [100]ANBARCI N,CINGIZ K,ISMAIL M S.Proportional ResourceAllocation in Dynamic N-Player Blotto Games[J].Mathematical Social Sciences,2023,125:94-100. [101]GUAN S,WANG J,JIANG C,et al.Colonel Blotto Game Aided Attack-Defense Analysis in Real-World Networks[C]//2018 IEEE Global Communications Conference(GLOBECOM).IEEE,2018:1-6. [102]BOIX-ADSERÀ E,EDELMAN B L,JAYANTI S.The Multi-player Colonel Blotto Game[C]//Proceedings of the 21st ACM Conference on Economics and Computation.2020:47-48. [103]DONAHUE K,KLEINBERG J.Private Blotto:Viewpoint Competition with Polarized Agents [EB/OL].(2023-02-27) [2023-06-30].https://arxiv.org/abs/2302.14123. [104]PERCHET V,RIGOLLET P,LE GOUIC T.An Algorithmic Solution to The Blotto Game Using Multi-Marginal Couplings [C]//Proceedings of the 23rd ACM Conference on Economics and Computation.2022:208-209. [105]BEAGLEHOLE D,HOPKINS M,KANE D,et al.SamplingEquilibria:Fast No-Regret Learning in Structured Games [EB/OL].(2022-01-26) [2022-12-01].https://arxiv.org/abs/2201.10758. [106]NOEL J C G.Reinforcement Learning Agents in Colonel Blotto [EB/OL].(2022-04-04) [2022-12-01].https://arxiv.org/abs/2204.02785. [107]THOMAS C.N-Dimensional Blotto Game with Heterogeneous Battlefield Values [J].Economic Theory,2018,65(3):509-544. [108]MASUCCI A M,SILVA A.DefensiveResource Allocation in Social Networks[C]//2015 54th IEEE Conference on Decision and Control(CDC).IEEE,2015:2927-2932. [109]ZOU M,CHEN J,LUO J,et al.Equilibrium Approximating and Online Learning for Anti-Jamming Game of Satellite Communication Power Allocation[J].Electronics,2022,11(21):3526. [110]ZHANG L,WANG Y,HAN Z.Safeguarding UAV-EnabledWireless Power Transfer Against Aerial Eavesdropper:A Colonel Blotto Game[J].IEEE Wireless Communications Letters,2021,11:503-507. [111]WEI P,WANG S D,LU R M,et al.Multi-Channel Power Distribution for Anti-Jamming Based on Asymmetric Colonel Blotto Game [J].Journal of National University of Defense Techno-logy,2023,45:35-48. [112]HAJIMIRSADEGHI M,SRIDHARAN G,SAAD W,et al.In-ter-Network Dynamic Spectrum Allocation Via a Colonel Blotto Game [C]//Proceedings of the Annual Conference on Information Science and Systems(CISS).2016:252-257. [113]MIN M,XIAO L,XIE C,et al.Defense Against Advanced Persistent Threats:A Colonel Blotto Game Approach [C]//Proceedings of the IEEE International Conference on Communications(ICC).2017:1-6. [114]GUAN S,WANG J,YAO H,et al.Colonel blotto games in network systems:Models,strategies,and applications[J].IEEE Transactions on Network Science and Engineering,2019,7(2):637-649. [115]FERDOWSI A,SAAD W,MANDAYAM N B.Colonel BlottoGame for Sensor Protection in Interdependent Critical Infrastructure[J].IEEE Internet of Things Journal,2020,8(4):2857-2874. |
|