计算机科学 ›› 2018, Vol. 45 ›› Issue (1): 140-143.doi: 10.11896/j.issn.1002-137X.2018.01.023

• 第十六届中国机器学习会议 • 上一篇    下一篇

双人博弈问题中的蒙特卡洛树搜索算法的改进

季辉,丁泽军   

  1. 中国科学技术大学物理系 合肥230026,中国科学技术大学物理系 合肥230026
  • 出版日期:2018-01-15 发布日期:2018-11-13

Improvement of Monte Carlo Tree Search Algorithm in Two-person Game Problem

JI Hui and DING Ze-jun   

  • Online:2018-01-15 Published:2018-11-13

摘要: 蒙特卡洛树搜索(MCTS)是一种针对决策类博弈游戏,运用蒙特卡洛模拟方法进行评估博弈策略的启发式搜索算法。但是,在面对计算机围棋这种复杂的决策过程时,简单的蒙特卡洛树搜索过程往往由于计算量大,收敛速度非常慢。 由于双人博弈游戏中的蒙特卡洛树搜索不能收敛于双人博弈的最佳决策策略,因此提出蒙特卡洛树搜索结合极大极小值算法的改进算法,使得搜索结果不会因为蒙特卡洛方法的随机性而失真。为了进一步提高复杂双人博弈游戏中搜索算法的计算效率,还结合了几种常见的剪枝策略。实验结果说明,所提算法显著改进了蒙特卡洛树搜索的准确性和效率。

关键词: 蒙特卡洛树搜索,剪枝策略,双人博弈问题

Abstract: Monte Carlo tree search (MCTS) is a heuristic search algorithm for some kinds of decision processes,most notably those employed in game play.In the decision process of a very complex game,like a computer Go game,the basic Monte Carlo tree search method converges very slowly due to large computation cost.This article indicated that MCTS cannot converge to the best strategy of the two-person complete information games.Therefore,this article proposed a new search algorithm which combines MCTS with the min-max algorithm in order to avoid the failure due to the randomness of Monte Carlo method.For further improving computation efficiency of MTCS in complex two-person games,this article also considered to employ some progressive pruning strategies.An experimental test shows that the new algorithm significantly improves the accuracy and efficiency of MCTS.

Key words: Monte Carlo tree search,Progressive pruning,Two-person games

[1] CAMPBELL M,HOANE A J,HSU F.Deep blue[J].Artificial Intelligence,2002,134(1/2):57-83.
[2] SCHAEFFER J.Game Over:Black to Play and Draw in Che-ckers[J].ICGA Journal,2007,30(4):187-197.
[3] SCHAEFFER J,BURCH N,BJRNSSON Y,et al.Checkers is solved[J].Science,2007,317(5844):1518-1522.
[4] BOUZY B,CAZENAVE T.Computer Go:an AI oriented survey[J].Artificial Intelligence,2001,132(1):39-103.
[5] COULOM R.The Monte-Carlo Revolution in Go[C]∥The Japa-nese-French Frontiers of Science Symposium(JFFoS 2008).Roscoff,France,2009.
[6] CHASLOT J B,et al.Progressive strategies for Monte-Carlotree search[J].New Mathematics and Natural Computation,2008,4(3):343-357.
[7] CHANG H S,FU M C,HU J,et al.An adaptive sampling algorithm for solving Markov decision processes[J].Operations Research,2005,53(1):126-139.
[8] KOCSIS L,SZEPESVRI C.Bandit based Monte-Carlo plan-ning[C]∥European Conference on Machine Learning.Springer Berlin Heidelberg,2006:282-293.
[9] HUANG J.Application and Enhancement of the UCT Algo-rithm in Computer Go[D].Beijing:Beijing University of Posts and Telecommunications,2011.(in Chinese) 黄晶.计算机围棋博弈中UCT算法的应用及改进[D].北京:北京邮电大学,2011.
[10] YU Y B.Computer Go Game Research Based on Monte Carlo Tree Search[D].Dalian:Dalian Maritime University,2015.(in Chinese) 于永波.基于蒙特卡洛树搜索的计算机围棋博弈研究[D].大连:大连海事大学,2015.
[11] ABRAMSON B,KORF R E.A Model of Two-Player Evaluation Functions[C]∥AAAI.1987:90-94.
[12] EGELLY S,WWANG Y.Exploration exploitation in Go:UCT for Monte-Carlo Go[C]∥NIPS:Neural Information Processing Systems Conference On-line trading of Exploration and Exploitation Workshop.2006.
[13] SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of Go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 雷丽晖,王静. 可能性测度下的LTL模型检测并行化研究[J]. 计算机科学, 2018, 45(4): 71 -75 .
[2] 孙启,金燕,何琨,徐凌轩. 用于求解混合车辆路径问题的混合进化算法[J]. 计算机科学, 2018, 45(4): 76 -82 .
[3] 张佳男,肖鸣宇. 带权混合支配问题的近似算法研究[J]. 计算机科学, 2018, 45(4): 83 -88 .
[4] 伍建辉,黄中祥,李武,吴健辉,彭鑫,张生. 城市道路建设时序决策的鲁棒优化[J]. 计算机科学, 2018, 45(4): 89 -93 .
[5] 史雯隽,武继刚,罗裕春. 针对移动云计算任务迁移的快速高效调度算法[J]. 计算机科学, 2018, 45(4): 94 -99 .
[6] 周燕萍,业巧林. 基于L1-范数距离的最小二乘对支持向量机[J]. 计算机科学, 2018, 45(4): 100 -105 .
[7] 刘博艺,唐湘滟,程杰仁. 基于多生长时期模板匹配的玉米螟识别方法[J]. 计算机科学, 2018, 45(4): 106 -111 .
[8] 耿海军,施新刚,王之梁,尹霞,尹少平. 基于有向无环图的互联网域内节能路由算法[J]. 计算机科学, 2018, 45(4): 112 -116 .
[9] 崔琼,李建华,王宏,南明莉. 基于节点修复的网络化指挥信息系统弹性分析模型[J]. 计算机科学, 2018, 45(4): 117 -121 .
[10] 王振朝,侯欢欢,连蕊. 抑制CMT中乱序程度的路径优化方案[J]. 计算机科学, 2018, 45(4): 122 -125 .