Computer Science ›› 2018, Vol. 45 ›› Issue (1): 140-143. doi: 10.11896/j.issn.1002-137X.2018.01.023

• The 16th China Conference on Machine Learning •

Improvement of Monte Carlo Tree Search Algorithm in Two-person Game Problem

JI Hui and DING Ze-jun

  1. Department of Physics, University of Science and Technology of China, Hefei 230026, China
  • Online: 2018-01-15    Published: 2018-11-13

Abstract: Monte Carlo tree search (MCTS) is a heuristic search algorithm for certain kinds of decision processes, most notably those arising in game play, in which game strategies are evaluated by Monte Carlo simulation. In the decision process of a very complex game, such as computer Go, the basic Monte Carlo tree search converges very slowly because of its large computational cost. This article shows that plain MCTS does not converge to the optimal strategy of two-person complete-information games, and therefore proposes an improved algorithm that combines MCTS with the minimax algorithm, so that the search result is not distorted by the randomness of the Monte Carlo method. To further improve the computational efficiency of the search in complex two-person games, several common progressive pruning strategies are also employed. Experimental results show that the proposed algorithm significantly improves the accuracy and efficiency of MCTS.

Key words: Monte Carlo tree search, Progressive pruning, Two-person games
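
To make the general idea concrete, the following is a minimal Python sketch, written under stated assumptions rather than as the paper's actual implementation, of one common way to combine UCT-based MCTS with minimax-style value propagation: exact (terminal) results are backed up as proven wins or losses, so the randomness of the playouts cannot distort a subtree whose game-theoretic value is already known. The game interface used here (player_just_moved, legal_moves(), play(), winner()), the constant C_UCT, and the names of the helper functions are hypothetical placeholders.

# Illustrative sketch only: UCT-based MCTS with minimax-style proof backup.
# Assumes a hypothetical two-person, perfect-information game without draws,
# exposed through state.player_just_moved, state.legal_moves(), state.play(m),
# and state.winner().
import math
import random

C_UCT = 1.4  # exploration constant; an assumed value, tuned per game


class Node:
    """Tree node; `wins` is counted for `player_just_moved`, i.e. the player
    whose move led to this position."""

    def __init__(self, state, parent=None, move=None):
        self.state = state
        self.parent = parent
        self.move = move
        self.player_just_moved = state.player_just_moved
        self.children = []
        self.untried = list(state.legal_moves())
        self.visits = 0
        self.wins = 0.0
        self.proven = None  # +1 / -1: proven win / loss for player_just_moved

    def select_child(self):
        # UCT selection; children already proven lost for the side to move
        # are skipped whenever an unproven alternative exists.
        candidates = [c for c in self.children if c.proven != -1] or self.children
        return max(
            candidates,
            key=lambda c: c.wins / c.visits
            + C_UCT * math.sqrt(math.log(self.visits) / c.visits),
        )


def mcts_minimax(root_state, iterations=10000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root

        # 1. Selection: descend while fully expanded and not yet proven.
        while not node.untried and node.children and node.proven is None:
            node = node.select_child()

        # 2. Expansion: add one random untried move.
        if node.untried and node.proven is None:
            move = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(node.state.play(move), parent=node, move=move)
            node.children.append(child)
            node = child

        # 3. Simulation: a random playout, unless the value is already exact.
        winner = node.state.winner()
        if winner is not None:
            node.proven = 1 if winner == node.player_just_moved else -1
        if node.proven is not None:
            result = 1.0 if node.proven == 1 else 0.0
        else:
            state = node.state
            while state.winner() is None:
                state = state.play(random.choice(state.legal_moves()))
            result = 1.0 if state.winner() == node.player_just_moved else 0.0

        # 4. Backpropagation with minimax-style proof propagation.
        while node is not None:
            node.visits += 1
            node.wins += result
            parent = node.parent
            if parent is not None:
                if node.proven == 1:
                    # The side to move at `parent` has a winning reply,
                    # so `parent` is a proven loss for its own mover.
                    parent.proven = -1
                elif not parent.untried and all(c.proven == -1 for c in parent.children):
                    # Every reply from `parent` loses for the side to move,
                    # so `parent` is a proven win for its own mover.
                    parent.proven = 1
            result = 1.0 - result  # flip the point of view one level up
            node = parent

    # Play the most-visited move, preferring a proven win if one exists.
    best = max(root.children, key=lambda c: (c.proven == 1, c.visits))
    return best.move

Under these assumptions, mcts_minimax(initial_state) returns the move chosen at the root; the progressive pruning discussed in the abstract could be layered on top, for example by periodically discarding children whose win-rate statistics are clearly dominated.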

