Study of Temporal Difference Learning in Computer Games

Computer Science, 2010, Vol. 37, Issue (8): 219-223.

XU Chang-ming, MA Zong-min, XU Xin-he, LI Xin-xing

(School of Information Science and Engineering, Northeastern University, Shenyang 110004, China)

Online: 2018-12-01 Published: 2018-12-01
Funding:
This work was supported by the National Natural Science Foundation of China (60873010) and the National Natural Science Foundation of China (60774097).

Abstract

Using the game of Connect6 as a testbed, this paper applies temporal difference (TD) learning to adjust the weights of the evaluation function automatically. A new evaluation function design is proposed that solves the problem of combining prior knowledge with a multi-layer neural network. Based on the characteristics of the specific application, a method is proposed that learns selectively from only part of the TD sequence, which avoids interference from useless states to a certain extent. After 10020 games of self-learning training, the winning rate against the same Connect6-playing program increased by about 8%, which is a good result.

Key words: Computer games, Temporal difference learning, Connect6
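
The weight adjustment summarized in the abstract can be illustrated with a minimal sketch of an offline TD(λ) update over one finished game. It is not the authors' implementation: it assumes a linear evaluation squashed by tanh instead of the paper's multi-layer network, and the function name `td_lambda_update`, its parameters, and the `keep` mask (a stand-in for whatever criterion the paper uses to select the states worth learning from) are all hypothetical.

```python
import math


def td_lambda_update(weights, features, outcome, alpha=0.01, lam=0.7, keep=None):
    """Return new weights after one game (hypothetical helper, linear evaluator).

    features: one feature vector per visited state, in playing order.
    outcome:  final game result in [-1, 1], e.g. 1.0 for a win.
    keep:     optional booleans; states marked False are skipped (assumed
              form of selective learning, not taken from the paper).
    """
    n = len(features)
    if keep is None:
        keep = [True] * n

    # Prediction for each state: P_t = tanh(w . x_t), a value in [-1, 1].
    preds = [math.tanh(sum(w * x for w, x in zip(weights, f))) for f in features]

    # Temporal differences: each prediction is compared with the next one,
    # and the last prediction is compared with the actual game outcome.
    targets = preds[1:] + [outcome]
    diffs = [targets[t] - preds[t] for t in range(n)]

    new_weights = list(weights)
    for t in range(n):
        if not keep[t]:
            continue  # selective learning: this state contributes no update
        # Lambda-discounted sum of the current and all later differences.
        err = sum(lam ** (j - t) * diffs[j] for j in range(t, n))
        slope = 1.0 - preds[t] ** 2  # derivative of tanh at state t
        for i, x in enumerate(features[t]):
            new_weights[i] += alpha * slope * err * x
    return new_weights
```

With `lam=1.0` every kept state is pulled toward the final outcome, while `lam=0.0` pulls each state only toward the prediction of its successor; intermediate values blend the two.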

XU Chang-ming, MA Zong-min, XU Xin-he, LI Xin-xing. Study of Temporal Difference Learning in Computer Games[J]. Computer Science, 2010, 37(8): 219-223. https://doi.org/
