在Carcassonne游戏中，使用语义启发的进化算法来进化树木的上限置信度范围

论文标题

在Carcassonne游戏中，使用语义启发的进化算法来进化树木的上限置信度范围

Evolving the MCTS Upper Confidence Bounds for Trees Using a Semantic-inspired Evolutionary Algorithm in the Game of Carcassonne

论文作者

Galván, Edgar, Simpson, Gavin, Ameneyro, Fred Valdez

论文摘要

蒙特卡洛树搜索（MCT）是一种搜索最佳决策的最佳先入点方法。 MCT的成功在很大程度上取决于树木的建造方式，并且选择过程在其中起着基本作用。被证明是可靠的一种特殊选择机制是基于树木（UCT）的上限置信度范围。 UCT试图通过考虑存储在MCT的统计树中的值来平衡探索和剥削。但是，对MCTS UCT的某些调整对于这是必要的。在这项工作中，我们使用进化算法（EAS）以替代UCT公式并在MCT中使用进化的表达式的目标进化数学表达式。更具体地说，我们通过在MCTS方法（SIEA-MCTS）中提出的语义启发的进化算法来发展表达式。这是受基因编程（GP）语义的启发，在该语义上，使用健身案例被视为在GP中采用的要求。健身病例通常用于确定个体的适应性，可用于计算个体的语义相似性（或差异）。但是，MCT中没有健身案例。我们通过使用MCT的多个奖励值来扩展此概念，从而使我们能够确定个人及其语义的适应性。通过这样做，我们展示了SIEA-MCT如何能够成功地发展数学表达式，而数学表达式与UCT相比，无需调整这些演变的表达式而产生更好或竞争的结果。我们比较了提出的SIEA-MCT与MCT算法，MCTS快速动作值估计算法的性能， * - 毫米算法的三种变体，一个随机控制器和另外两种EA方法。我们始终展示SIEA-MCT在Carcassonne具有挑战性的游戏中如何优于大多数这些智能控制者。

Monte Carlo Tree Search (MCTS) is a sampling best-first method to search for optimal decisions. The success of MCTS depends heavily on how the tree is built and the selection process plays a fundamental role in this. One particular selection mechanism that has proved to be reliable is based on the Upper Confidence Bounds for Trees (UCT). The UCT attempts to balance exploration and exploitation by considering the values stored in the statistical tree of the MCTS. However, some tuning of the MCTS UCT is necessary for this to work well. In this work, we use Evolutionary Algorithms (EAs) to evolve mathematical expressions with the goal to substitute the UCT formula and use the evolved expressions in MCTS. More specifically, we evolve expressions by means of our proposed Semantic-inspired Evolutionary Algorithm in MCTS approach (SIEA-MCTS). This is inspired by semantics in Genetic Programming (GP), where the use of fitness cases is seen as a requirement to be adopted in GP. Fitness cases are normally used to determine the fitness of individuals and can be used to compute the semantic similarity (or dissimilarity) of individuals. However, fitness cases are not available in MCTS. We extend this notion by using multiple reward values from MCTS that allow us to determine both the fitness of an individual and its semantics. By doing so, we show how SIEA-MCTS is able to successfully evolve mathematical expressions that yield better or competitive results compared to UCT without the need of tuning these evolved expressions. We compare the performance of the proposed SIEA-MCTS against MCTS algorithms, MCTS Rapid Action Value Estimation algorithms, three variants of the *-minimax family of algorithms, a random controller and two more EA approaches. We consistently show how SIEA-MCTS outperforms most of these intelligent controllers in the challenging game of Carcassonne.

下载PDF全文

下载文献需遵守相关版权规定

论文标题