Optimality-based Analysis of XCSF Compaction in Discrete Reinforcement Learning

Paper title

Optimality-based Analysis of XCSF Compaction in Discrete Reinforcement Learning

Authors

Jordan T. Bishop, Marcus Gallagher

Abstract

Learning classifier systems (LCSs) are population-based predictive systems that were originally envisioned as agents to act in reinforcement learning (RL) environments. These systems can suffer from population bloat and so are amenable to compaction techniques that try to strike a balance between population size and performance. A well-studied LCS architecture is XCSF, which in the RL setting acts as a Q-function approximator. We apply XCSF to a deterministic and stochastic variant of the FrozenLake8x8 environment from OpenAI Gym, with its performance compared in terms of function approximation error and policy accuracy to the optimal Q-functions and policies produced by solving the environments via dynamic programming. We then introduce a novel compaction algorithm (Greedy Niche Mass Compaction - GNMC) and study its operation on XCSF's trained populations. Results show that given a suitable parametrisation, GNMC preserves or even slightly improves function approximation error while yielding a significant reduction in population size. Reasonable preservation of policy accuracy also occurs, and we link this metric to the commonly used steps-to-goal metric in maze-like environments, illustrating how the metrics are complementary rather than competitive.
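The evaluation idea in the abstract (comparing an approximate Q-function against the optimal one obtained by dynamic programming) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the four-state chain environment, the function names, the discount factor, and the exact definitions of the two metrics are hypothetical and are not taken from the paper, which does not spell out its definitions in the abstract. Here the error is a mean absolute error over all state-action pairs, and policy accuracy is the fraction of states in which the greedy action under the approximation is also optimal.

```python
def make_chain(n_states=4):
    """Hypothetical deterministic chain (not FrozenLake8x8).

    Actions: 0 = left, 1 = right. The last state is an absorbing goal.
    model[s][a] is a list of (probability, next_state, reward) outcomes.
    """
    goal = n_states - 1
    model = []
    for s in range(n_states):
        if s == goal:
            # Absorbing goal: every action stays put with zero reward.
            model.append([[(1.0, s, 0.0)], [(1.0, s, 0.0)]])
        else:
            left, right = max(s - 1, 0), s + 1
            model.append([
                [(1.0, left, 0.0)],
                [(1.0, right, 1.0 if right == goal else 0.0)],
            ])
    return model


def q_value_iteration(model, gamma=0.9, tol=1e-12):
    """Compute the optimal Q-function by value iteration on a known model."""
    q = [[0.0 for _ in actions] for actions in model]
    while True:
        v = [max(row) for row in q]  # state values implied by current Q
        q_new = [
            [sum(p * (r + gamma * v[s2]) for p, s2, r in outcomes)
             for outcomes in actions]
            for actions in model
        ]
        delta = max(
            abs(a - b)
            for row_new, row in zip(q_new, q)
            for a, b in zip(row_new, row)
        )
        q = q_new
        if delta < tol:
            return q


def mean_absolute_error(q_hat, q_star):
    """Mean absolute difference over all state-action pairs."""
    diffs = [
        abs(a - b)
        for row_hat, row_star in zip(q_hat, q_star)
        for a, b in zip(row_hat, row_star)
    ]
    return sum(diffs) / len(diffs)


def policy_accuracy(q_hat, q_star, atol=1e-9):
    """Fraction of states whose greedy action under q_hat is optimal under q_star."""
    correct = 0
    for row_hat, row_star in zip(q_hat, q_star):
        greedy = max(range(len(row_hat)), key=row_hat.__getitem__)
        if abs(row_star[greedy] - max(row_star)) <= atol:
            correct += 1
    return correct / len(q_star)


if __name__ == "__main__":
    q_star = q_value_iteration(make_chain())
    # A made-up approximation that prefers the wrong action in state 0.
    q_hat = [row[:] for row in q_star]
    q_hat[0] = [1.0, 0.0]
    print("MAE:", mean_absolute_error(q_hat, q_star))
    print("policy accuracy:", policy_accuracy(q_hat, q_star))
```

In this sketch, ties between equally good actions count as correct, so a state where several actions are optimal cannot lower the accuracy score. A compaction method could be assessed in the same way by computing both metrics before and after compaction, alongside the population size.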
