Spectral Decomposition Representation for Reinforcement Learning

Paper Title

Spectral Decomposition Representation for Reinforcement Learning

Authors

Tongzheng Ren, Tianjun Zhang, Lisa Lee, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai

Abstract

Representation learning often plays a critical role in reinforcement learning by managing the curse of dimensionality. A representative class of algorithms exploits a spectral decomposition of the stochastic transition dynamics to construct representations that enjoy strong theoretical properties in an idealized setting. However, current spectral methods suffer from limited applicability because they are constructed for state-only aggregation and derived from a policy-dependent transition kernel, without considering the issue of exploration. To address these issues, we propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy, while also balancing the exploration-versus-exploitation trade-off during learning. A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings. In addition, an experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.
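The abstract's central idea is to factor the transition kernel as P(s' | s, a) ≈ φ(s, a)ᵀ μ(s') and use φ as a state-action representation. A small tabular sketch can make this concrete. It assumes a synthetic low-rank kernel and uses a plain truncated SVD; the helper names (`spectral_features`, `random_low_rank_mdp`) are hypothetical. This is not SPEDER's learning objective, which the abstract does not specify. The example also shows a standard consequence of such a factorization: for any fixed value function V, the one-step Bellman backup r + γ P V is linear in φ.

```python
import numpy as np

def spectral_features(P, k):
    """Truncated SVD of a tabular transition matrix (illustrative only).

    P has shape (S*A, S); row (s, a) holds the distribution P(. | s, a).
    Returns phi with shape (S*A, k) and mu with shape (S, k), so P ~= phi @ mu.T.
    """
    U, sigma, Vt = np.linalg.svd(P, full_matrices=False)
    phi = U[:, :k] * sigma[:k]  # state-action features phi(s, a)
    mu = Vt[:k, :].T            # next-state factors mu(s')
    return phi, mu

def random_low_rank_mdp(num_sa, num_s, rank, seed=0):
    """Hypothetical toy kernel: each (s, a) row mixes `rank` latent next-state distributions."""
    rng = np.random.default_rng(seed)
    weights = rng.dirichlet(np.ones(rank), size=num_sa)  # (S*A, rank), rows sum to 1
    latent = rng.dirichlet(np.ones(num_s), size=rank)    # (rank, S), rows sum to 1
    return weights @ latent                               # convex mixtures, so rows sum to 1

S, A, d, gamma = 6, 3, 2, 0.9
P = random_low_rank_mdp(S * A, S, d)
phi, mu = spectral_features(P, k=d)

# With k equal to the kernel's rank, the factorization reconstructs P.
print(np.allclose(phi @ mu.T, P))  # True

# For a fixed V, Q = r + gamma * P V equals r + gamma * phi @ w with w = mu.T @ V,
# so the backup is linear in the learned features.
rng = np.random.default_rng(1)
r = rng.normal(size=S * A)
V = rng.normal(size=S)
Q = r + gamma * P @ V
Q_lin = r + gamma * phi @ (mu.T @ V)
print(np.allclose(Q, Q_lin))  # True
```

In this toy setting the rank d is known in advance. In practice the feature dimension is a modeling choice, and the features are learned from sampled transitions rather than from an explicit matrix.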
