Title

Soft-Robust Algorithms for Batch Reinforcement Learning

Authors

Lobo, Elita A., Ghavamzadeh, Mohammad, Petrik, Marek

Abstract

In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion, which minimizes the probability of a catastrophic failure. Unfortunately, such policies are typically overly conservative as the percentile criterion is non-convex, difficult to optimize, and ignores the mean performance. To overcome these shortcomings, we study the soft-robust criterion, which uses risk measures to better balance the mean and percentile criteria. In this paper, we establish the soft-robust criterion's fundamental properties, show that it is NP-hard to optimize, and propose and analyze two algorithms to approximately optimize it. Our theoretical analyses and empirical evaluations demonstrate that our algorithms compute much less conservative solutions than the existing approximate methods for optimizing the percentile criterion.
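As a rough illustration of the idea described in the abstract (balancing mean performance against a percentile-style risk measure), the sketch below evaluates a soft-robust style objective as a convex combination of the mean return and the conditional value-at-risk (CVaR) over a set of sampled model returns. The function names, the weight `lam`, and the use of CVaR as the risk measure here are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def cvar(returns, alpha):
    """CVaR at level alpha: the mean of the worst alpha-fraction of returns."""
    sorted_r = np.sort(returns)
    k = max(1, int(np.ceil(alpha * len(sorted_r))))  # number of worst-case samples
    return sorted_r[:k].mean()

def soft_robust_value(returns, lam=0.5, alpha=0.1):
    """Convex combination of mean performance and CVaR (a coherent risk measure).

    lam = 1 recovers the plain mean; lam = 0 recovers the pure risk measure,
    which behaves like a (smoothed) percentile criterion.
    """
    return lam * returns.mean() + (1.0 - lam) * cvar(returns, alpha)

# Returns of one policy evaluated under models sampled from a posterior
returns = np.array([10.0, 9.5, 8.0, 2.0, 9.0])
print(soft_robust_value(returns, lam=0.5, alpha=0.2))  # 0.5*7.7 + 0.5*2.0 = 4.85
```

A policy chosen to maximize this objective trades off average-case performance against the worst sampled models, rather than optimizing the percentile alone.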
