Paper Title
Invariance to Quantile Selection in Distributional Continuous Control
Paper Authors
Paper Abstract
In recent years, distributional reinforcement learning has produced many state-of-the-art results. Increasingly sample-efficient distributional algorithms for the discrete action domain have been developed over time, varying primarily in how they parameterize their approximations of value distributions and how they quantify the differences between those distributions. In this work we transfer three of the most well-known and successful of these algorithms (QR-DQN, IQN, and FQF) to the continuous action domain by extending two powerful actor-critic algorithms (TD3 and SAC) with distributional critics. We investigate whether the relative performance of the methods in the discrete action space translates to the continuous case. To that end, we compare them empirically on the PyBullet implementations of a set of continuous control tasks. Our results indicate qualitative invariance to the number and placement of distributional atoms in the deterministic, continuous action setting.
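The three quantile-based critics named above (QR-DQN, IQN, FQF) all train their quantile estimates with a quantile Huber regression objective; they differ mainly in how the quantile fractions are chosen. The sketch below is a minimal PyTorch illustration of that loss under our own assumptions, not the authors' implementation; the function name `quantile_huber_loss` and the tensor shapes are hypothetical.

```python
import torch


def quantile_huber_loss(pred, target, taus, kappa=1.0):
    """Quantile Huber loss used by QR-style distributional critics (a sketch).

    pred:   (batch, N)  predicted quantile values of the return distribution
    target: (batch, M)  target quantile values (treated as constants)
    taus:   (N,)        quantile fractions matching the predictions
    kappa:  Huber threshold (kappa=1.0 as in QR-DQN)
    """
    # Pairwise TD errors: u[b, m, n] = target[b, m] - pred[b, n]
    u = target.unsqueeze(2) - pred.unsqueeze(1)  # (batch, M, N)
    # Huber loss: quadratic within |u| <= kappa, linear outside
    huber = torch.where(u.abs() <= kappa,
                        0.5 * u.pow(2),
                        kappa * (u.abs() - 0.5 * kappa))
    # Asymmetric weight |tau - 1{u < 0}| turns the Huber loss into
    # quantile regression: each atom is pushed toward its own quantile.
    weight = (taus.view(1, 1, -1) - (u.detach() < 0).float()).abs()
    # Sum over predicted quantiles, average over target samples and batch
    return (weight * huber / kappa).sum(dim=2).mean()


# Hypothetical usage: fixed midpoint fractions as in QR-DQN with N = 8 atoms
N = 8
taus = (torch.arange(N, dtype=torch.float32) + 0.5) / N
pred = torch.randn(32, N)     # critic output for a batch of 32 transitions
target = torch.randn(32, N)   # Bellman targets from a target critic
loss = quantile_huber_loss(pred, target, taus)
```

In this framing, QR-DQN fixes `taus` to midpoints, IQN samples them per update, and FQF learns them with a separate fraction-proposal network; the regression loss itself stays the same.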