Paper Title

Locally Constrained Representations in Reinforcement Learning

Paper Authors

Somjit Nath, Rushiv Arora, Samira Ebrahimi Kahou

Paper Abstract

The success of Reinforcement Learning (RL) heavily relies on the ability to learn robust representations from the observations of the environment. In most cases, the representations learned purely through the reinforcement learning loss can differ vastly across states depending on how the value functions change. However, the representations learned need not be very specific to the task at hand. Relying only on the RL objective may yield representations that vary greatly across successive time steps. In addition, since the RL loss has a changing target, the representations learned would depend on how good the current values/policies are. Thus, disentangling the representations from the main task would allow them to focus not only on the task-specific features but also on the environment dynamics. To this end, we propose locally constrained representations, where an auxiliary loss forces the state representations to be predictable by the representations of the neighboring states. This encourages the representations to be driven not only by the value/policy learning but also by an additional loss that constrains them from over-fitting to the value loss. We evaluate the proposed method on several well-known benchmarks and observe strong performance. In continuous control tasks especially, our experiments show a significant performance improvement.
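Based on the abstract alone, the sketch below shows one way such an auxiliary "local prediction" loss could be attached to an RL agent: a small predictor maps the representation of one state to the representation of its temporal neighbor, and the mismatch is added to the usual RL loss. This is a minimal illustration, not the paper's implementation; the encoder, the predictor MLP, the stop-gradient on the target, and the `aux_weight` coefficient are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalPredictionLoss(nn.Module):
    """Sketch of an auxiliary loss that encourages a state's representation
    to be predictable from the representation of a neighboring state.
    Module and argument names are illustrative, not from the paper."""

    def __init__(self, encoder: nn.Module, repr_dim: int):
        super().__init__()
        self.encoder = encoder
        # Small MLP that predicts the neighbor's representation.
        self.predictor = nn.Sequential(
            nn.Linear(repr_dim, repr_dim),
            nn.ReLU(),
            nn.Linear(repr_dim, repr_dim),
        )

    def forward(self, obs_t: torch.Tensor, obs_tp1: torch.Tensor) -> torch.Tensor:
        z_t = self.encoder(obs_t)          # representation of s_t
        with torch.no_grad():              # stop-gradient on the target is a common
            z_tp1 = self.encoder(obs_tp1)  # choice; the paper's details may differ
        # Penalize the gap between the predicted and actual neighbor representation.
        return F.mse_loss(self.predictor(z_t), z_tp1)


# Usage sketch (names are illustrative):
# aux = LocalPredictionLoss(encoder, repr_dim=256)
# total_loss = rl_loss + aux_weight * aux(obs_batch, next_obs_batch)
```

The weighting coefficient controls how strongly the representation is shaped by the local-predictability constraint versus the value/policy objective; how the paper balances the two terms is not specified in the abstract.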
