Title

Disentangling causal effects for hierarchical reinforcement learning

Authors

Oriol Corcoll, Raul Vicente

Abstract

Exploration and credit assignment under sparse rewards are still challenging problems. We argue that these challenges arise in part due to the intrinsic rigidity of operating at the level of actions. Actions can precisely define how to perform an activity but are ill-suited to describe what activity to perform. Instead, causal effects are inherently composable and temporally abstract, making them ideal for describing tasks. By leveraging a hierarchy of causal effects, this study aims to expedite the learning of task-specific behavior and aid exploration. Borrowing counterfactual and normality measures from the causal inference literature, we disentangle controllable effects from effects caused by other dynamics of the environment. We propose CEHRL, a hierarchical method that models the distribution of controllable effects using a Variational Autoencoder. This distribution is used by a high-level policy to 1) explore the environment via random effect exploration so that novel effects are continuously discovered and learned, and 2) learn task-specific behavior by prioritizing the effects that maximize a given reward function. In comparison to exploring with random actions, experimental results show that random effect exploration is a more efficient mechanism, and that by assigning credit to a few effects rather than many actions, CEHRL learns tasks more rapidly.
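The counterfactual disentangling idea in the abstract can be illustrated with a toy sketch: the effect attributed to an action is the difference between the state actually reached and the state a counterfactual no-op rollout would have reached, so that environment-driven changes cancel out. The environment and function names below are purely illustrative, not the paper's implementation:

```python
import numpy as np

def controllable_effect(step_fn, state, action, noop):
    """Counterfactual measure of a controllable effect (illustrative sketch).

    Compares the state reached by taking `action` against the state reached
    by the no-op counterfactual. Changes driven by the environment's own
    dynamics appear in both rollouts and cancel in the difference.
    """
    factual = step_fn(state, action)
    counterfactual = step_fn(state, noop)
    return factual - counterfactual

# Toy 1-D world: index 0 is the agent's position, index 1 is wind-driven
# debris that moves regardless of what the agent does.
def step_fn(state, action):
    s = state.copy()
    s[0] += action  # agent moves by `action`
    s[1] += 1       # debris drifts on its own every step
    return s

state = np.array([0.0, 5.0])
effect = controllable_effect(step_fn, state, action=1, noop=0)
print(effect)  # only the agent's own movement survives: [1. 0.]
```

Here the debris drift (an uncontrollable effect) is removed by the counterfactual comparison, leaving only the component of the transition the agent controls; CEHRL then models the distribution of such controllable effects with a VAE.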
