Paper Title
On Uncertainty in Deep State Space Models for Model-Based Reinforcement Learning
Paper Authors
Paper Abstract
Improved state space models, such as Recurrent State Space Models (RSSMs), are a key factor behind recent advances in model-based reinforcement learning (RL). Yet, despite their empirical success, many of the underlying design choices are not well understood. We show that RSSMs use a suboptimal inference scheme and that models trained using this inference overestimate the aleatoric uncertainty of the ground truth system. We find this overestimation implicitly regularizes RSSMs and allows them to succeed in model-based RL. We postulate that this implicit regularization fulfills the same functionality as explicitly modeling epistemic uncertainty, which is crucial for many other model-based RL approaches. Yet, overestimating aleatoric uncertainty can also impair performance in cases where accurately estimating it matters, e.g., when we have to deal with occlusions, missing observations, or fusing sensor modalities at different frequencies. Moreover, the implicit regularization is a side-effect of the inference scheme and not the result of a rigorous, principled formulation, which renders analyzing or improving RSSMs difficult. Thus, we propose an alternative approach building on well-understood components for modeling aleatoric and epistemic uncertainty, dubbed Variational Recurrent Kalman Network (VRKN). This approach uses Kalman updates for exact smoothing inference in a latent space and Monte Carlo Dropout to model epistemic uncertainty. Due to the Kalman updates, the VRKN can naturally handle missing observations or sensor fusion problems with varying numbers of observations per time step. Our experiments show that using the VRKN instead of the RSSM improves performance in tasks where appropriately capturing aleatoric uncertainty is crucial while matching it in the deterministic standard benchmarks.
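The abstract names two well-understood ingredients of the VRKN: a Kalman update for exact Gaussian inference in the latent space, and Monte Carlo Dropout for epistemic uncertainty. A minimal sketch of both ideas, under simplifying assumptions not taken from the paper (a scalar latent state, a single linear readout, and hand-picked noise values):

```python
import numpy as np

def kalman_update(mean, var, obs, obs_var):
    """Closed-form Gaussian posterior for a scalar latent state given one
    noisy observation -- the kind of exact update the VRKN applies in its
    latent space. If an observation is missing, this step is simply
    skipped and the prior (mean, var) is carried forward, which is why
    Kalman updates handle gaps and varying observation counts naturally."""
    gain = var / (var + obs_var)           # Kalman gain in [0, 1]
    new_mean = mean + gain * (obs - mean)  # shift prior toward the observation
    new_var = (1.0 - gain) * var           # posterior variance shrinks
    return new_mean, new_var

def mc_dropout_predict(weights, x, drop_p=0.5, n_samples=1000, seed=0):
    """Monte Carlo Dropout sketch: keep dropout active at prediction time
    and read epistemic uncertainty off the spread of sampled outputs.
    `weights` and the linear readout are illustrative, not the paper's model."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_samples):
        mask = rng.random(weights.shape) >= drop_p   # sample a dropout mask
        preds.append(x @ (weights * mask) / (1.0 - drop_p))  # inverted scaling
    preds = np.asarray(preds)
    return preds.mean(), preds.std()  # mean prediction, epistemic spread
```

The contrast with the RSSM in one line: here the aleatoric part (`obs_var`, and the posterior variance it induces) and the epistemic part (the dropout-induced spread) are modeled by separate, analyzable mechanisms, rather than emerging as a side effect of the inference scheme.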