Local Feature Swapping for Generalization in Reinforcement Learning

Paper title

Local Feature Swapping for Generalization in Reinforcement Learning

Authors

Bertoin, David; Rachelson, Emmanuel

Abstract

Over the past few years, the acceleration of computing resources and research in deep learning has led to significant practical successes in a range of tasks, including in particular in computer vision. Building on these advances, reinforcement learning has also seen a leap forward with the emergence of agents capable of making decisions directly from visual observations. Despite these successes, the over-parametrization of neural architectures leads to memorization of the data used during training and thus to a lack of generalization. Reinforcement learning agents based on visual inputs also suffer from this phenomenon by erroneously correlating rewards with unrelated visual features such as background elements. To alleviate this problem, we introduce a new regularization technique consisting of channel-consistent local permutations (CLOP) of the feature maps. The proposed permutations induce robustness to spatial correlations and help prevent overfitting behaviors in RL. We demonstrate, on the OpenAI Procgen Benchmark, that RL agents trained with the CLOP method exhibit robustness to visual changes and better generalization properties than agents trained using other state-of-the-art regularization techniques. We also demonstrate the effectiveness of CLOP as a general regularization technique in supervised learning.
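To make the idea of a channel-consistent local permutation concrete, here is a minimal NumPy sketch. The abstract gives no implementation details, so everything specific below is an assumption made for illustration and should not be read as the authors' method: the function name `local_feature_swap`, the swap probability `alpha`, the 4-neighbour neighbourhood, and the sequential scan order are all hypothetical. The one property taken from the abstract is that a swap between two spatial positions is applied identically to every channel, so each position's feature vector moves as a unit.

```python
import numpy as np


def local_feature_swap(feature_map, alpha, rng):
    """Randomly swap neighbouring spatial positions, identically in all channels.

    feature_map: array of shape (C, H, W).
    alpha: probability that a position attempts a swap (assumed parameter).
    rng: a numpy.random.Generator.
    Returns a new array; the input is left unchanged.
    """
    out = feature_map.copy()
    _, height, width = out.shape
    # Assumed neighbourhood: up, down, left, right.
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    for i in range(height):
        for j in range(width):
            if rng.random() >= alpha:
                continue  # this position does not attempt a swap
            di, dj = offsets[rng.integers(len(offsets))]
            ni, nj = i + di, j + dj
            if 0 <= ni < height and 0 <= nj < width:
                # Swap the whole channel vectors at (i, j) and (ni, nj),
                # so every channel undergoes the same permutation.
                out[:, [i, ni], [j, nj]] = out[:, [ni, i], [nj, j]]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8, 8))          # toy feature map: 4 channels, 8x8
    y = local_feature_swap(x, alpha=0.5, rng=rng)
    print(y.shape)
```

In a training setup such a perturbation would presumably be applied to intermediate feature maps during training only and disabled at evaluation time, as is usual for regularizers; that too is an assumption rather than something the abstract states.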

Paper title