Paper Title

Robust Deep Reinforcement Learning through Bootstrapped Opportunistic Curriculum

Paper Authors

Junlin Wu, Yevgeniy Vorobeychik

Abstract


Despite considerable advances in deep reinforcement learning, it has been shown to be highly vulnerable to adversarial perturbations to state observations. Recent efforts that have attempted to improve adversarial robustness of reinforcement learning can nevertheless tolerate only very small perturbations, and remain fragile as perturbation size increases. We propose Bootstrapped Opportunistic Adversarial Curriculum Learning (BCL), a novel flexible adversarial curriculum learning framework for robust reinforcement learning. Our framework combines two ideas: conservatively bootstrapping each curriculum phase with highest quality solutions obtained from multiple runs of the previous phase, and opportunistically skipping forward in the curriculum. In our experiments we show that the proposed BCL framework enables dramatic improvements in robustness of learned policies to adversarial perturbations. The greatest improvement is for Pong, where our framework yields robustness to perturbations of up to 25/255; in contrast, the best existing approach can only tolerate adversarial noise up to 5/255. Our code is available at: https://github.com/jlwu002/BCL.
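The abstract's two ideas — conservatively bootstrapping each curriculum phase from the best of several runs of the previous phase, and opportunistically skipping phases the policy already tolerates — can be sketched as a curriculum loop. This is a minimal illustrative skeleton, not the authors' implementation: the `train`, `evaluate`, and `skip_threshold` names are assumptions, and the real framework trains deep RL policies against adversarial state perturbations of increasing size.

```python
def bcl_curriculum(train, evaluate, epsilons, num_runs=3,
                   skip_threshold=0.5, init_policy=None):
    """Sketch of Bootstrapped Opportunistic Curriculum Learning (BCL).

    train(policy, eps):    hypothetical trainer; returns a policy trained
                           under adversarial perturbations of size eps,
                           warm-started from `policy`.
    evaluate(policy, eps): hypothetical robust-performance score of
                           `policy` under perturbations of size eps.
    epsilons:              increasing perturbation sizes (the curriculum).
    """
    policy = init_policy
    i = 0
    while i < len(epsilons):
        eps = epsilons[i]
        # Bootstrapping: launch several runs warm-started from the previous
        # phase's best policy and keep only the highest-quality result.
        candidates = [train(policy, eps) for _ in range(num_runs)]
        policy = max(candidates, key=lambda p: evaluate(p, eps))
        # Opportunistic skipping: jump past later phases whose perturbation
        # size the current policy already tolerates well enough.
        i += 1
        while i < len(epsilons) and evaluate(policy, epsilons[i]) >= skip_threshold:
            i += 1
    return policy
```

With stub `train`/`evaluate` functions the loop shows both behaviors: each phase keeps the best of `num_runs` candidates, and well-performing policies skip directly to harder perturbation levels instead of retraining on every intermediate one.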
