Paper Title

Robust Reinforcement Learning as a Stackelberg Game via Adaptively-Regularized Adversarial Training

Paper Authors

Peide Huang, Mengdi Xu, Fei Fang, Ding Zhao

Paper Abstract

Robust Reinforcement Learning (RL) focuses on improving performance under model errors or adversarial attacks, which facilitates the real-life deployment of RL agents. Robust Adversarial Reinforcement Learning (RARL) is one of the most popular frameworks for robust RL. However, most of the existing literature models RARL as a zero-sum simultaneous game with Nash equilibrium as the solution concept, which could overlook the sequential nature of RL deployments, produce overly conservative agents, and induce training instability. In this paper, we introduce a novel hierarchical formulation of robust RL - a general-sum Stackelberg game model called RRL-Stack - to formalize the sequential nature and provide extra flexibility for robust training. We develop the Stackelberg Policy Gradient algorithm to solve RRL-Stack, leveraging the Stackelberg learning dynamics by considering the adversary's response. Our method generates challenging yet solvable adversarial environments that benefit RL agents' robust learning. Our algorithm demonstrates better training stability and robustness against different testing conditions in single-agent robotics control and multi-agent highway merging tasks.
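As a rough sketch of the bi-level structure the abstract describes (the leader/follower assignment, the return symbols $J_A$ and $J_F$, and the parameters $\theta$ and $\phi$ are illustrative assumptions here, not the paper's own notation), a Stackelberg game between an RL agent acting as the leader and an adversary acting as the follower can be written as

\[
\max_{\theta} \; J_A\big(\theta, \phi^{*}(\theta)\big)
\quad \text{s.t.} \quad
\phi^{*}(\theta) \in \arg\max_{\phi} J_F(\theta, \phi),
\]

and a Stackelberg-style policy gradient for the leader differentiates through the follower's response,

\[
\frac{\mathrm{d} J_A}{\mathrm{d} \theta}
= \frac{\partial J_A}{\partial \theta}
+ \left(\frac{\partial \phi^{*}(\theta)}{\partial \theta}\right)^{\!\top}
\frac{\partial J_A}{\partial \phi}\Bigg|_{\phi=\phi^{*}(\theta)},
\]

whereas a simultaneous-play (Nash) gradient keeps only the first term. The general-sum aspect means $J_F$ need not equal $-J_A$, which is the extra flexibility the abstract points to for keeping the adversarial environments challenging yet solvable.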
