Title


Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning

Authors

Yinglun Xu, Qi Zeng, Gagandeep Singh

Abstract


We study reward poisoning attacks on online deep reinforcement learning (DRL), where the attacker is oblivious to the learning algorithm used by the agent and the dynamics of the environment. We demonstrate the intrinsic vulnerability of state-of-the-art DRL algorithms by designing a general, black-box reward poisoning framework called adversarial MDP attacks. We instantiate our framework to construct two new attacks which only corrupt the rewards for a small fraction of the total training timesteps and make the agent learn a low-performing policy. We provide a theoretical analysis of the efficiency of our attack and perform an extensive empirical evaluation. Our results show that our attacks efficiently poison agents learning in several popular classical control and MuJoCo environments with a variety of state-of-the-art DRL algorithms, such as DQN, PPO, SAC, etc.
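The abstract describes attacks that corrupt the rewards seen by the agent on only a small fraction of training timesteps, without knowledge of the learning algorithm or the environment dynamics. The sketch below is a minimal, hypothetical illustration of that black-box setting, not the paper's actual adversarial-MDP attack: a poisoner with a fixed corruption budget flips the sign of the reward on randomly chosen steps, whereas the paper's attacks choose which timesteps to corrupt strategically. The class name, budget parameter, and sign-flip perturbation are all assumptions made for illustration.

```python
import random


class RewardPoisoner:
    """Illustrative black-box reward poisoner (a sketch, not the paper's
    adversarial-MDP attack). It corrupts at most a `budget` fraction of the
    total training timesteps, knowing nothing about the agent's learning
    algorithm or the environment's dynamics."""

    def __init__(self, total_steps: int, budget: float = 0.05, seed: int = 0):
        self.remaining = int(total_steps * budget)  # corruption budget in steps
        self.budget = budget
        self.rng = random.Random(seed)

    def poison(self, reward: float) -> float:
        # Corrupt only while budget remains. Random selection here stands in
        # for the strategic timestep selection used by the paper's attacks.
        if self.remaining > 0 and self.rng.random() < self.budget:
            self.remaining -= 1
            return -reward  # sign flip pushes the agent toward low return
        return reward


# Usage: interpose the poisoner between the environment and the agent,
# e.g. `agent.observe(state, action, poisoner.poison(reward), next_state)`.
```

In an actual training loop the poisoner would wrap the environment's reward signal, so the agent only ever sees the (occasionally corrupted) rewards.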
