Paper Title

Adaptive Reward-Poisoning Attacks against Reinforcement Learning

Paper Authors

Xuezhou Zhang, Yuzhe Ma, Adish Singla, Xiaojin Zhu

Paper Abstract

In reward-poisoning attacks against reinforcement learning (RL), an attacker can perturb the environment reward $r_t$ into $r_t + \delta_t$ at each step, with the goal of forcing the RL agent to learn a nefarious policy. We categorize such attacks by the infinity-norm constraint on $\delta_t$: We provide a lower threshold below which reward-poisoning attack is infeasible and RL is certified to be safe; we provide a corresponding upper threshold above which the attack is feasible. Feasible attacks can be further categorized as non-adaptive where $\delta_t$ depends only on $(s_t, a_t, s_{t+1})$, or adaptive where $\delta_t$ depends further on the RL agent's learning process at time $t$. Non-adaptive attacks have been the focus of prior works. However, we show that under mild conditions, adaptive attacks can achieve the nefarious policy in steps polynomial in state-space size $|S|$, whereas non-adaptive attacks require exponential steps. We provide a constructive proof that a Fast Adaptive Attack strategy achieves the polynomial rate. Finally, we show that empirically an attacker can find effective reward-poisoning attacks using state-of-the-art deep RL techniques.
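To make the attack model in the abstract concrete, below is a minimal, hypothetical Python sketch (not the paper's implementation). It assumes a Gym-style `env.step` interface; the names `RewardPoisoningWrapper`, `target_policy`, and `agent_Q` are illustrative. It shows how a per-step perturbation $\delta_t$ under the budget $\|\delta_t\|_\infty \le \Delta$ could be applied, and how an adaptive attack may additionally condition on the learner's current Q-estimate.

```python
import numpy as np

class RewardPoisoningWrapper:
    """Hypothetical sketch of the attack model: at each step the attacker replaces
    the true reward r_t with r_t + delta_t, subject to |delta_t| <= Delta."""

    def __init__(self, env, Delta, target_policy, adaptive=False):
        self.env = env                      # underlying MDP environment (Gym-style, assumed)
        self.Delta = Delta                  # per-step budget: ||delta_t||_inf <= Delta
        self.target_policy = target_policy  # nefarious policy the attacker wants the agent to learn
        self.adaptive = adaptive

    def step(self, s, a, agent_Q=None):
        s_next, r, done, info = self.env.step(a)

        if not self.adaptive:
            # Non-adaptive attack: delta_t is a fixed function of (s_t, a_t, s_{t+1}),
            # e.g. reward actions that agree with the target policy and punish the rest.
            delta = self.Delta if a == self.target_policy[s] else -self.Delta
        else:
            # Adaptive attack: delta_t may also depend on the learner's internal state
            # at time t (here, a tabular Q estimate), e.g. spend the budget only where
            # the agent's current greedy action still disagrees with the target policy.
            greedy = int(np.argmax(agent_Q[s]))
            if greedy == self.target_policy[s]:
                delta = 0.0
            else:
                delta = self.Delta if a == self.target_policy[s] else -self.Delta

        # Enforce the infinity-norm constraint before the poisoned reward reaches the agent.
        delta = float(np.clip(delta, -self.Delta, self.Delta))
        return s_next, r + delta, done, info
```

The contrast in the `adaptive` branch is the point of the sketch: a non-adaptive perturbation is a fixed table over transitions, while an adaptive one reacts to the learner's progress, which is what lets it steer the agent in polynomially many steps under the paper's conditions.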
