Paper Title

Adaptive Reward-Poisoning Attacks against Reinforcement Learning

Paper Authors

Xuezhou Zhang, Yuzhe Ma, Adish Singla, Xiaojin Zhu

Paper Abstract

In reward-poisoning attacks against reinforcement learning (RL), an attacker can perturb the environment reward $r_t$ into $r_t + \delta_t$ at each step, with the goal of forcing the RL agent to learn a nefarious policy. We categorize such attacks by the infinity-norm constraint on $\delta_t$: We provide a lower threshold below which reward-poisoning attack is infeasible and RL is certified to be safe; we provide a corresponding upper threshold above which the attack is feasible. Feasible attacks can be further categorized as non-adaptive where $\delta_t$ depends only on $(s_t, a_t, s_{t+1})$, or adaptive where $\delta_t$ depends further on the RL agent's learning process at time $t$. Non-adaptive attacks have been the focus of prior works. However, we show that under mild conditions, adaptive attacks can achieve the nefarious policy in steps polynomial in state-space size $|S|$, whereas non-adaptive attacks require exponential steps. We provide a constructive proof that a Fast Adaptive Attack strategy achieves the polynomial rate. Finally, we show that empirically an attacker can find effective reward-poisoning attacks using state-of-the-art deep RL techniques.
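To make the attack model in the abstract concrete, below is a minimal, hypothetical Python sketch (not the paper's implementation). It assumes a Gym-style `env.step` interface; the names `RewardPoisoningWrapper`, `target_policy`, and `agent_Q` are illustrative. It shows how a per-step perturbation $\delta_t$ under the budget $\|\delta_t\|_\infty \le \Delta$ could be applied, and how an adaptive attack may additionally condition on the learner's current Q-estimate.

```python
import numpy as np

class RewardPoisoningWrapper:
    """Hypothetical sketch of the attack model: at each step the attacker replaces
    the true reward r_t with r_t + delta_t, subject to |delta_t| <= Delta."""

    def __init__(self, env, Delta, target_policy, adaptive=False):
        self.env = env                      # underlying MDP environment (Gym-style, assumed)
        self.Delta = Delta                  # per-step budget: ||delta_t||_inf <= Delta
        self.target_policy = target_policy  # nefarious policy the attacker wants the agent to learn
        self.adaptive = adaptive

    def step(self, s, a, agent_Q=None):
        s_next, r, done, info = self.env.step(a)

        if not self.adaptive:
            # Non-adaptive attack: delta_t is a fixed function of (s_t, a_t, s_{t+1}),
            # e.g. reward actions that agree with the target policy and punish the rest.
            delta = self.Delta if a == self.target_policy[s] else -self.Delta
        else:
            # Adaptive attack: delta_t may also depend on the learner's internal state
            # at time t (here, a tabular Q estimate), e.g. spend the budget only where
            # the agent's current greedy action still disagrees with the target policy.
            greedy = int(np.argmax(agent_Q[s]))
            if greedy == self.target_policy[s]:
                delta = 0.0
            else:
                delta = self.Delta if a == self.target_policy[s] else -self.Delta

        # Enforce the infinity-norm constraint before the poisoned reward reaches the agent.
        delta = float(np.clip(delta, -self.Delta, self.Delta))
        return s_next, r + delta, done, info
```

The contrast in the `adaptive` branch is the point of the sketch: a non-adaptive perturbation is a fixed table over transitions, while an adaptive one reacts to the learner's progress, which is what lets it steer the agent in polynomially many steps under the paper's conditions.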
