Paper Title

Are AlphaZero-like Agents Robust to Adversarial Perturbations?

Paper Authors

Li-Cheng Lan, Huan Zhang, Ti-Rong Wu, Meng-Yu Tsai, I-Chen Wu, Cho-Jui Hsieh

Paper Abstract

The success of AlphaZero (AZ) has demonstrated that neural-network-based Go AIs can surpass human performance by a large margin. Given that the state space of Go is extremely large and a human player can play the game from any legal state, we ask whether adversarial states exist for Go AIs that may lead them to play surprisingly wrong actions. In this paper, we first extend the concept of adversarial examples to the game of Go: we generate perturbed states that are ``semantically'' equivalent to the original state by adding meaningless moves to the game, and an adversarial state is a perturbed state leading to an undoubtedly inferior action that is obvious even to Go beginners. However, searching for adversarial states is challenging due to the large, discrete, and non-differentiable search space. To tackle this challenge, we develop the first adversarial attack on Go AIs that can efficiently search for adversarial states by strategically reducing the search space. This method can also be extended to other board games such as NoGo. Experimentally, we show that the actions taken by both the Policy-Value neural network (PV-NN) and Monte Carlo tree search (MCTS) can be misled by adding one or two meaningless stones; for example, on 58\% of AlphaGo Zero self-play games, our method can make the widely used KataGo agent with 50 MCTS simulations play a losing action by adding two meaningless stones. We additionally evaluated the adversarial examples found by our algorithm with amateur human Go players, and 90\% of the examples indeed led the Go agent to play an obviously inferior action. Our code is available at \url{https://PaperCode.cc/GoAttack}.
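To make the attack idea concrete, below is a minimal, hypothetical Python sketch of the search loop the abstract describes: enumerate candidate meaningless moves, apply up to two of them, and test whether the agent's chosen action on the perturbed state becomes a losing one. All names here (GoState, agent_action, is_meaningless, is_losing) are illustrative stand-ins rather than the paper's actual API, and the brute-force enumeration stands in for the paper's strategic search-space reduction; turn handling and the semantic-equivalence check are also heavily simplified.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Move = Tuple[int, int]  # (row, col) on the board

@dataclass(frozen=True)
class GoState:
    """Hypothetical immutable Go position: a history of (move, color) pairs."""
    stones: Tuple[Tuple[Move, str], ...] = ()
    to_play: str = "black"

    def play(self, move: Move) -> "GoState":
        """Return the position after the current player places a stone."""
        nxt = "white" if self.to_play == "black" else "black"
        return GoState(self.stones + ((move, self.to_play),), nxt)

def find_adversarial_state(
    state: GoState,
    candidates: List[Move],
    agent_action: Callable[[GoState], Move],      # action the attacked agent would play
    is_meaningless: Callable[[GoState, Move], bool],  # does this move keep the state semantically equivalent?
    is_losing: Callable[[GoState, Move], bool],   # is the agent's action clearly losing?
    max_stones: int = 2,
) -> Optional[GoState]:
    """Breadth-first search over perturbed states reached by adding up to
    max_stones meaningless stones; returns the first state on which the
    agent plays a losing action, or None if no adversarial state is found."""
    frontier = [state]
    for _ in range(max_stones):
        next_frontier = []
        for s in frontier:
            for m in candidates:
                if not is_meaningless(s, m):
                    continue  # only perturbations that leave the game outcome unchanged
                perturbed = s.play(m)
                if is_losing(perturbed, agent_action(perturbed)):
                    return perturbed  # adversarial state found
                next_frontier.append(perturbed)
        frontier = next_frontier
    return None
```

In this sketch, the predicates would be backed by a strong reference engine (the paper evaluates against PV-NN outputs and MCTS with a fixed simulation budget, e.g. 50 simulations for KataGo); the paper's actual contribution is making this search efficient rather than exhaustive.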
