Paper Title
Backdoors Stuck At The Frontdoor: Multi-Agent Backdoor Attacks That Backfire
Paper Authors
Paper Abstract
Malicious agents in collaborative learning and outsourced data collection threaten the training of clean models. Backdoor attacks, in which an attacker poisons a model during training to achieve targeted misclassification, are a major concern for train-time robustness. In this paper, we investigate a multi-agent backdoor attack scenario, where multiple attackers attempt to backdoor a victim model simultaneously. A consistent backfiring phenomenon is observed across a wide range of games, in which the agents suffer a low collective attack success rate. We examine different backdoor attack configurations (non-cooperation vs. cooperation, joint distribution shifts, and game setups) and find that the equilibrium attack success rate lies at the lower bound. These results motivate a re-evaluation of backdoor defense research for practical environments.
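To make the multi-agent setting concrete, the sketch below shows how several non-cooperating attackers might each poison their contribution to a shared dataset with a distinct trigger and target class. This is a minimal illustration under assumed parameters (trigger size, poison rate, number of agents), not the paper's actual experimental configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def poison(images, labels, trigger, target_class, rate):
    """One attacker stamps its trigger patch onto a random subset of its
    contribution and relabels those samples to its target class."""
    images, labels = images.copy(), labels.copy()
    n_poison = int(rate * len(images))
    idx = rng.choice(len(images), n_poison, replace=False)
    h, w = trigger.shape[:2]
    images[idx, :h, :w] = trigger   # stamp trigger in the corner
    labels[idx] = target_class      # targeted misclassification
    return images, labels

# Toy shared dataset: 3 agents each contribute 1000 32x32 RGB images.
contributions = [
    (rng.random((1000, 32, 32, 3)), rng.integers(0, 10, 1000))
    for _ in range(3)
]

# Non-cooperating attackers: distinct triggers and target classes.
triggers = [rng.random((4, 4, 3)) for _ in range(3)]
targets = [1, 5, 7]

poisoned = [
    poison(x, y, trig, tgt, rate=0.1)
    for (x, y), trig, tgt in zip(contributions, triggers, targets)
]

# The victim trains on the pooled dataset; with several conflicting
# triggers present, each attacker's individual success rate can collapse,
# which is the backfiring dynamic the abstract describes.
X = np.concatenate([x for x, _ in poisoned])
y = np.concatenate([lbl for _, lbl in poisoned])
print(X.shape, y.shape)  # (3000, 32, 32, 3) (3000,)
```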