Paper Title


Provable Robust Saliency-based Explanations

Authors

Chao Chen, Chenghua Guo, Rufeng Chen, Guixiang Ma, Ming Zeng, Xiangwen Liao, Xi Zhang, Sihong Xie

Abstract


To foster trust in machine learning models, explanations must be faithful and stable to provide consistent insights. Existing work relies on the $\ell_p$ distance for stability assessment, which diverges from human perception. Moreover, existing adversarial training (AT), with its intensive computation, may lead to an arms race. To address these challenges, we introduce a novel metric that assesses the stability of the top-$k$ salient features. We propose R2ET, which trains for stable explanations via an efficient and effective regularizer, and we analyze R2ET through multi-objective optimization to prove the numerical and statistical stability of its explanations. Moreover, theoretical connections between R2ET and certified robustness justify R2ET's stability under all attacks. Extensive experiments across various data modalities and model architectures show that R2ET achieves superior stability against stealthy attacks and generalizes effectively across different explanation methods.
