Paper Title
SAFARI: Versatile and Efficient Evaluations for Robustness of Interpretability
Paper Authors
Paper Abstract
The lack of interpretability in Deep Learning (DL) is a barrier to trustworthy AI. Despite great efforts made by the Explainable AI (XAI) community, explanations lack robustness: indistinguishable input perturbations may lead to different XAI results. Thus, it is vital to assess how robust DL interpretability is, given an XAI method. In this paper, we identify several challenges that the state of the art is unable to cope with collectively: i) existing metrics are not comprehensive; ii) XAI techniques are highly heterogeneous; iii) misinterpretations are normally rare events. To tackle these challenges, we introduce two black-box evaluation methods, concerning the worst-case interpretation discrepancy and a probabilistic notion of overall robustness, respectively. A Genetic Algorithm (GA) with a bespoke fitness function is used to solve the constrained optimisation for efficient worst-case evaluation. Subset Simulation (SS), dedicated to estimating rare-event probabilities, is used to evaluate overall robustness. Experiments show that the accuracy, sensitivity, and efficiency of our methods outperform the state of the art. Finally, we demonstrate two applications of our methods: ranking robust XAI methods and selecting training schemes to improve both classification and interpretation robustness.
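The overall-robustness estimator above builds on Subset Simulation, which estimates a rare-event probability by factoring it into a chain of more frequent conditional events. The following is a minimal, generic sketch of that idea on a toy problem, not the paper's implementation; the function name `subset_simulation` and all parameters (`score`, `sample_prior`, `log_prior`, `p0`, `step`) are illustrative assumptions.

```python
import numpy as np

def subset_simulation(score, sample_prior, log_prior, threshold,
                      n=1000, p0=0.1, step=0.5, seed=0, max_levels=20):
    """Sketch: estimate p = P(score(x) >= threshold) under a prior by
    chaining the conditional probabilities of nested, easier events."""
    rng = np.random.default_rng(seed)
    xs = sample_prior(rng, n)                 # (n, d) samples from the prior
    ys = np.array([score(x) for x in xs])
    p = 1.0
    for _ in range(max_levels):
        level = np.quantile(ys, 1.0 - p0)     # intermediate failure level
        if level >= threshold:                # target event no longer rare here
            return p * np.mean(ys >= threshold)
        p *= p0                               # P(next level | current level) ~ p0
        keep = ys >= level                    # seeds inside the new level
        xs, ys = xs[keep], ys[keep]
        # Repopulate to n samples with Metropolis moves whose target is the
        # prior restricted to {x : score(x) >= level} (symmetric proposal).
        while len(xs) < n:
            i = rng.integers(len(xs))
            cand = xs[i] + step * rng.standard_normal(xs[i].shape)
            accept = np.log(rng.random()) < log_prior(cand) - log_prior(xs[i])
            if accept and score(cand) >= level:
                xs = np.vstack([xs, cand])
                ys = np.append(ys, score(cand))
            else:                             # rejected: repeat the seed
                xs = np.vstack([xs, xs[i]])
                ys = np.append(ys, ys[i])
    return p * np.mean(ys >= threshold)

# Toy usage: P(x[0] >= 3) under a 2-D standard normal (~1.35e-3),
# far too rare to estimate reliably with 1000 plain Monte Carlo samples.
est = subset_simulation(lambda x: float(x[0]),
                        lambda rng, n: rng.standard_normal((n, 2)),
                        lambda x: -0.5 * float(np.sum(x ** 2)),
                        threshold=3.0)
```

In the paper's setting the score would instead measure the interpretation discrepancy under an input perturbation, so the estimated probability quantifies how often misinterpretations occur.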