Paper Title

Explaining by Removing: A Unified Framework for Model Explanation

Authors

Ian Covert, Scott Lundberg, Su-In Lee

Abstract

Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We describe a new unified class of methods, removal-based explanations, that are based on the principle of simulating feature removal to quantify each feature's influence. These methods vary in several respects, so we develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence. Our framework unifies 26 existing methods, including several of the most widely used approaches: SHAP, LIME, Meaningful Perturbations, and permutation tests. This newly understood class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature. To anchor removal-based explanations in cognitive psychology, we show that feature removal is a simple application of subtractive counterfactual reasoning. Ideas from cooperative game theory shed light on the relationships and trade-offs among different methods, and we derive conditions under which all removal-based explanations have information-theoretic interpretations. Through this analysis, we develop a unified framework that helps practitioners better understand model explanation tools, and that offers a strong theoretical foundation upon which future explainability research can build.
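
The three dimensions named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`remove_features`, `behavior`, `leave_one_out`), the toy linear model, and the all-zeros background data are assumptions chosen for illustration. The sketch removes features by averaging predictions over background samples, explains the prediction for a single input, and summarizes influence by removing one feature at a time.

```python
import numpy as np

def remove_features(model, x, subset, background):
    """Dimension 1 (feature removal): keep features where `subset` is True,
    fill the removed ones from each background row, and average predictions."""
    samples = np.where(subset, x, background)  # shape: (n_background, d)
    return model(samples).mean()

def behavior(model, x, background):
    """Dimension 2 (model behavior): the prediction for one input x,
    viewed as a function of which features are kept."""
    return lambda subset: remove_features(model, x, subset, background)

def leave_one_out(u, d):
    """Dimension 3 (summary): score each feature by the change in u
    when that single feature is removed from the full set."""
    full = np.ones(d, dtype=bool)
    scores = np.zeros(d)
    for i in range(d):
        subset = full.copy()
        subset[i] = False
        scores[i] = u(full) - u(subset)
    return scores

# Hypothetical toy setup: a linear model and an all-zeros background set.
weights = np.array([2.0, -1.0, 0.0])
model = lambda X: X @ weights
background = np.zeros((4, 3))
x = np.array([1.0, 3.0, 5.0])

scores = leave_one_out(behavior(model, x, background), d=3)
print(scores)
```

Swapping any one of the three functions (for example, replacing `leave_one_out` with a Shapley value estimator) yields a different removal-based explanation while the other two choices stay fixed, which is the kind of variation the framework is meant to characterize.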
