Paper Title

Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End

Paper Authors

Ramaravind Kommiya Mothilal, Divyat Mahajan, Chenhao Tan, Amit Sharma

Paper Abstract

Feature attributions and counterfactual explanations are popular approaches to explain an ML model. The former assigns an importance score to each input feature, while the latter provides input examples with minimal changes to alter the model's predictions. To unify these approaches, we provide an interpretation based on the actual causality framework and present two key results in terms of their use. First, we present a method to generate feature attribution explanations from a set of counterfactual examples. These feature attributions convey how important a feature is to changing the classification outcome of a model, especially on whether a subset of features is necessary and/or sufficient for that change, which attribution-based methods are unable to provide. Second, we show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency. As a result, we highlight the complementarity of these two approaches. Our evaluation on three benchmark datasets - Adult-Income, LendingClub, and German-Credit - confirms the complementarity. Feature attribution methods like LIME and SHAP and counterfactual explanation methods like Wachter et al. and DiCE often do not agree on feature importance rankings. In addition, by restricting the features that can be modified for generating counterfactual examples, we find that the top-k features from LIME or SHAP are often neither necessary nor sufficient explanations of a model's prediction. Finally, we present a case study of different explanation methods on a real-world hospital triage problem.
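For a concrete sense of the first result, below is a minimal sketch (our own illustration, not the authors' code) of deriving feature attributions from counterfactual examples: each feature is scored by how often its value changes across the counterfactuals generated for a query instance. The helper name `cf_feature_attribution` and the toy loan data are hypothetical.

```python
import numpy as np

def cf_feature_attribution(query, counterfactuals, feature_names, tol=1e-9):
    """Score each feature by the fraction of counterfactual examples in
    which its value differs from the query instance (hypothetical helper)."""
    query = np.asarray(query, dtype=float)
    cfs = np.asarray(counterfactuals, dtype=float)
    changed = np.abs(cfs - query) > tol  # boolean change mask, shape (n_cfs, n_features)
    return dict(zip(feature_names, changed.mean(axis=0)))  # per-feature change frequency

# Hypothetical loan-approval example: one query instance, three counterfactuals
features = ["age", "income", "credit_score"]
x = [35, 40000, 620]
cfs = [[35, 55000, 680],
       [35, 40000, 700],
       [42, 52000, 620]]
print(cf_feature_attribution(x, cfs, features))
# -> income and credit_score change in 2/3 of counterfactuals, age in 1/3
```

In the paper's framing, restricting which features are allowed to change when generating counterfactuals is what probes whether a feature subset is necessary and/or sufficient for flipping the model's prediction.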
