Paper Title
Causal Interpretability for Machine Learning -- Problems, Methods and Evaluation
Paper Authors
Paper Abstract
Machine learning models have achieved discernible success in a myriad of applications. However, most of these models are black boxes, and it is unclear how they arrive at their decisions. This makes the models unreliable and untrustworthy. To provide insights into the decision-making processes of these models, a variety of traditional interpretable models have been proposed. Moreover, to generate more human-friendly explanations, recent work on interpretability tries to answer questions related to causality, such as "Why does this model make such decisions?" or "Was it a specific feature that caused the decision made by the model?". In this work, models that aim to answer causal questions are referred to as causal interpretable models. Existing surveys have covered the concepts and methodologies of traditional interpretability. In this work, we present a comprehensive survey on causal interpretable models from the aspects of problems and methods. In addition, this survey provides in-depth insights into existing evaluation metrics for measuring interpretability, which can help practitioners understand the scenarios for which each evaluation metric is suited.
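To make the flavor of such causal questions concrete, the minimal sketch below probes a black-box classifier with a counterfactual input in which a single feature is altered, and checks whether the prediction flips. This is only a toy illustration of the question "Was it a specific feature that caused the decision?", not a method from the paper; the model, the synthetic data, and the choice of feature index are all illustrative assumptions.

```python
# Toy counterfactual probe of a black-box classifier (illustrative only;
# not the survey's method). We ask: does changing one feature of a single
# instance flip the model's prediction?
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A stand-in "black box": a random forest trained on synthetic data.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

x = X[0].copy()                                  # factual instance
original = model.predict(x.reshape(1, -1))[0]    # factual prediction

# Counterfactual input: replace feature 2 (an arbitrary choice here)
# with its population mean and re-predict.
x_cf = x.copy()
x_cf[2] = X[:, 2].mean()
counterfactual = model.predict(x_cf.reshape(1, -1))[0]

if counterfactual != original:
    print("Changing feature 2 flips the prediction for this instance.")
else:
    print("Feature 2 alone does not account for this decision.")
```

A flipped prediction under such a single-feature intervention is one simple, instance-level signal that the feature mattered causally to the decision; the methods surveyed in the paper formalize and generalize this idea well beyond this sketch.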