A Framework to Learn with Interpretation

Paper title

A Framework to Learn with Interpretation

Authors

Jayneel Parekh, Pavlo Mozharovskyi, Florence d'Alché-Buc

Abstract

To tackle interpretability in deep learning, we present a novel framework to jointly learn a predictive model and its associated interpretation model. The interpreter provides both local and global interpretability about the predictive model in terms of human-understandable high-level attribute functions, with minimal loss of accuracy. This is achieved by a dedicated architecture and well-chosen regularization penalties. We seek a small dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers and whose outputs feed a linear classifier. We impose strong conciseness on the activation of attributes with an entropy-based criterion while enforcing fidelity to both inputs and outputs of the predictive model. A detailed pipeline to visualize the learnt features is also developed. Moreover, besides generating interpretable models by design, our approach can be specialized to provide post-hoc interpretations for a pre-trained neural network. We validate our approach against several state-of-the-art methods on multiple datasets and show its efficacy on both kinds of tasks.
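
The interpreter described in the abstract (attribute functions computed from hidden-layer outputs, feeding a linear classifier, trained with an output-fidelity term and an entropy-based conciseness term) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the ReLU attribute map, the cross-entropy form of output fidelity, and the per-sample activation entropy are all assumptions chosen for illustration, and the input-fidelity (reconstruction) term is omitted.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of scores.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def linear(W, x, b):
    # Dense layer: one output per row of W.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def attributes(hidden, W_attr, b_attr):
    # Hypothetical attribute dictionary: non-negative activations
    # computed from the output of a selected hidden layer.
    return [max(0.0, v) for v in linear(W_attr, hidden, b_attr)]

def interpreter_probs(attrs, W_cls, b_cls):
    # Linear classifier on top of the attribute activations.
    return softmax(linear(W_cls, attrs, b_cls))

def output_fidelity(pred_probs, interp_probs, eps=1e-12):
    # Assumed fidelity term: cross-entropy between the predictor's
    # class distribution and the interpreter's class distribution.
    return -sum(p * math.log(q + eps) for p, q in zip(pred_probs, interp_probs))

def activation_entropy(attrs, eps=1e-12):
    # Assumed conciseness term: Shannon entropy of the normalized
    # activations; it is low when only a few attributes are active.
    total = sum(attrs) + eps
    shares = [a / total for a in attrs]
    return -sum(s * math.log(s + eps) for s in shares)

# Toy usage with made-up weights: 3 hidden units, 4 attributes, 2 classes.
hidden = [1.0, -2.0, 0.5]
W_attr = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
b_attr = [0.0, 0.0, 0.0, 0.0]
W_cls = [[1.0, 0.0, 0.5, 0.0], [-1.0, 0.0, 0.0, 0.5]]
b_cls = [0.0, 0.0]

attrs = attributes(hidden, W_attr, b_attr)
probs = interpreter_probs(attrs, W_cls, b_cls)
loss = output_fidelity([0.9, 0.1], probs) + 0.1 * activation_entropy(attrs)
```

In this sketch the weight 0.1 on the entropy term is an arbitrary placeholder. In a joint-training setting both terms would be added to the predictor's own loss; in a post-hoc setting the predictor would be frozen and only the attribute and classifier weights would be updated.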
