Paper Title
Concept Whitening for Interpretable Image Recognition
Authors
Abstract
What does a neural network encode about a concept as we traverse through the layers? Interpretability in machine learning is undoubtedly important, but the calculations of neural networks are very challenging to understand. Attempts to see inside their hidden layers can be misleading, unusable, or rely on the latent space possessing properties that it may not have. In this work, rather than attempting to analyze a neural network post hoc, we introduce a mechanism, called concept whitening (CW), to alter a given layer of the network so that we can better understand the computation leading up to that layer. When a concept whitening module is added to a CNN, the axes of the latent space are aligned with known concepts of interest. Through experiments, we show that CW can provide us with a much clearer understanding of how the network gradually learns concepts over the layers. CW is an alternative to a batch normalization layer in that it normalizes, and also decorrelates (whitens), the latent space. CW can be used in any layer of the network without hurting predictive performance.
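To make the "normalizes and also decorrelates (whitens)" step concrete, the following is a minimal NumPy sketch of ZCA whitening applied to a batch of activations. It is an illustration of the whitening half of CW only; the learned rotation that aligns latent axes with concept datasets, and the integration into a CNN layer, are omitted. The function name `whiten` and the synthetic data are this sketch's own assumptions, not code from the paper.

```python
import numpy as np

def whiten(x, eps=1e-5):
    """ZCA-whiten activations x of shape (n_samples, n_features).

    Centers each feature (as batch normalization does) and then
    decorrelates the features so their covariance is the identity.
    """
    mu = x.mean(axis=0)
    xc = x - mu                       # center: zero-mean features
    cov = xc.T @ xc / x.shape[0]      # feature covariance matrix
    # Eigendecomposition of the symmetric covariance; eps guards
    # against division by near-zero eigenvalues.
    vals, vecs = np.linalg.eigh(cov)
    w = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T  # ZCA matrix
    return xc @ w

# Correlated synthetic "activations": a random linear mix of features.
rng = np.random.default_rng(0)
x = rng.normal(size=(512, 4)) @ rng.normal(size=(4, 4))
z = whiten(x)
cov_z = z.T @ z / z.shape[0]  # approximately the 4x4 identity
```

After this step, the latent axes are uncorrelated and unit-variance, which is what lets a subsequent rotation assign individual axes to individual concepts without the axes interfering with one another.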