Paper Title

Towards Visually Explaining Similarity Models

Authors

Meng Zheng, Srikrishna Karanam, Terrence Chen, Richard J. Radke, Ziyan Wu

Abstract

We consider the problem of visually explaining similarity models, i.e., explaining why a model predicts two images to be similar in addition to producing a scalar score. While much recent work in visual model interpretability has focused on gradient-based attention, these methods rely on a classification module to generate visual explanations. Consequently, they cannot readily explain other kinds of models that do not use or need classification-like loss functions (e.g., similarity models trained with a metric learning loss). In this work, we bridge this crucial gap, presenting a method to generate gradient-based visual attention for image similarity predictors. By relying solely on the learned feature embedding, we show that our approach can be applied to any kind of CNN-based similarity architecture, an important step towards generic visual explainability. We show that our resulting attention maps serve more than just interpretability; they can be infused into the model learning process itself with new trainable constraints. We show that the resulting similarity models perform, and can be visually explained, better than the corresponding baseline models trained without these constraints. We demonstrate our approach using extensive experiments on three different kinds of tasks: generic image retrieval, person re-identification, and low-shot semantic segmentation.
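The abstract describes deriving gradient-based attention directly from the similarity score between two learned embeddings, with no classification module in the loop. Below is a minimal sketch of that general idea, assuming a PyTorch ResNet-50 backbone, a cosine-similarity score, and a Grad-CAM-style channel weighting; the helper names (`similarity_attention`, `save_activation`) are illustrative and this is not the paper's actual method.

```python
# Minimal sketch (not the authors' implementation): Grad-CAM-style attention
# derived from a similarity score between two embeddings, with no
# classification head. Backbone choice, cosine similarity, and helper names
# are assumptions made for illustration.
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet50(weights=None)   # any CNN embedding network
backbone.fc = torch.nn.Identity()          # drop the classifier; keep the embedding
backbone.eval()

activations, gradients = {}, {}

def save_activation(module, inp, out):
    # Record the last conv feature maps and, when gradients flow, their grads.
    activations["feat"] = out
    if out.requires_grad:
        out.register_hook(lambda g: gradients.update(feat=g))

backbone.layer4.register_forward_hook(save_activation)

def similarity_attention(img_a, img_b):
    """Attention map on img_a explaining why it is scored similar to img_b."""
    with torch.no_grad():
        emb_b = backbone(img_b)            # reference embedding, no grad needed
    emb_a = backbone(img_a)                # records layer4 activations of img_a
    score = F.cosine_similarity(emb_a, emb_b, dim=1).sum()
    backbone.zero_grad()
    score.backward()                       # d(similarity) / d(conv feature maps)
    feats, grads = activations["feat"], gradients["feat"]
    weights = grads.mean(dim=(2, 3), keepdim=True)   # channel importance
    cam = F.relu((weights * feats).sum(dim=1))       # weighted sum + ReLU
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)

# Usage: cam = similarity_attention(query_batch, reference_batch)
# Upsample `cam` to the input resolution to overlay it on the query image.
```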
