Paper Title
Sketch-Guided Object Localization in Natural Images
Paper Authors
Paper Abstract
We introduce the novel problem of localizing all the instances of an object (seen or unseen during training) in a natural image via sketch query. We refer to this problem as sketch-guided object localization. This problem is distinctively different from the traditional sketch-based image retrieval task where the gallery set often contains images with only one object. The sketch-guided object localization proves to be more challenging when we consider the following: (i) the sketches used as queries are abstract representations with little information on the shape and salient attributes of the object, (ii) the sketches have significant variability as they are hand-drawn by a diverse set of untrained human subjects, and (iii) there exists a domain gap between sketch queries and target natural images as these are sampled from very different data distributions. To address the problem of sketch-guided object localization, we propose a novel cross-modal attention scheme that guides the region proposal network (RPN) to generate object proposals relevant to the sketch query. These object proposals are later scored against the query to obtain final localization. Our method is effective with as little as a single sketch query. Moreover, it also generalizes well to object categories not seen during training and is effective in localizing multiple object instances present in the image. Furthermore, we extend our framework to a multi-query setting using novel feature fusion and attention fusion strategies introduced in this paper. The localization performance is evaluated on publicly available object detection benchmarks, viz. MS-COCO and PASCAL-VOC, with sketch queries obtained from `Quick, Draw!'. The proposed method significantly outperforms related baselines on both single-query and multi-query localization tasks.
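The core idea in the abstract is a cross-modal attention scheme in which a sketch-query embedding highlights query-relevant regions of the image feature map before proposals are generated by the RPN. The snippet below is a minimal, hypothetical sketch of that idea, not the authors' implementation: the module names, feature dimensions, and the exact attention form (dot-product similarity followed by a softmax over spatial locations) are assumptions made for illustration only.

```python
# Hypothetical sketch of sketch-guided cross-modal attention (assumed design,
# not the paper's exact architecture): a pooled sketch-query embedding attends
# over backbone feature-map locations, and the resulting spatial attention map
# re-weights the features that are fed to the region proposal network (RPN).
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    def __init__(self, img_channels=256, sketch_dim=256, embed_dim=256):
        super().__init__()
        self.key_proj = nn.Conv2d(img_channels, embed_dim, kernel_size=1)  # image features -> keys
        self.query_proj = nn.Linear(sketch_dim, embed_dim)                  # sketch embedding -> query

    def forward(self, img_feat, sketch_feat):
        # img_feat:    (B, C, H, W) backbone feature map of the natural image
        # sketch_feat: (B, D) pooled embedding of the hand-drawn sketch query
        b, c, h, w = img_feat.shape
        keys = self.key_proj(img_feat).flatten(2)          # (B, E, H*W)
        query = self.query_proj(sketch_feat).unsqueeze(1)  # (B, 1, E)

        # Similarity of the sketch query with every spatial location,
        # normalized over locations to form a spatial attention map.
        attn = torch.bmm(query, keys) / keys.shape[1] ** 0.5  # (B, 1, H*W)
        attn = attn.softmax(dim=-1).view(b, 1, h, w)

        # Re-weight backbone features so the RPN favours query-relevant regions.
        return img_feat * (1.0 + attn)


# Usage sketch: the attended features replace the raw backbone features as RPN input.
# attended = CrossModalAttention()(backbone_features, sketch_embedding)
```

For the multi-query setting mentioned in the abstract, one plausible reading of the described strategies is to either average the sketch embeddings before attention (feature fusion) or compute one attention map per sketch and combine the maps (attention fusion); the paper's exact formulations may differ.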