Robust Visual Tracking by Segmentation

Paper title

Robust Visual Tracking by Segmentation

Authors

Matthieu Paul, Martin Danelljan, Christoph Mayer, Luc Van Gool

Abstract

Estimating the target extent poses a fundamental challenge in visual object tracking. Typically, trackers are box-centric and fully rely on a bounding box to define the target in the scene. In practice, objects often have complex shapes and are not aligned with the image axis. In these cases, bounding boxes do not provide an accurate description of the target and often contain a majority of background pixels. We propose a segmentation-centric tracking pipeline that not only produces a highly accurate segmentation mask, but also internally works with segmentation masks instead of bounding boxes. Thus, our tracker is able to better learn a target representation that clearly differentiates the target in the scene from background content. In order to achieve the necessary robustness for the challenging tracking scenario, we propose a separate instance localization component that is used to condition the segmentation decoder when producing the output mask. We infer a bounding box from the segmentation mask, validate our tracker on challenging tracking datasets and achieve the new state of the art on LaSOT with a success AUC score of 69.7%. Since most tracking datasets do not contain mask annotations, we cannot use them to evaluate predicted segmentation masks. Instead, we validate our segmentation quality on two popular video object segmentation datasets.
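
The abstract says a bounding box is inferred from the segmentation mask but does not say how. As a minimal sketch, assuming the simplest possible rule (the tightest axis-aligned box around all foreground pixels), the step could look like the following. The names `mask_to_box` and `background_fraction` are hypothetical and are not taken from the paper. The second helper illustrates the abstract's point that a box around a non-axis-aligned object can consist mostly of background.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]            # binary mask, mask[row][col] is 0 or 1
Box = Tuple[int, int, int, int]   # (x, y, width, height) in pixels


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Tightest axis-aligned box around the foreground (an assumed rule)."""
    # Indices of rows that contain at least one foreground pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # empty mask: no target visible
    # Indices of columns that contain at least one foreground pixel.
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    x0, x1 = cols[0], cols[-1]
    y0, y1 = rows[0], rows[-1]
    return (x0, y0, x1 - x0 + 1, y1 - y0 + 1)


def background_fraction(mask: Mask, box: Box) -> float:
    """Share of pixels inside the box that are not part of the mask."""
    x, y, w, h = box
    foreground = sum(1 for row in mask[y:y + h] for v in row[x:x + w] if v)
    return 1 - foreground / (w * h)


if __name__ == "__main__":
    # A thin diagonal object: 4 foreground pixels on a 4x4 grid.
    diagonal = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
    box = mask_to_box(diagonal)
    print(box)                                  # (0, 0, 4, 4)
    print(background_fraction(diagonal, box))   # 0.75
```

In this toy case the box covers the full 4x4 grid while only 4 of its 16 pixels belong to the object, so three quarters of the box is background.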
