PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework

Paper title

PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework

Paper authors

Bowen Li, Ziyuan Huang, Junjie Ye, Yiming Li, Sebastian Scherer, Hang Zhao, Changhong Fu

Abstract

Visual object tracking is essential to intelligent robots. Most existing approaches have ignored the online latency that can cause severe performance degradation during real-world processing. Especially for unmanned aerial vehicles (UAVs), where robust tracking is more challenging and onboard computation is limited, the latency issue can be fatal. In this work, we present a simple framework for end-to-end latency-aware tracking, i.e., end-to-end predictive visual tracking (PVT++). Unlike existing solutions that naively append Kalman Filters after trackers, PVT++ can be jointly optimized, so that it takes not only motion information but can also leverage the rich visual knowledge in most pre-trained tracker models for robust prediction. Besides, to bridge the training-evaluation domain gap, we propose a relative motion factor, empowering PVT++ to generalize to the challenging and complex UAV tracking scenes. These careful designs have made the small-capacity lightweight PVT++ a widely effective solution. Additionally, this work presents an extended latency-aware evaluation benchmark for assessing an any-speed tracker in the online setting. Empirical results on a robotic platform from the aerial perspective show that PVT++ can achieve significant performance gain on various trackers and exhibit higher accuracy than prior solutions, largely mitigating the degradation brought by latency.
