Paper Title

Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking

Authors

Boyu Chen, Peixia Li, Lei Bai, Lei Qiao, Qiuhong Shen, Bo Li, Weihao Gan, Wei Wu, Wanli Ouyang

Abstract

Exploiting a general-purpose neural architecture to replace hand-wired designs or inductive biases has recently drawn extensive interest. However, existing tracking approaches rely on customized sub-modules and need prior knowledge for architecture selection, hindering the tracking development in a more general system. This paper presents a Simplified Tracking architecture (SimTrack) by leveraging a transformer backbone for joint feature extraction and interaction. Unlike existing Siamese trackers, we serialize the input images and concatenate them directly before the one-branch backbone. Feature interaction in the backbone helps to remove well-designed interaction modules and produce a more efficient and effective framework. To reduce the information loss from down-sampling in vision transformers, we further propose a foveal window strategy, providing more diverse input patches with acceptable computational costs. Our SimTrack improves the baseline with 2.5%/2.6% AUC gains on LaSOT/TNL2K and gets results competitive with other specialized tracking algorithms without bells and whistles.
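
The serialize-and-concatenate step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`patchify`, `center_crop`, `build_input_sequence`), the single-channel list-of-lists image format, and the reading of the foveal window as a smaller centered crop of the template that contributes extra patches are all assumptions made for illustration. The sketch only shows how a template image and a search image might become one token sequence for a single-branch backbone; the transformer itself is omitted.

```python
from typing import List

Image = List[List[float]]  # single-channel image as rows of pixel values (assumed format)
Token = List[float]        # one flattened patch


def patchify(image: Image, patch: int) -> List[Token]:
    """Split an H x W image into non-overlapping patch x patch tokens, row-major."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tokens: List[Token] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def center_crop(image: Image, size: int) -> Image:
    """Return the centered size x size region of the image."""
    height, width = len(image), len(image[0])
    top, left = (height - size) // 2, (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def build_input_sequence(template: Image, search: Image,
                         patch: int, fovea_size: int = 0) -> List[Token]:
    """Serialize both images and concatenate their tokens into one sequence.

    Assumption: the foveal window is modeled here as a centered crop of the
    template whose patches are appended as additional tokens.
    """
    tokens = patchify(template, patch)
    if fovea_size:
        tokens += patchify(center_crop(template, fovea_size), patch)
    tokens += patchify(search, patch)
    return tokens
```

With a 4x4 template, an 8x8 search image, and a patch size of 2, the template contributes 4 tokens and the search image 16; a hypothetical 2x2 foveal crop adds one more. Because all tokens share one sequence, self-attention in a single backbone could relate template and search patches without a separate interaction module, which is the simplification the abstract argues for.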
