Paper Title
MUTR3D: A Multi-camera Tracking Framework via 3D-to-2D Queries
Paper Authors
Paper Abstract
Accurate and consistent 3D tracking from multiple cameras is a key component of vision-based autonomous driving systems. It involves modeling 3D dynamic objects in complex scenes across multiple cameras. This problem is inherently challenging due to depth estimation errors, visual occlusions, appearance ambiguity, etc. Moreover, objects are not consistently associated across time and across cameras. To address these challenges, we propose an end-to-end \textbf{MU}lti-camera \textbf{TR}acking framework called MUTR3D. In contrast to prior works, MUTR3D does not explicitly rely on the spatial and appearance similarity of objects. Instead, our method introduces \textit{3D track queries} to model the spatially and appearance-coherent track of each object that appears across multiple cameras and multiple frames. We use camera transformations to link the 3D trackers with their observations in 2D images, and each tracker is further refined using features extracted from the camera images. MUTR3D uses a set-to-set loss to measure the difference between the predicted tracks and the ground truths, so it does not require any post-processing such as non-maximum suppression or bounding-box association. MUTR3D outperforms state-of-the-art methods by 5.3 AMOTA on the nuScenes dataset. Code is available at \url{https://github.com/a1600012888/MUTR3D}.
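
To make the 3D-to-2D query mechanism concrete, here is a minimal PyTorch sketch of the step the abstract describes: each 3D track query carries a 3D reference point, that point is projected into every camera image with that camera's transformation matrix, and image features sampled at the projected locations refine the query. All names here (`project_queries_to_cameras`, `lidar2img`, the camera-averaging at the end) are illustrative assumptions, not the actual MUTR3D implementation; see the linked repository for the real code.

```python
import torch
import torch.nn.functional as F


def project_queries_to_cameras(ref_points_3d, lidar2img, img_feats):
    """Hypothetical sketch of a 3D-to-2D query step.

    ref_points_3d: (N, 3) reference points of the 3D track queries (ego frame)
    lidar2img:     (C, 4, 4) per-camera projection matrices; assumed to map
                   3D points to pixel coordinates at feature-map resolution
    img_feats:     (C, F, H, W) feature maps from the image backbone

    Returns (N, F): one refined feature vector per track query.
    """
    N = ref_points_3d.shape[0]
    C, feat_dim, H, W = img_feats.shape

    # Homogeneous coordinates: (N, 4).
    pts = torch.cat([ref_points_3d, ref_points_3d.new_ones(N, 1)], dim=-1)

    # Project every query point into every camera: (C, N, 4).
    cam_pts = torch.einsum('cij,nj->cni', lidar2img, pts)

    # Perspective divide; keep only points in front of each camera.
    eps = 1e-5
    depth = cam_pts[..., 2:3].clamp(min=eps)
    uv = cam_pts[..., :2] / depth          # pixel coordinates, (C, N, 2)
    in_front = cam_pts[..., 2] > eps       # visibility mask, (C, N)

    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    uv_norm = torch.stack(
        [uv[..., 0] / W * 2.0 - 1.0, uv[..., 1] / H * 2.0 - 1.0], dim=-1
    )

    # Bilinearly sample one feature per (camera, query): (C, F, N).
    sampled = F.grid_sample(
        img_feats, uv_norm.unsqueeze(1), align_corners=False
    ).squeeze(2)

    # Zero out queries behind a camera, then average over the cameras that
    # actually observe each query, so a track gets one fused feature even
    # when the object spans several overlapping views.
    sampled = sampled * in_front.unsqueeze(1)
    denom = in_front.sum(dim=0).clamp(min=1).unsqueeze(0)  # (1, N)
    return (sampled.sum(dim=0) / denom).transpose(0, 1)    # (N, F)
```

Averaging over the visible cameras is one simple fusion choice for this sketch; the point it illustrates is that the camera transformations, not appearance matching, are what tie each 3D track query to its 2D observations.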