Paper Title

YOLOV: Making Still Image Object Detectors Great at Video Object Detection

Paper Authors

Yuheng Shi, Naiyan Wang, Xiaojie Guo

Paper Abstract

Video object detection (VID) is challenging because of the high variation of object appearance as well as the diverse deterioration in some frames. On the positive side, the detection in a certain frame of a video, compared with that in a still image, can draw support from other frames. Hence, how to aggregate features across different frames is pivotal to the VID problem. Most existing aggregation algorithms are customized for two-stage detectors. However, these detectors are usually computationally expensive due to their two-stage nature. This work proposes a simple yet effective strategy to address the above concerns, which incurs marginal overhead while yielding significant gains in accuracy. Concretely, different from the traditional two-stage pipeline, we select important regions after the one-stage detection to avoid processing massive low-quality candidates. In addition, we evaluate the relationship between a target frame and reference frames to guide the aggregation. We conduct extensive experiments and ablation studies to verify the efficacy of our design and reveal its superiority over other state-of-the-art VID approaches in both effectiveness and efficiency. Our YOLOX-based model can achieve promising performance (e.g., 87.5% AP50 at over 30 FPS on the ImageNet VID dataset on a single 2080Ti GPU), making it attractive for large-scale or real-time applications. The implementation is simple, and we have made the demo code and models available at https://github.com/YuHengsss/YOLOV.
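The core step the abstract describes, aggregating reference-frame features into the target frame guided by their pairwise relationship, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' actual module: the cosine-similarity affinity, softmax weighting, residual fusion, and the helper name `aggregate_features` are all hypothetical choices for exposition.

```python
import torch
import torch.nn.functional as F

def aggregate_features(target_feats: torch.Tensor,
                       ref_feats: torch.Tensor) -> torch.Tensor:
    """Fuse reference-frame proposal features into target-frame features,
    weighted by pairwise affinity (a hypothetical sketch, not YOLOV itself).

    target_feats: (N, C) features of selected target-frame proposals
    ref_feats:    (M, C) features of selected reference-frame proposals
    """
    # Normalize so the dot product equals cosine similarity.
    t = F.normalize(target_feats, dim=-1)
    r = F.normalize(ref_feats, dim=-1)
    # (N, M) affinity between every target/reference proposal pair.
    affinity = t @ r.t()
    # Turn affinities into per-target aggregation weights.
    weights = F.softmax(affinity, dim=-1)
    # Each target feature becomes a similarity-weighted mixture of
    # reference features, fused with the original via a residual add.
    return target_feats + weights @ ref_feats

# Example: 30 target proposals, 120 reference proposals, 256-d features.
target = torch.randn(30, 256)
refs = torch.randn(120, 256)
fused = aggregate_features(target, refs)  # shape (30, 256)
```

Selecting a small set of high-quality proposals before this step (as the paper proposes) keeps N and M small, so the (N, M) affinity matrix stays cheap compared with aggregating over dense two-stage candidates.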
