Paper Title
BoxMask: Revisiting Bounding Box Supervision for Video Object Detection
Paper Authors
Abstract
We present a simple yet effective approach to improving video object detection. We observe that prior works rely on instance-level feature aggregation, which inherently neglects refined pixel-level representations and leads to confusion among objects that share similar appearance or motion characteristics. To address this limitation, we propose BoxMask, which learns discriminative representations by incorporating class-aware pixel-level information. We simply treat bounding-box annotations as a coarse mask for each object to supervise our method. The proposed module can be easily integrated into any region-based detector to boost detection performance. Extensive experiments on the ImageNet VID and EPIC KITCHENS datasets demonstrate consistent and significant improvements when the BoxMask module is plugged into numerous recent state-of-the-art methods.
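The idea of treating bounding-box annotations as coarse masks can be illustrated with a minimal sketch. It is not the authors' implementation: the function name boxes_to_class_mask, the (x_min, y_min, x_max, y_max) box format with exclusive maxima, the use of 0 as the background id, and the rule of painting larger boxes first so that smaller overlapping boxes stay visible are all assumptions made for illustration. The abstract does not specify how overlaps are handled.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels, max exclusive.
Box = Tuple[int, int, int, int]


def box_area(box: Box) -> int:
    """Area of a box in pixels; degenerate boxes count as zero."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def boxes_to_class_mask(
    height: int,
    width: int,
    boxes: Sequence[Box],
    labels: Sequence[int],
    background: int = 0,
) -> List[List[int]]:
    """Rasterize boxes into a height x width grid of class ids.

    Every pixel inside a box receives that box's class id, which gives a
    coarse, class-aware mask without any pixel-level annotation.
    """
    if len(boxes) != len(labels):
        raise ValueError("boxes and labels must have the same length")

    mask = [[background] * width for _ in range(height)]

    # Assumption: paint larger boxes first so that a small object lying
    # inside a large one is not overwritten.
    order = sorted(range(len(boxes)), key=lambda i: -box_area(boxes[i]))
    for i in order:
        x0, y0, x1, y1 = boxes[i]
        # Clip the box to the image bounds.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(width, x1), min(height, y1)
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask[y][x] = labels[i]
    return mask


if __name__ == "__main__":
    coarse = boxes_to_class_mask(4, 6, [(0, 0, 4, 4), (1, 1, 3, 3)], [1, 2])
    for row in coarse:
        print(row)
```

A mask like this could serve as a per-pixel classification target for an auxiliary head attached to a region-based detector; how the supervision is actually wired into the detector is beyond what the abstract states.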