Paper Title

Context-Aware RCNN: A Baseline for Action Detection in Videos

Paper Authors

Jianchao Wu, Zhanghui Kuang, Limin Wang, Wayne Zhang, Gangshan Wu

Abstract

Video action detection approaches usually conduct actor-centric action recognition over RoI-pooled features, following the standard pipeline of Faster-RCNN. In this work, we first empirically find that recognition accuracy is highly correlated with the bounding box size of an actor, and thus higher actor resolution contributes to better performance. However, video models require dense sampling in time to achieve accurate recognition. To fit in GPU memory, the frames fed to the backbone network must be kept at a low resolution, resulting in a coarse feature map in the RoI-Pooling layer. Thus, we revisit RCNN for actor-centric action recognition by cropping and resizing image patches around actors before feature extraction with an I3D deep network. Moreover, we find that slightly expanding actor bounding boxes and fusing context features can further boost performance. Consequently, we develop a surprisingly effective baseline (Context-Aware RCNN) that achieves new state-of-the-art results on two challenging action detection benchmarks, AVA and JHMDB. Our observations challenge the conventional wisdom of the RoI-Pooling-based pipeline and encourage researchers to rethink the importance of resolution in actor-centric action recognition. Our approach can serve as a strong baseline for video action detection and is expected to inspire new ideas in this field. The code is available at \url{https://github.com/MCG-NJU/CRCNN-Action}.
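To make the crop-and-resize idea concrete, below is a minimal PyTorch-style sketch of the step described in the abstract: slightly expand an actor box, crop the patch from every frame of a clip, and resize it to a fixed high resolution before 3D-CNN feature extraction (e.g., with I3D). The box format, expansion ratio, crop size, and function names are illustrative assumptions, not the authors' exact settings.

```python
# Sketch (not the authors' implementation) of actor-centric crop-and-resize
# before feeding a clip into a 3D backbone. All hyperparameters are assumed.
import torch
import torch.nn.functional as F


def expand_box(box, ratio, img_h, img_w):
    """Enlarge an (x1, y1, x2, y2) box by `ratio` on each side, clipped to the image."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    x1 = max(0.0, x1 - ratio * w)
    y1 = max(0.0, y1 - ratio * h)
    x2 = min(float(img_w), x2 + ratio * w)
    y2 = min(float(img_h), y2 + ratio * h)
    return x1, y1, x2, y2


def crop_actor_clip(clip, box, out_size=224, expand_ratio=0.1):
    """Crop an expanded actor box from every frame and resize it.

    clip: float tensor of shape (C, T, H, W); box: (x1, y1, x2, y2) in pixels.
    Returns a (C, T, out_size, out_size) tensor ready for a 3D backbone.
    """
    _, _, H, W = clip.shape
    x1, y1, x2, y2 = expand_box(box, expand_ratio, H, W)
    patch = clip[:, :, int(y1):int(y2), int(x1):int(x2)]            # (C, T, h, w)
    patch = F.interpolate(patch, size=(out_size, out_size),
                          mode="bilinear", align_corners=False)      # resize each frame
    return patch


if __name__ == "__main__":
    clip = torch.rand(3, 16, 256, 456)        # 16 RGB frames at low input resolution
    actor_box = (120.0, 40.0, 220.0, 230.0)   # hypothetical detected actor box
    actor_clip = crop_actor_clip(clip, actor_box)
    print(actor_clip.shape)                   # torch.Size([3, 16, 224, 224])
```

Because the crop happens on the raw frames rather than on a downsampled feature map, the actor is processed at full resolution regardless of its original box size, which is the effect the abstract attributes to the performance gain.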
