Paper Title

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

Authors

Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, Rogerio Feris

Abstract

Action recognition is an open and challenging problem in computer vision. While current state-of-the-art models offer excellent recognition results, their computational expense limits their impact for many real-world applications. In this paper, we propose a novel approach, called AR-Net (Adaptive Resolution Network), that selects on-the-fly the optimal resolution for each frame conditioned on the input for efficient action recognition in long untrimmed videos. Specifically, given a video frame, a policy network is used to decide what input resolution should be used for processing by the action recognition model, with the goal of improving both accuracy and efficiency. We efficiently train the policy network jointly with the recognition model using standard back-propagation. Extensive experiments on several challenging action recognition benchmark datasets demonstrate the efficacy of our proposed approach over state-of-the-art methods. The project page can be found at https://mengyuest.github.io/AR-Net.
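
The abstract only sketches the mechanism: a lightweight policy network looks at each frame, picks an input resolution, and is trained jointly with the recognition model by back-propagation. Below is a minimal, hypothetical PyTorch sketch of that idea. The resolution list, the network sizes, and the Gumbel-Softmax relaxation used here to keep the discrete choice differentiable are illustrative assumptions for this sketch, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Candidate per-frame input resolutions (assumed values for illustration).
RESOLUTIONS = [224, 168, 112, 84]


class PolicyNet(nn.Module):
    """Tiny CNN that scores the candidate resolutions for one frame."""

    def __init__(self, num_choices=len(RESOLUTIONS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(16, num_choices)

    def forward(self, frame):  # frame: (B, 3, H, W)
        logits = self.head(self.features(frame).flatten(1))
        # A Gumbel-Softmax relaxation keeps the discrete resolution choice
        # differentiable, so the policy can be trained jointly with the
        # recognizer by standard back-propagation (as the abstract states).
        return F.gumbel_softmax(logits, tau=1.0, hard=True)  # (B, num_choices)


class AdaptiveResolutionRecognizer(nn.Module):
    """Runs each frame through a recognition backbone at the chosen resolution."""

    def __init__(self, num_classes=200):
        super().__init__()
        self.policy = PolicyNet()
        self.backbone = nn.Sequential(  # stand-in for a real recognition backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, frames):  # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        logits = 0.0
        for t in range(T):
            frame = frames[:, t]
            glance = F.interpolate(frame, size=(84, 84), mode="bilinear",
                                   align_corners=False)  # cheap low-res look
            choice = self.policy(glance)  # one-hot over RESOLUTIONS
            # For simplicity this sketch evaluates every resolution and masks
            # the outputs with the one-hot choice; an efficient implementation
            # would run only the selected resolution at inference time.
            per_res = torch.stack(
                [self.backbone(F.interpolate(frame, size=(r, r), mode="bilinear",
                                             align_corners=False))
                 for r in RESOLUTIONS],
                dim=1,
            )  # (B, num_resolutions, num_classes)
            logits = logits + (choice.unsqueeze(-1) * per_res).sum(dim=1)
        return logits / T  # average the per-frame predictions


if __name__ == "__main__":
    model = AdaptiveResolutionRecognizer()
    video = torch.randn(2, 8, 3, 224, 224)  # batch of 2 clips, 8 frames each
    print(model(video).shape)  # torch.Size([2, 200])
```

Note that this sketch runs the backbone at every candidate resolution and masks the outputs, which is convenient for a self-contained training-time illustration; any real efficiency gain comes from executing only the selected resolution per frame at inference time.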
