Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments

Paper Title

Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments

Authors

Nguyen, Khoi D.; Tran, Quoc-Huy; Nguyen, Khoi; Hua, Binh-Son; Nguyen, Rang

Abstract

We present a novel method for few-shot video classification, which performs appearance and temporal alignments. In particular, given a pair of query and support videos, we conduct appearance alignment via frame-level feature matching to achieve the appearance similarity score between the videos, while utilizing temporal order-preserving priors for obtaining the temporal similarity score between the videos. Moreover, we introduce a few-shot video classification framework that leverages the above appearance and temporal similarity scores across multiple steps, namely prototype-based training and testing as well as inductive and transductive prototype refinement. To the best of our knowledge, our work is the first to explore transductive few-shot video classification. Extensive experiments on both Kinetics and Something-Something V2 datasets show that both appearance and temporal alignments are crucial for datasets with temporal order sensitivity such as Something-Something V2. Our approach achieves similar or better results than previous methods on both datasets. Our code is available at https://github.com/VinAIResearch/fsvc-ata.
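
To make the two similarity scores in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes each video is already encoded as a matrix of per-frame features, computes an appearance score by matching every query frame to its most similar support frame, and uses a Gaussian prior centered on the diagonal as one simple stand-in for a temporal order-preserving prior. All function names and the parameters `sigma` and `alpha` are hypothetical choices made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(query, support):
    """Pairwise cosine similarity between frame features, shape (Tq, Ts)."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    s = support / np.linalg.norm(support, axis=1, keepdims=True)
    return q @ s.T


def appearance_score(sim):
    """Match each query frame to its best support frame, then average."""
    return float(sim.max(axis=1).mean())


def order_prior(num_query, num_support, sigma=0.25):
    """Illustrative prior: favors matches at similar relative positions in time."""
    qi = (np.arange(num_query) + 0.5) / num_query
    sj = (np.arange(num_support) + 0.5) / num_support
    dist = np.abs(qi[:, None] - sj[None, :])
    return np.exp(-(dist ** 2) / (2 * sigma ** 2))


def temporal_score(sim, sigma=0.25):
    """Prior-weighted mean similarity: high when similar frames occur in the same order."""
    prior = order_prior(sim.shape[0], sim.shape[1], sigma=sigma)
    return float((prior * sim).sum() / prior.sum())


def video_similarity(query, support, alpha=0.5):
    """Blend the two scores; alpha is an assumed mixing weight."""
    sim = cosine_similarity_matrix(query, support)
    return alpha * appearance_score(sim) + (1 - alpha) * temporal_score(sim)


# Toy data: 4 frames with orthogonal features, and the same frames reversed in time.
support_video = np.eye(4)
same_order = np.eye(4)
reversed_order = np.eye(4)[::-1]

sim_same = cosine_similarity_matrix(same_order, support_video)
sim_rev = cosine_similarity_matrix(reversed_order, support_video)
```

In this toy setup the reversed video contains exactly the same frames, so the appearance score cannot tell the two queries apart, while the temporal score is lower for the reversed one. That is the intuition behind the abstract's claim that temporal alignment matters for order-sensitive datasets such as Something-Something V2.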
