Paper Title
Video Test-Time Adaptation for Action Recognition
Paper Authors
Paper Abstract
Although action recognition systems can achieve top performance when evaluated on in-distribution test points, they are vulnerable to unanticipated distribution shifts in test data. However, test-time adaptation of video action recognition models against common distribution shifts has so far not been demonstrated. We propose to address this problem with an approach tailored to spatio-temporal models that is capable of adapting on a single video sample at a time. It consists of a feature distribution alignment technique that aligns online estimates of test set statistics towards the training statistics. We further enforce prediction consistency over temporally augmented views of the same test video sample. Evaluations on three benchmark action recognition datasets show that our proposed technique is architecture-agnostic and able to significantly boost the performance of both the state-of-the-art convolutional architecture TANet and the Video Swin Transformer. Our proposed method demonstrates a substantial performance gain over existing test-time adaptation approaches, both in evaluations of a single distribution shift and in the challenging case of random distribution shifts. Code will be available at \url{https://github.com/wlin-at/vitta}.
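The abstract compresses two mechanisms into two sentences; the sketch below unpacks them. It is a minimal PyTorch illustration of (1) aligning online estimates of test feature statistics towards precomputed training statistics and (2) enforcing prediction consistency over temporally augmented views of a single test video. The hook placement, the L1 alignment penalty, the EMA momentum, frame reversal as the augmentation, and the loss weights are all illustrative assumptions, not the authors' released implementation; consult the linked repository for the actual method.

```python
# Minimal sketch of feature-statistics alignment plus view consistency.
# All hyperparameters and design details below are illustrative assumptions.
import torch
import torch.nn.functional as F


class StatsHook:
    """Captures per-channel mean and variance of a layer's output."""

    def __init__(self, module: torch.nn.Module):
        self.mean, self.var = None, None
        module.register_forward_hook(self._capture)

    def _capture(self, module, inputs, output):
        dims = [d for d in range(output.dim()) if d != 1]  # all dims but channels
        self.mean = output.mean(dim=dims)
        self.var = output.var(dim=dims, unbiased=False)


def adaptation_step(model, clip, hooks, train_stats, online, optimizer,
                    momentum=0.1, w_align=1.0, w_cons=1.0):
    """One test-time adaptation step on a single clip of shape (B, C, T, H, W)."""
    views = [clip, clip.flip(dims=[2])]  # assumed augmentation: frame reversal
    probs, align_loss = [], clip.new_zeros(())
    for v in views:
        probs.append(F.softmax(model(v), dim=1))
        # (1) Feature alignment: exponential-moving-average estimate of the
        # test statistics, pulled towards the training statistics via L1.
        for name, h in hooks.items():
            for key, cur in (("mean", h.mean), ("var", h.var)):
                prev = online[name][key]
                est = cur if prev is None else (1 - momentum) * prev.detach() + momentum * cur
                online[name][key] = est
                align_loss = align_loss + (est - train_stats[name][key]).abs().mean()
    # (2) Prediction consistency across the temporally augmented views.
    cons_loss = F.mse_loss(probs[0], probs[1])
    loss = w_align * align_loss + w_cons * cons_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return probs[0].detach()  # prediction for the original view
```

Usage would look roughly like `hooks = {"layer3": StatsHook(model.layer3)}` with `train_stats` holding per-channel means and variances precomputed on the training set, `online` initialized to `{name: {"mean": None, "var": None} for name in hooks}`, and an optimizer over the model's (or just its normalization layers') parameters; each incoming test clip is then passed through `adaptation_step` before its prediction is read out.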