Masked Autoencoders for Generic Event Boundary Detection CVPR'2022 Kinetics-GEBD Challenge

Title

Masked Autoencoders for Generic Event Boundary Detection CVPR'2022 Kinetics-GEBD Challenge

Authors

Rui He, Yuanxi Sun, Youzeng Li, Zuwei Huang, Feng Hu, Xu Cheng, Jie Tang

Abstract

The Generic Event Boundary Detection (GEBD) task aims to detect generic, taxonomy-free event boundaries that segment a whole video into chunks. In this paper, we apply Masked Autoencoders to improve algorithm performance on the GEBD task. Our approach mainly adopts an ensemble of Masked Autoencoders fine-tuned on the GEBD task, which serve as self-supervised learners alongside other base models. Moreover, we use a semi-supervised pseudo-label method to take full advantage of the abundant unlabeled Kinetics-400 data during training. In addition, we propose a soft-label method to partially balance positive and negative samples and to alleviate the problem of ambiguous labeling in this task. Lastly, a segmentation alignment policy is implemented to refine the boundaries predicted by our models to more accurate locations. With our approach, we achieved an F1-score of 85.94% on the Kinetics-GEBD test set, an improvement of 2.31% over the winner of the 2021 Kinetics-GEBD Challenge. Our code is available at https://github.com/ContentAndMaterialPortrait/MAE-GEBD.
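The abstract does not specify how the soft labels are built. The sketch below illustrates the general idea using one common construction, Gaussian-smoothed boundary targets. The function name `gaussian_soft_labels` and the `sigma` parameter are hypothetical, and the paper's actual formulation may differ. Frames near an annotated boundary receive partial positive credit instead of a hard 0. This increases the total positive mass relative to negatives and softens the penalty for predictions that land one or two frames away from an ambiguous annotation.

```python
import numpy as np

def gaussian_soft_labels(num_frames, boundary_frames, sigma=1.0):
    """Hypothetical soft-label construction (an assumption, not the paper's code).

    Each frame's target is the maximum Gaussian response to any annotated
    boundary, so the boundary frame itself gets 1.0 and its neighbours get
    values that decay with distance.
    """
    frames = np.arange(num_frames, dtype=float)
    labels = np.zeros(num_frames, dtype=float)
    for b in boundary_frames:
        response = np.exp(-((frames - b) ** 2) / (2.0 * sigma ** 2))
        labels = np.maximum(labels, response)  # keep the strongest boundary response
    return labels

# Hard 0/1 targets for a 10-frame clip with boundaries at frames 3 and 7
hard = np.zeros(10)
hard[[3, 7]] = 1.0

# Soft targets for the same clip
soft = gaussian_soft_labels(10, [3, 7], sigma=1.0)
print(np.round(soft, 3))
```

Taking the maximum rather than the sum keeps every target in [0, 1] when two boundaries are close together. That lets the soft targets plug directly into a per-frame binary cross-entropy loss.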
