Paper Title
Rethinking Learning Approaches for Long-Term Action Anticipation
Paper Authors
Abstract
Action anticipation involves predicting future actions having observed the initial portion of a video. Typically, the observed video is processed as a whole to obtain a video-level representation of the ongoing activity in the video, which is then used for future prediction. We introduce ANTICIPATR which performs long-term action anticipation leveraging segment-level representations learned using individual segments from different activities, in addition to a video-level representation. We propose a two-stage learning approach to train a novel transformer-based model that uses these two types of representations to directly predict a set of future action instances over any given anticipation duration. Results on Breakfast, 50Salads, Epic-Kitchens-55, and EGTEA Gaze+ datasets demonstrate the effectiveness of our approach.