Paper Title

Spotting Temporally Precise, Fine-Grained Events in Video

Paper Authors

James Hong, Haotian Zhang, Michaël Gharbi, Matthew Fisher, Kayvon Fatahalian

Paper Abstract

We introduce the task of spotting temporally precise, fine-grained events in video (detecting the precise moment in time events occur). Precise spotting requires models to reason globally about the full-time scale of actions and locally to identify subtle frame-to-frame appearance and motion differences that identify events during these actions. Surprisingly, we find that top performing solutions to prior video understanding tasks such as action detection and segmentation do not simultaneously meet both requirements. In response, we propose E2E-Spot, a compact, end-to-end model that performs well on the precise spotting task and can be trained quickly on a single GPU. We demonstrate that E2E-Spot significantly outperforms recent baselines adapted from the video action detection, segmentation, and spotting literature to the precise spotting task. Finally, we contribute new annotations and splits to several fine-grained sports action datasets to make these datasets suitable for future work on precise spotting.
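The abstract's two requirements suggest a simple architecture: a compact per-frame feature extractor captures subtle local appearance and motion cues, while a recurrent head reasons globally over the full clip and emits a dense per-frame class prediction. Below is a minimal sketch of this pattern in PyTorch. The ResNet-18 backbone, hidden size, and class counts are illustrative stand-ins, not the authors' exact implementation (the paper's E2E-Spot pairs a compact backbone with temporal modeling over the whole sequence).

```python
# A minimal sketch of an end-to-end precise-spotting model in the spirit of
# E2E-Spot. All module choices here are illustrative placeholders, not the
# authors' exact architecture.
import torch
import torch.nn as nn
import torchvision.models as models


class SpottingModel(nn.Module):
    """Per-frame feature extractor + temporal model + dense per-frame classifier."""

    def __init__(self, num_classes: int, hidden_dim: int = 256):
        super().__init__()
        # Local reasoning: a compact 2D CNN applied independently to every
        # frame (ResNet-18 stands in for the paper's compact backbone).
        backbone = models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()
        self.backbone = backbone
        # Global reasoning: a bidirectional GRU over the full clip.
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True,
                          bidirectional=True)
        # Dense head: one score vector per frame; class 0 is "no event".
        self.head = nn.Linear(2 * hidden_dim, num_classes + 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))  # (b*t, feat_dim)
        feats = feats.view(b, t, -1)                 # (b, t, feat_dim)
        temporal, _ = self.gru(feats)                # (b, t, 2*hidden_dim)
        return self.head(temporal)                   # (b, t, num_classes+1)


if __name__ == "__main__":
    model = SpottingModel(num_classes=4)
    clip = torch.randn(2, 16, 3, 224, 224)  # 2 clips of 16 frames each
    logits = model(clip)
    print(logits.shape)  # torch.Size([2, 16, 5])
```

Under this framing, training reduces to per-frame classification against a label sequence in which nearly every frame is background, which is why the model must pick out the subtle frame-to-frame differences the abstract emphasizes.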
