Paper Title

Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation

Authors

Yansong Tang, Jiwen Lu, Jie Zhou

Abstract

Thanks to the large and rapidly growing number of instructional videos on the Internet, novices are able to acquire knowledge for completing various tasks. Over the past decade, growing efforts have been devoted to instructional video analysis. However, most existing datasets in this area are limited in diversity and scale, which leaves them far from many real-world applications where more diverse activities occur. To address this, we present a large-scale dataset named "COIN" for COmprehensive INstructional video analysis. Organized in a hierarchical structure, the COIN dataset contains 11,827 videos of 180 tasks in 12 domains (e.g., vehicles and gadgets) related to our daily life. With a newly developed toolbox, all the videos are annotated efficiently with a series of step labels and the corresponding temporal boundaries. To provide a benchmark for instructional video analysis, we evaluate a range of approaches on the COIN dataset under five different settings. Furthermore, we exploit two important characteristics (i.e., task-consistency and ordering-dependency) for localizing important steps in instructional videos. Accordingly, we propose two simple yet effective methods, which can be easily plugged into conventional proposal-based action detection models. We believe the introduction of the COIN dataset will promote future in-depth research on instructional video analysis in the community. Our dataset, annotation toolbox, and source code are available at http://coin-dataset.github.io.
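The abstract describes a domain, task, and step hierarchy with temporal boundaries, and names task-consistency as a cue for step localization. The Python sketch below is one plausible reading of both ideas, not the authors' implementation. The class names, field names, and the `filter_by_task` helper are hypothetical, and the filtering rule (sum the proposal scores per task, then keep only steps from the top-scoring task) is an assumption made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StepSegment:
    label: str    # step label (hypothetical example: "remove tire")
    start: float  # start of the step, in seconds
    end: float    # end of the step, in seconds


@dataclass
class VideoAnnotation:
    video_id: str
    domain: str   # top level of the hierarchy, e.g. "vehicles"
    task: str     # second level, one task per video
    steps: list = field(default_factory=list)  # third level: StepSegment items


def filter_by_task(proposals, step_to_task):
    """Keep only proposals whose step belongs to the most likely task.

    proposals: list of (step_label, start, end, score) tuples.
    step_to_task: dict mapping each step label to its parent task.
    This is an assumed, simplified form of task-consistency.
    """
    if not proposals:
        return None, []
    # Aggregate proposal scores per task to guess the video's task.
    task_scores = defaultdict(float)
    for step, _, _, score in proposals:
        task_scores[step_to_task[step]] += score
    best_task = max(task_scores, key=task_scores.get)
    # Discard proposals for steps that do not belong to that task.
    kept = [p for p in proposals if step_to_task[p[0]] == best_task]
    return best_task, kept
```

A post-processing filter of this kind operates only on a detector's output proposals, which is why it could sit behind a proposal-based action detection model without changing the model itself.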
