Paper Title
A Dataset for Medical Instructional Video Classification and Question Answering
Paper Authors
Paper Abstract
This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos and provide visual answers to natural language questions. We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medical education questions. To this end, we created the MedVidCL and MedVidQA datasets and introduce the tasks of Medical Video Classification (MVC) and Medical Visual Answer Localization (MVAL), two tasks that focus on cross-modal (medical language and medical video) understanding. The proposed tasks and datasets have the potential to support the development of sophisticated downstream applications that can benefit the public and medical practitioners. Our datasets consist of 6,117 annotated videos for the MVC task and 3,010 annotated questions with answer timestamps from 899 videos for the MVAL task. These datasets have been verified and corrected by medical informatics experts. We have also benchmarked each task with the created MedVidCL and MedVidQA datasets and proposed multimodal learning methods that set competitive baselines for future research.