Paper Title
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension
Paper Authors
Paper Abstract
Question answering (QA) is a fundamental means of assessing and training narrative comprehension skills for both machines and young children, yet there is a scarcity of high-quality QA datasets carefully designed to serve this purpose. In particular, existing datasets rarely distinguish fine-grained reading skills, such as the understanding of varying narrative elements. Drawing on reading education research, we introduce FairytaleQA, a dataset focusing on the narrative comprehension of kindergarten to eighth-grade students. Generated by educational experts based on an evidence-based theoretical framework, FairytaleQA consists of 10,580 explicit and implicit questions derived from 278 child-friendly stories, covering seven types of narrative elements or relations. Our dataset is valuable in two ways. First, we ran existing QA models on our dataset and confirmed that this annotation helps assess models' fine-grained learning skills. Second, the dataset supports the question generation (QG) task in the education domain. Through benchmarking with QG models, we show that a QG model trained on FairytaleQA is capable of asking high-quality and more diverse questions.