Paper title
Learning to Summarize Passages: Mining Passage-Summary Pairs from Wikipedia Revision Histories
Paper authors
Paper abstract
In this paper, we propose a method for automatically constructing a passage-to-summary dataset by mining Wikipedia page revision histories. In particular, the method mines main-body passages and introduction sentences that are added to a page simultaneously. The constructed dataset contains more than one hundred thousand passage-summary pairs. Quality analysis suggests that the dataset is promising as a training and validation set for passage summarization. We validate and analyze the performance of various summarization systems on the proposed dataset. The dataset will be available online at https://res.qyzhou.me.
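
The mining idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Revision` structure, the helper names, the naive sentence and paragraph splitting, and the exact-match comparison between revisions are all hypothetical choices made for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Revision:
    """One snapshot of a page (hypothetical structure, not the paper's format)."""
    intro: str  # text of the introduction (lead) section
    body: str   # main body text, paragraphs separated by blank lines


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    # An assumption for illustration; a real system would use a proper tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_paragraphs(text):
    # Paragraphs are assumed to be separated by one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def mine_pairs(old, new):
    """Return (passage, summary) pairs for one revision step.

    A pair is produced only when the same revision adds at least one
    introduction sentence AND at least one body paragraph.
    """
    old_sentences = set(split_sentences(old.intro))
    old_paragraphs = set(split_paragraphs(old.body))

    added_sentences = [s for s in split_sentences(new.intro) if s not in old_sentences]
    added_paragraphs = [p for p in split_paragraphs(new.body) if p not in old_paragraphs]

    if added_sentences and added_paragraphs:
        passage = "\n\n".join(added_paragraphs)
        summary = " ".join(added_sentences)
        return [(passage, summary)]
    return []


# Usage with two made-up revisions of a made-up page.
old_rev = Revision(
    intro="X is a city.",
    body="X was founded in 1900.",
)
new_rev = Revision(
    intro="X is a city. It hosts an annual festival.",
    body="X was founded in 1900.\n\nThe festival began in 1950 and draws many visitors.",
)
pairs = mine_pairs(old_rev, new_rev)
```

Requiring both kinds of addition in the same revision is what ties the new introduction sentences to the new body passage; a revision that changes only one of the two yields no pair in this sketch.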