Paper title
EDU-level Extractive Summarization with Varying Summary Lengths
Paper authors
Paper abstract
Extractive models usually formulate text summarization as extracting a fixed top-$k$ set of salient sentences from the document as the summary. Few works have explored extracting the finer-grained Elementary Discourse Unit (EDU), and those that have offer little analysis or justification for the choice of extractive unit. Further, selecting a fixed top-$k$ set of salient sentences fits the summarization need poorly: the number of salient sentences varies across documents, so a common or best $k$ does not exist in reality. To fill these gaps, this paper first conducts a comparative analysis of oracle summaries based on EDUs and on sentences, which provides evidence from both theoretical and experimental perspectives to justify and quantify that EDUs yield summaries with higher automatic evaluation scores than sentences. Then, building on this merit of EDUs, the paper proposes an EDU-level extractive model with Varying summary Lengths (EDU-VL) and develops the corresponding learning algorithm. EDU-VL learns to encode the EDUs in the document and predict their probabilities, to generate multiple candidate summaries of varying lengths based on various $k$ values, and to encode and score the candidate summaries, all in an end-to-end training manner. Finally, EDU-VL is evaluated on single- and multi-document benchmark datasets and shows improved ROUGE scores in comparison with state-of-the-art extractive models, and further human evaluation suggests that EDU-constituent summaries maintain good grammaticality and readability.
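The candidate-generation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `candidate_summaries` and `select_summary`, the example EDUs and probabilities, and the toy scorer are all hypothetical, and in EDU-VL both the EDU probabilities and the candidate scores are learned end to end rather than supplied by hand. The sketch only shows how several values of $k$ give candidate summaries of different lengths, from which one is then picked by a scoring function.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[int, List[str]]  # (k, EDUs selected for that k)


def candidate_summaries(
    edus: Sequence[str],
    probs: Sequence[float],
    k_values: Sequence[int],
) -> List[Candidate]:
    """Build one candidate per k from the k most probable EDUs.

    Assumption: `probs` stands in for the model's predicted EDU
    probabilities; here they are plain numbers given by the caller.
    """
    if len(edus) != len(probs):
        raise ValueError("edus and probs must have the same length")
    # Rank EDU indices by probability, highest first. sorted() is stable,
    # so ties keep their original document order.
    ranked = sorted(range(len(edus)), key=lambda i: -probs[i])
    candidates: List[Candidate] = []
    for k in k_values:
        chosen = sorted(ranked[:k])  # restore document order for readability
        candidates.append((k, [edus[i] for i in chosen]))
    return candidates


def select_summary(
    candidates: Sequence[Candidate],
    score: Callable[[List[str]], float],
) -> Candidate:
    """Return the candidate with the highest score.

    Assumption: `score` is a placeholder for a learned candidate scorer.
    """
    return max(candidates, key=lambda c: score(c[1]))


if __name__ == "__main__":
    # Hypothetical EDUs and probabilities, for illustration only.
    edus = [
        "The storm hit the coast",
        "which had been evacuated",
        "on Monday",
        "causing floods",
    ]
    probs = [0.9, 0.2, 0.4, 0.8]
    cands = candidate_summaries(edus, probs, k_values=[1, 2, 3])
    # Toy scorer that prefers two-EDU summaries; a real scorer is learned.
    k, summary = select_summary(cands, score=lambda s: -abs(len(s) - 2))
    print(k, summary)
```

Passing the scorer in as a callable keeps the selection step independent of how candidates are scored, so the toy scorer here could be swapped for any model-based scoring function.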