Paper title
Multilevel Text Alignment with Cross-Document Attention
Paper authors
Paper abstract
Text alignment finds application in tasks such as citation recommendation and plagiarism detection. Existing alignment methods operate at a single, predefined level and cannot learn to align texts at, for example, both the sentence and the document level. We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component, enabling structural comparisons across different levels (document-to-document and sentence-to-document). Our component is weakly supervised from document pairs and can align at multiple levels. Our evaluation on predicting document-to-document and sentence-to-document relationships for the tasks of citation recommendation and plagiarism detection shows that our approach outperforms previously established hierarchical attention encoders based on recurrent and transformer contextualization, which are unaware of structural correspondence between documents.