Paper Title
The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts
Paper Authors
Paper Abstract
We present the Project Dialogism Novel Corpus, or PDNC, an annotated dataset of quotations for English literary texts. PDNC contains annotations for 35,978 quotations across 22 full-length novels, and is by an order of magnitude the largest corpus of its kind. Each quotation is annotated for the speaker, addressees, type of quotation, referring expression, and character mentions within the quotation text. The annotated attributes allow for a comprehensive evaluation of models of quotation attribution and coreference for literary texts.