The Discussion Tracker Corpus of Collaborative Argumentation

Paper Title

The Discussion Tracker Corpus of Collaborative Argumentation

Authors

Christopher Olshefski, Luca Lugini, Ravneet Singh, Diane Litman, Amanda Godley

Abstract

Although Natural Language Processing (NLP) research on argument mining has advanced considerably in recent years, most studies draw on corpora of asynchronous and written texts, often produced by individuals. Few published corpora of synchronous, multi-party argumentation are available. The Discussion Tracker corpus, collected in American high school English classes, is an annotated dataset of transcripts of spoken, multi-party argumentation. The corpus consists of 29 multi-party discussions of English literature transcribed from 985 minutes of audio. The transcripts were annotated for three dimensions of collaborative argumentation: argument moves (claims, evidence, and explanations), specificity (low, medium, high) and collaboration (e.g., extensions of and disagreements about others' ideas). In addition to providing descriptive statistics on the corpus, we provide performance benchmarks and associated code for predicting each dimension separately, illustrate the use of the multiple annotations in the corpus to improve performance via multi-task learning, and finally discuss other ways the corpus might be used to further NLP research.
