Paper Title

Can Unsupervised Knowledge Transfer from Social Discussions Help Argument Mining?

Authors

Dutta, Subhabrata, Juneja, Jeevesh, Das, Dipankar, Chakraborty, Tanmoy

Abstract

Identifying argument components in unstructured texts and predicting the relationships expressed among them are two primary steps of argument mining. The intrinsic complexity of these tasks demands powerful learning models. While pretrained Transformer-based Language Models (LMs) have been shown to provide state-of-the-art results on a variety of NLP tasks, the scarcity of manually annotated data and the highly domain-dependent nature of argumentation restrict the capabilities of such models. In this work, we propose a novel transfer learning strategy to overcome these challenges. We utilize argumentation-rich social discussions from the ChangeMyView subreddit as a source of unsupervised, argumentative discourse-aware knowledge by finetuning pretrained LMs on a selectively masked language modeling task. Furthermore, we introduce a novel prompt-based strategy for inter-component relation prediction that complements our proposed finetuning method while leveraging the discourse context. Exhaustive experiments show the generalization capability of our method on these two tasks over within-domain as well as out-of-domain datasets, outperforming several existing strong baselines.
