Paper title
Rethinking the Event Coding Pipeline with Prompt Entailment
Paper authors
Paper abstract
For monitoring crises, political events are extracted from the news. The large volume of unstructured full-text event descriptions makes case-by-case analysis unmanageable, particularly for low-resource humanitarian aid organizations. This creates a demand to classify events into event types, a task referred to as event coding. Typically, domain experts craft an event type ontology, annotators label a large dataset, and technical experts develop a supervised coding system. In this work, we propose PR-ENT, a new event coding approach that is more flexible and resource-efficient while maintaining competitive accuracy. First, we extend an event description such as "Military injured two civilians" with a template, e.g. "People were [Z]", and prompt a pre-trained (cloze) language model to fill the slot Z. Second, we select answer candidates Z* = {"injured", "hurt", ...} by treating the event description as the premise and the filled templates as hypotheses in a textual entailment task. This allows domain experts to draft the codebook directly as labeled prompts and interpretable answer candidates. This human-in-the-loop process is guided by our interactive codebook design tool. We evaluate PR-ENT in several robustness checks: perturbing the event description and prompt template, restricting the vocabulary, and removing contextual information.
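The two steps described in the abstract (prompting, then entailment filtering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pr_ent_candidates` and `code_event`, the threshold value, the codebook layout, and the event type labels are all hypothetical, and the toy `toy_fill` and `toy_entail` functions merely stand in for a pre-trained cloze language model and a textual entailment model.

```python
from typing import Callable, Dict, List, Set

# Hypothetical interfaces: a real system would back these with a pre-trained
# cloze language model and a textual entailment (NLI) model.
FillFn = Callable[[str], List[str]]      # prompt containing "[Z]" -> candidate fillers
EntailFn = Callable[[str, str], float]   # (premise, hypothesis) -> score in [0, 1]


def pr_ent_candidates(event: str, template: str, fill: FillFn,
                      entail: EntailFn, threshold: float = 0.5) -> List[str]:
    """Step 1: ask the cloze model for fillers of [Z].
    Step 2: keep fillers whose filled template is entailed by the event."""
    prompt = f"{event} {template}"
    kept = []
    for z in fill(prompt):
        hypothesis = template.replace("[Z]", z)
        if entail(event, hypothesis) >= threshold:  # threshold is an assumption
            kept.append(z)
    return kept


def code_event(candidates: List[str], codebook: Dict[str, Set[str]]) -> List[str]:
    """Map accepted answer candidates to event types using an expert-written codebook."""
    found = set(candidates)
    return sorted(label for label, answers in codebook.items() if answers & found)


# Toy stand-ins so the sketch runs without any model; outputs are hard-coded.
def toy_fill(prompt: str) -> List[str]:
    return ["injured", "hurt", "happy", "arrested"]


def toy_entail(premise: str, hypothesis: str) -> float:
    supported = {"People were injured.", "People were hurt."}
    return 1.0 if hypothesis in supported else 0.0


event = "Military injured two civilians."
template = "People were [Z]."
candidates = pr_ent_candidates(event, template, toy_fill, toy_entail)

# Hypothetical codebook: event type label -> interpretable answer candidates.
codebook = {
    "violence against civilians": {"injured", "hurt", "killed"},
    "arrest": {"arrested", "detained"},
}
labels = code_event(candidates, codebook)
print(candidates, labels)
```

Keeping the two models behind plain callables means the codebook logic can be exercised without any model weights; swapping in real models would only change `fill` and `entail`.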