Paper Title
Realistic Data Augmentation Framework for Enhancing Tabular Reasoning
Authors
Abstract
Existing approaches to constructing training data for Natural Language Inference (NLI) tasks, such as semi-structured table reasoning, rely on either crowdsourcing or fully automatic methods. However, the former is expensive and time-consuming, which limits scale, and the latter often produces naive examples that may lack complex reasoning. This paper develops a realistic semi-automated framework for data augmentation for tabular inference. Instead of manually generating a hypothesis for each table, our methodology generates hypothesis templates transferable to similar tables. In addition, our framework entails the creation of rational counterfactual tables based on human-written logical constraints and premise paraphrasing. For our case study, we use InfoTabs, an entity-centric tabular inference dataset. We observed that our framework could generate human-like tabular inference examples, which could benefit training data augmentation, especially in scenarios with limited supervision.
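The two ideas in the abstract, hypothesis templates that transfer across similar tables and counterfactual tables filtered by human-written logical constraints, can be illustrated with a minimal sketch. This is not the authors' implementation: the template string, the `born_before_died` constraint, the function names, the label strings, and the fictional "Jane Doe" table are all assumptions made for illustration, and premise paraphrasing is left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A table is modeled as a flat key-value mapping (an assumption for this sketch).
Table = Dict[str, str]
Constraint = Callable[[Table], bool]

# Hypothetical template: each placeholder names a key shared by similar tables.
TEMPLATE = "{Name} was born in {Born}."


def fill_template(template: str, table: Table) -> str:
    """Instantiate a hypothesis template with the values of one table."""
    return template.format(**table)


def born_before_died(table: Table) -> bool:
    """Hypothetical human-written constraint: birth year must not follow death year."""
    return int(table["Born"]) <= int(table["Died"])


def make_counterfactual(
    table: Table, key: str, new_value: str, constraints: List[Constraint]
) -> Optional[Table]:
    """Return a copy of the table with one value changed, or None if the
    change is a no-op or violates any constraint."""
    if table[key] == new_value:
        return None
    candidate = dict(table)
    candidate[key] = new_value
    if all(check(candidate) for check in constraints):
        return candidate
    return None


def generate_examples(
    table: Table, template: str, key: str, new_value: str,
    constraints: List[Constraint],
) -> List[Tuple[Table, str, str]]:
    """Pair one hypothesis with the original table (entailed) and, when a
    valid counterfactual exists, with the edited table (contradicted)."""
    hypothesis = fill_template(template, table)
    examples = [(table, hypothesis, "entailment")]
    counterfactual = make_counterfactual(table, key, new_value, constraints)
    if counterfactual is not None:
        examples.append((counterfactual, hypothesis, "contradiction"))
    return examples


# Fictional table used only to exercise the sketch.
person = {"Name": "Jane Doe", "Born": "1900", "Died": "1980"}
examples = generate_examples(person, TEMPLATE, "Born", "1910", [born_before_died])
```

Because the template refers only to keys, the same string can be applied to any other table that has `Name` and `Born` fields. The constraint check is what rejects implausible counterfactuals, such as a birth year later than the death year.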