Paper Title
LUNA: Language Understanding with Number Augmentations on Transformers via Number Plugins and Pre-training
Paper Authors
Paper Abstract
Transformers are widely used in NLP tasks. However, current approaches to leveraging transformers to understand language expose one weak spot: number understanding. In some scenarios, numbers occur frequently, especially in semi-structured data such as tables. Yet current approaches to number-rich tasks with transformer-based language models abandon or lose some of the numeracy information (e.g., by breaking numbers into sub-word tokens), which leads to many number-related errors. In this paper, we propose the LUNA framework, which improves the numerical reasoning and calculation capabilities of transformer-based language models. With the number plugins NumTok and NumBed, LUNA represents each number as a whole in the model input. With number pre-training, including a regression loss and model distillation, LUNA bridges the gap between number embeddings and vocabulary embeddings. To the best of our knowledge, this is the first work that explicitly injects numeracy capability into language models using Number Plugins. Besides evaluating toy models on toy tasks, we evaluate LUNA on three large-scale transformer models (RoBERTa, BERT, TabBERT) over three different downstream tasks (TAT-QA, TabFact, CrediTrans), and observe that the performance of the language models is consistently improved by LUNA. The augmented models also improve on the official TAT-QA baseline (EM: 50.15 -> 59.58) and achieve SOTA performance on CrediTrans (F1 = 86.17).
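To make the number-plugin idea concrete, below is a minimal, hypothetical sketch of the two components the abstract names: a NumTok-style step that lifts each literal number out of the text as a whole (instead of letting the tokenizer split it into sub-words), and a NumBed-style encoder that maps the numeric value to an embedding of the language model's hidden width. The names `extract_numbers`, `NumBedSketch`, and the `[NUM]` placeholder are illustrative assumptions, not the paper's actual implementation.

```python
import re
import torch
import torch.nn as nn

# Matches integers and decimals, with an optional sign.
NUM_PATTERN = re.compile(r"-?\d+(?:\.\d+)?")

def extract_numbers(text: str, placeholder: str = "[NUM]"):
    """NumTok-style step (assumed): replace each literal number with a
    placeholder token and return the numeric values in order of appearance."""
    values = [float(m) for m in NUM_PATTERN.findall(text)]
    masked = NUM_PATTERN.sub(placeholder, text)
    return masked, values

class NumBedSketch(nn.Module):
    """NumBed-style encoder (assumed): maps each scalar value to a vector of
    the same width as the token embeddings, so it can be injected at the
    placeholder positions in the model input."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden_size),
            nn.GELU(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, values: torch.Tensor) -> torch.Tensor:
        # values: (num_numbers,) -> embeddings: (num_numbers, hidden_size)
        return self.mlp(values.unsqueeze(-1))

masked, values = extract_numbers("Revenue rose from 1.2 to 3.5 billion.")
embeddings = NumBedSketch()(torch.tensor(values))
print(masked)            # Revenue rose from [NUM] to [NUM] billion.
print(embeddings.shape)  # torch.Size([2, 768])
```

In this sketch each number survives as a single unit with its exact value available to a learned encoder, which is the property the abstract credits for avoiding sub-word-induced number errors; the paper's pre-training (regression loss and model distillation) would then align these number embeddings with the vocabulary embedding space.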