Paper Title
Learning Better Representation for Tables by Self-Supervised Tasks
Paper Authors
Paper Abstract
Table-to-text generation aims to automatically generate natural text that helps people conveniently obtain the important information in tables. Although neural models for table-to-text generation have achieved remarkable progress, some problems are still overlooked. The first is that the values recorded in many tables are, in practice, mostly numbers. Existing approaches give these values no special treatment and still regard them as words in natural language text. Second, the target texts in the training dataset may contain redundant information or facts that do not exist in the input tables, which can give wrong supervision signals to methods based on content selection and planning or on auxiliary supervision. To solve these problems, we propose two self-supervised tasks, Number Ordering and Significance Ordering, to help learn better table representations. The former works on the column dimension and helps incorporate the size property of numbers into the table representation. The latter acts on the row dimension and helps learn a significance-aware table representation. We test our methods on the widely used ROTOWIRE dataset, which consists of NBA game statistics and related news. The experimental results demonstrate that a model trained together with these two self-supervised tasks can generate text containing more salient and well-organized facts, even without modeling content selection and planning. We achieve state-of-the-art performance on automatic metrics.
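The abstract describes two ordering tasks but not how their supervision targets are built. The following is a minimal sketch, not the authors' implementation: it constructs rank labels for a numeric column (Number Ordering, column dimension) and for rows under an assumed salience score (Significance Ordering, row dimension); the function names and the salience key are illustrative assumptions.

```python
# Hedged sketch of self-supervision targets for the two ordering tasks.
# Function names and the salience heuristic are illustrative assumptions,
# not the paper's actual implementation.

def number_ordering_targets(column):
    """Rank each cell of a numeric column by value, largest first.
    A model trained to predict these ranks from cell representations
    would absorb the size property of numbers (column dimension)."""
    order = sorted(range(len(column)), key=lambda i: column[i], reverse=True)
    ranks = [0] * len(column)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks

def significance_ordering_targets(rows, salience):
    """Rank table rows by a salience score (row dimension), e.g. how
    prominently a record is mentioned in the reference text; higher
    salience gets a smaller rank."""
    order = sorted(range(len(rows)), key=lambda i: salience(rows[i]),
                   reverse=True)
    ranks = [0] * len(rows)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks

# Example: a "points" column from a game-statistics table.
points = [23, 31, 12, 18]
print(number_ordering_targets(points))  # → [1, 0, 3, 2]
```

In an actual training setup these rank labels would serve as targets for an auxiliary loss computed on the encoder's cell and row representations, alongside the main generation loss.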