CUT: Controllable Unsupervised Text Simplification

Paper Title

CUT: Controllable Unsupervised Text Simplification

Authors

Oleg Kariuk, Dima Karamshuk

Abstract

In this paper, we focus on the challenge of learning controllable text simplification in unsupervised settings. While this problem has previously been discussed for supervised learning algorithms, the literature on analogous unsupervised methods is scarce. We propose two unsupervised mechanisms for controlling the output complexity of the generated texts, namely, back-translation with control tokens (a learning-based approach) and simplicity-aware beam search (a decoding-based approach). We show that by nudging a back-translation algorithm to understand the relative simplicity of a text in comparison to its noisy translation, the algorithm supervises itself to produce output of the desired complexity. This approach achieves competitive performance on well-established benchmarks: SARI score of 46.88% and FKGL of 3.65% on the Newsela dataset.
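
For readers unfamiliar with the FKGL metric reported in the abstract, the following Python sketch computes the standard Flesch-Kincaid Grade Level formula, 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59, where lower values indicate simpler text. The function names `count_syllables` and `fkgl` are hypothetical, and the syllable counter is a crude vowel-group heuristic assumed here for illustration. This is not the authors' evaluation code, and it will not reproduce their reported numbers exactly.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels, at least one."""
    vowel_groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_groups))


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level of an English text; lower means simpler."""
    # Split on sentence-final punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat runs of letters (with an optional internal apostrophe) as words.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    simple = "The cat sat on the mat."
    complex_ = "Unsupervised simplification algorithms necessitate considerable computational infrastructure."
    print(f"simple:  {fkgl(simple):.2f}")
    print(f"complex: {fkgl(complex_):.2f}")
```

A score like this could also serve as the simplicity signal when comparing a sentence with its noisy translation, although the abstract does not specify which signal the authors use.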
