Title
Spelling Error Correction with Soft-Masked BERT
Authors
Abstract
Spelling error correction is an important yet challenging task, because a satisfactory solution essentially requires human-level language understanding ability. Without loss of generality, we consider Chinese spelling error correction (CSC) in this paper. A state-of-the-art method for the task selects a character from a list of candidates for correction (including non-correction) at each position of the sentence on the basis of BERT, the language representation model. The accuracy of the method can be sub-optimal, however, because BERT does not have sufficient capability to detect whether there is an error at each position, apparently due to the way it is pre-trained with masked language modeling. In this work, we propose a novel neural architecture to address the aforementioned issue, which consists of a network for error detection and a network for error correction based on BERT, with the former connected to the latter by what we call the soft-masking technique. Our method of using 'Soft-Masked BERT' is general, and it may be employed in other language detection-correction problems. Experimental results on two datasets demonstrate that the performance of our proposed method is significantly better than the baselines, including the one solely based on BERT.
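The soft-masking connection described in the abstract can be sketched as follows: a detection network predicts an error probability for each position, and the input embedding is blended with the [MASK] embedding in proportion to that probability before being passed to the BERT-based corrector. The class name, the use of a Bi-GRU detector, and all dimensions below are illustrative assumptions based on the abstract, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class SoftMasking(nn.Module):
    """Sketch of the soft-masking connection: the detection network
    outputs an error probability p_i per token, and the corrector input
    is the convex combination  e_i' = p_i * e_mask + (1 - p_i) * e_i."""

    def __init__(self, hidden_size=768):
        super().__init__()
        # Detection network (assumed here to be a bidirectional GRU
        # followed by a linear layer giving one error logit per position).
        self.detector = nn.GRU(hidden_size, hidden_size // 2,
                               batch_first=True, bidirectional=True)
        self.detect_head = nn.Linear(hidden_size, 1)

    def forward(self, embeddings, mask_embedding):
        # embeddings: (batch, seq_len, hidden); mask_embedding: (hidden,)
        states, _ = self.detector(embeddings)
        p_err = torch.sigmoid(self.detect_head(states))  # (batch, seq_len, 1)
        # Soft mask: positions that look erroneous are pushed toward [MASK],
        # so the BERT-based corrector treats them like masked tokens.
        soft_masked = p_err * mask_embedding + (1.0 - p_err) * embeddings
        return soft_masked, p_err.squeeze(-1)
```

The blended embeddings would then replace the ordinary token embeddings fed into the BERT encoder for correction, so that detection and correction can be trained jointly end to end.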