Learning from the Dictionary: Heterogeneous Knowledge Guided Fine-tuning for Chinese Spell Checking

Paper title

Learning from the Dictionary: Heterogeneous Knowledge Guided Fine-tuning for Chinese Spell Checking

Authors

Li, Yinghui; Ma, Shirong; Zhou, Qingyu; Li, Zhongli; Li, Yangning; Huang, Shulin; Liu, Ruiyang; Li, Chao; Cao, Yunbo; Zheng, Haitao

Abstract

Chinese Spell Checking (CSC) aims to detect and correct Chinese spelling errors. Recent research starts from the pretrained knowledge of language models and incorporates multimodal information into CSC models to improve performance. However, these approaches overlook the rich knowledge in the dictionary, the reference book from which one can learn how a character should be pronounced, written, and used. In this paper, we propose the LEAD framework, which enables the CSC model to learn heterogeneous knowledge from the dictionary in terms of phonetics, vision, and meaning. LEAD first constructs positive and negative samples according to the knowledge of character phonetics, glyphs, and definitions in the dictionary. A unified contrastive learning-based training scheme is then employed to refine the representations of the CSC models. Extensive experiments and detailed analyses on the SIGHAN benchmark datasets demonstrate the effectiveness of the proposed methods.
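As a rough illustration of what training on positive and negative samples with a contrastive objective can look like, the sketch below computes an InfoNCE-style loss over toy vectors. This is an assumption-laden example, not the LEAD loss itself: the abstract does not specify the loss form, and the function names (`cosine`, `info_nce`), the temperature value, and the two-dimensional vectors are all hypothetical stand-ins for real character representations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: -log softmax of the positive among all candidates.

    Hypothetical helper for illustration; the paper's actual objective may differ.
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - pos_logit


# Toy vectors standing in for character representations (assumed values).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]                   # e.g. a sample built from dictionary knowledge
negatives = [[0.0, 1.0], [-1.0, 0.0]]   # unrelated samples

loss_good = info_nce(anchor, positive, negatives)
# Swapping in an unrelated vector as the "positive" should give a larger loss.
loss_bad = info_nce(anchor, negatives[0], [positive, negatives[1]])
```

Minimizing a loss of this shape pulls the anchor toward its positive sample and pushes it away from the negatives, which is the general sense in which a contrastive scheme can refine representations.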
