Paper Title

Multi-stage Distillation Framework for Cross-Lingual Semantic Similarity Matching

Paper Authors

Kunbo Ding, Weijie Liu, Yuejian Fang, Zhe Zhao, Qi Ju, Xuefeng Yang

Paper Abstract

Previous studies have shown that cross-lingual knowledge distillation can significantly improve the performance of pre-trained models on cross-lingual similarity matching tasks. However, this approach requires the student model to be large; otherwise, its performance drops sharply, making it impractical to deploy on memory-limited devices. To address this issue, we delve into cross-lingual knowledge distillation and propose a multi-stage distillation framework for constructing a small-size but high-performance cross-lingual model. In our framework, contrastive learning, bottleneck, and parameter recurrent strategies are combined to prevent performance from being compromised during the compression process. Experimental results demonstrate that our method can compress the size of XLM-R and MiniLM by more than 50%, while reducing performance by only about 1%.
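To make the three strategies named in the abstract concrete, below is a minimal PyTorch sketch, not the authors' implementation: a student whose single transformer layer is reused across depth (parameter recurrence), a bottleneck projection to a small sentence-embedding dimension, and a training loss combining MSE distillation from the teacher with an in-batch contrastive (InfoNCE) term. All dimensions, the loss weighting, and the assumption that teacher embeddings are already projected to the bottleneck dimension are illustrative choices, not values from the paper.

```python
# Hedged sketch of the ideas in the abstract; hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentStudent(nn.Module):
    def __init__(self, vocab_size=30000, hidden=384, bottleneck=128, steps=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        # One transformer layer applied `steps` times: cross-layer parameter
        # sharing ("parameter recurrence") keeps the parameter count small.
        self.layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=6, dim_feedforward=hidden * 4,
            batch_first=True)
        self.steps = steps
        # Bottleneck: project the pooled sentence vector to a low dimension.
        self.bottleneck = nn.Linear(hidden, bottleneck)

    def forward(self, token_ids):
        h = self.embed(token_ids)
        for _ in range(self.steps):
            h = self.layer(h)          # same weights at every depth step
        pooled = h.mean(dim=1)         # mean pooling over tokens
        return self.bottleneck(pooled) # (batch, bottleneck)

def distill_loss(student_emb, teacher_emb, temperature=0.05, alpha=1.0):
    """MSE to the teacher plus an in-batch contrastive term that pulls each
    student embedding toward its own teacher embedding and away from the
    teacher embeddings of other sentences in the batch."""
    mse = F.mse_loss(student_emb, teacher_emb)
    s = F.normalize(student_emb, dim=-1)
    t = F.normalize(teacher_emb, dim=-1)
    logits = s @ t.T / temperature        # (batch, batch) similarity matrix
    labels = torch.arange(s.size(0))      # diagonal entries are positives
    return mse + alpha * F.cross_entropy(logits, labels)

# Usage with random stand-ins for a tokenized batch and frozen teacher output.
student = RecurrentStudent()
tokens = torch.randint(0, 30000, (8, 32))   # batch of 8, sequence length 32
teacher_emb = torch.randn(8, 128)           # assumed pre-projected teacher vectors
loss = distill_loss(student(tokens), teacher_emb)
loss.backward()
```

In this sketch the compression comes from two places: the shared layer means only one layer's weights are stored regardless of depth, and the bottleneck shrinks the stored sentence embeddings; the contrastive term preserves the relative geometry of the teacher's embedding space rather than only its absolute coordinates.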
