Paper Title

Build a Robust QA System with Transformer-based Mixture of Experts

Paper Authors

Yu Qing Zhou, Xixuan Julie Liu, Yuanzhe Dong

Paper Abstract

In this paper, we aim to build a robust question answering system that can adapt to out-of-domain datasets. A single network may overfit to superficial correlations in the training distribution, but with a meaningful number of expert sub-networks, a gating network that selects a sparse combination of experts for each input, and careful balancing of the importance of the expert sub-networks, the Mixture-of-Experts (MoE) model allows us to train a multi-task learner that generalizes to out-of-domain datasets. We also explore the possibility of moving the MoE layers up to the middle of DistilBERT and replacing the dense feed-forward network with sparsely-activated Switch FFN layers, similar to the Switch Transformer architecture, which simplifies the MoE routing algorithm and reduces communication and computational costs. In addition to model architectures, we explore data augmentation techniques, including Easy Data Augmentation (EDA) and back translation, to create more meaningful variance in the small out-of-domain training data, thereby boosting the performance and robustness of our models. We show that the combination of our best architecture and data augmentation techniques achieves a 53.477 F1 score in the out-of-domain evaluation, a 9.52% performance gain over the baseline. On the final test set, we report a higher 59.506 F1 and 41.651 EM. We successfully demonstrate the effectiveness of the Mixture-of-Experts architecture on the Robust QA task.
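To make the routing idea concrete, below is a minimal PyTorch sketch (not the authors' implementation) of a Switch-style MoE feed-forward layer as the abstract describes it: a gating network routes each token to its top-1 expert, the chosen expert's output is scaled by the gate probability, and an auxiliary loss encourages balanced expert usage. The hidden sizes, expert count, and balancing coefficient are illustrative assumptions.

    # Minimal sketch of a Switch-style MoE FFN layer (assumed hyperparameters).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SwitchFFN(nn.Module):
        def __init__(self, d_model=768, d_ff=3072, num_experts=4, aux_coef=0.01):
            super().__init__()
            self.router = nn.Linear(d_model, num_experts)  # gating network
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            ])
            self.num_experts = num_experts
            self.aux_coef = aux_coef

        def forward(self, x):
            # x: (batch, seq_len, d_model) -> flatten tokens for per-token routing
            tokens = x.reshape(-1, x.size(-1))
            probs = F.softmax(self.router(tokens), dim=-1)  # (n_tokens, num_experts)
            gate, expert_idx = probs.max(dim=-1)            # top-1 routing

            out = torch.zeros_like(tokens)
            for e, expert in enumerate(self.experts):
                mask = expert_idx == e
                if mask.any():
                    # scale each expert's output by its gate value
                    out[mask] = gate[mask].unsqueeze(-1) * expert(tokens[mask])

            # Load-balancing auxiliary loss: fraction of tokens routed to each expert
            # times the mean router probability per expert, summed over experts.
            frac = torch.bincount(expert_idx, minlength=self.num_experts).float() / tokens.size(0)
            prob_mean = probs.mean(dim=0)
            aux_loss = self.aux_coef * self.num_experts * torch.sum(frac * prob_mean)

            return out.reshape_as(x), aux_loss

    # Example usage: y, aux_loss = SwitchFFN()(torch.randn(2, 16, 768))

In a full model, such a layer would replace the dense feed-forward block inside a transformer layer, and the auxiliary loss would be added to the task loss during training.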
