Paper Title
Layer-Wise Multi-View Learning for Neural Machine Translation
Paper Authors
Paper Abstract
Traditional neural machine translation is limited to the topmost encoder layer's context representation and cannot directly perceive the lower encoder layers. Existing solutions usually rely on adjusting the network architecture, which makes computation more complex or introduces additional structural restrictions. In this work, we propose layer-wise multi-view learning to address this problem, circumventing the need to change the model structure. We regard each encoder layer's off-the-shelf output, a by-product of layer-by-layer encoding, as a redundant view of the input sentence. In this way, in addition to the topmost encoder layer (referred to as the primary view), we also incorporate an intermediate encoder layer as the auxiliary view. We feed the two views to a partially shared decoder to maintain independent predictions. Consistency regularization based on KL divergence is used to encourage the two views to learn from each other. Extensive experimental results on five translation tasks show that our approach yields stable improvements over multiple strong baselines. As a further bonus, our method is agnostic to network architectures and maintains the same inference speed as the original model.
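A minimal sketch of how the training objective described in the abstract might look in PyTorch, assuming the decoder produces one set of logits per view. The function name multi_view_loss, the weight alpha, and the symmetric form of the KL consistency term are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def multi_view_loss(primary_logits, auxiliary_logits, target, pad_id, alpha=1.0):
    """Sketch of a layer-wise multi-view objective: cross-entropy for each
    view's independent prediction plus a symmetric KL consistency term that
    encourages the two views' output distributions to agree.

    primary_logits / auxiliary_logits: (batch, seq_len, vocab) decoder outputs
    produced from the topmost (primary) and intermediate (auxiliary) encoder views.
    """
    vocab = primary_logits.size(-1)
    primary = primary_logits.view(-1, vocab)
    auxiliary = auxiliary_logits.view(-1, vocab)
    target = target.view(-1)

    # Independent predictions: standard cross-entropy for each view.
    ce_primary = F.cross_entropy(primary, target, ignore_index=pad_id)
    ce_auxiliary = F.cross_entropy(auxiliary, target, ignore_index=pad_id)

    # Consistency regularization: symmetric KL between the two views,
    # computed only on non-padding positions (masking is an assumption here).
    mask = target.ne(pad_id)
    p_log = F.log_softmax(primary[mask], dim=-1)
    q_log = F.log_softmax(auxiliary[mask], dim=-1)
    kl_pq = F.kl_div(q_log, p_log.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(p_log, q_log.exp(), reduction="batchmean")  # KL(q || p)

    return ce_primary + ce_auxiliary + alpha * (kl_pq + kl_qp)
```

Because the auxiliary view only contributes an extra loss term during training, inference can keep using the primary view alone, which is consistent with the abstract's claim of unchanged inference speed.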