Paper title
Clean Text and Full-Body Transformer: Microsoft's Submission to the WMT22 Shared Task on Sign Language Translation
Paper authors
Paper abstract
This paper describes Microsoft's submission to the first shared task on sign language translation at WMT 2022, a public competition on translating Swiss German Sign Language into spoken language. The task is very challenging because of data scarcity and an unprecedented target-side vocabulary of more than 20k words. Moreover, the data is taken from real broadcast news, includes native signing, and covers long videos. Motivated by recent advances in action recognition, we incorporate full-body information by extracting features from a pre-trained I3D model and applying a standard transformer network. Careful cleaning of the target text further improves the accuracy of the system. We obtain BLEU scores of 0.6 and 0.78 on the test and dev sets respectively, the best scores among the participants of the shared task. The submission also ranks first in the human evaluation. Applying features extracted from a lip reading model further improves the BLEU score to 1.08 on the dev set.
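For readers unfamiliar with the metric, the following is a minimal sketch of corpus-level BLEU with uniform 4-gram weights and a brevity penalty, reported on a 0-100 scale. It is not the shared task's official scorer: the function names `ngram_counts` and `corpus_bleu`, the whitespace tokenization, the single reference per sentence, and the absence of smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumptions (not taken from the paper): whitespace tokenization
    and no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tok, n)
            ref_counts = ngram_counts(ref_tok, n)
            # Counter intersection clips each hypothesis n-gram count
            # by its count in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize hypotheses that are shorter than the references overall.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

Calling `corpus_bleu(hyps, refs)` with two equal-length lists of sentences returns a single score for the whole set. If the reported scores use this 0-100 scale (an assumption here, following common WMT practice), then 0.6, 0.78, and 1.08 all sit at roughly one point out of 100, which is consistent with the abstract's description of the task as very challenging.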