Paper Title

Deep Diacritization: Efficient Hierarchical Recurrence for Improved Arabic Diacritization

Authors

Badr AlKhamissi, Muhammad N. ElNokrashy, Mohamed Gabr

Abstract

We propose a novel architecture for labelling character sequences that achieves state-of-the-art results on the Tashkeela Arabic diacritization benchmark. The core is a two-level recurrence hierarchy that operates on the word and character levels separately, enabling faster training and inference than comparable traditional models. A cross-level attention module further connects the two, and opens the door for network interpretability. The task module is a softmax classifier that enumerates valid combinations of diacritics. This architecture can be extended with a recurrent decoder that optionally accepts priors from partially diacritized text, which improves results. We employ extra tricks such as sentence dropout and majority voting to further boost the final result. Our best model achieves a WER of 5.34%, outperforming the previous state-of-the-art with a 30.56% relative error reduction.
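
The relative error reduction quoted at the end of the abstract can be unpacked with a little arithmetic. The sketch below assumes the usual definition, (baseline - new) / baseline, and inverts it to recover the baseline WER implied by the two reported numbers. The function names are illustrative, and the implied baseline is derived here, not stated in the abstract.

```python
def relative_error_reduction(baseline: float, new: float) -> float:
    """Fraction of the baseline error removed by the new system."""
    return (baseline - new) / baseline


def implied_baseline(new: float, reduction: float) -> float:
    """Invert the formula: baseline error implied by a new error and a relative reduction."""
    return new / (1.0 - reduction)


new_wer = 5.34      # WER (%) reported in the abstract
reduction = 0.3056  # relative error reduction reported in the abstract

# Derived value (an inference from the two reported figures, not a quoted result).
baseline_wer = implied_baseline(new_wer, reduction)
print(f"implied previous WER: {baseline_wer:.2f}%")

# Round-trip check: plugging the implied baseline back in recovers the reduction.
print(f"check: {relative_error_reduction(baseline_wer, new_wer):.4f}")
```

Under that assumed definition, the two reported figures correspond to a previous WER of roughly 7.69%, that is, an absolute improvement of about 2.35 percentage points.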
