Paper title

Revisiting the Context Window for Cross-lingual Word Embeddings

Authors

Ryokan Ri, Yoshimasa Tsuruoka

Abstract

Existing approaches to mapping-based cross-lingual word embeddings are based on the assumption that the source and target embedding spaces are structurally similar. The structures of embedding spaces largely depend on the co-occurrence statistics of each word, which the choice of context window determines. Despite this obvious connection between the context window and mapping-based cross-lingual embeddings, their relationship has been underexplored in prior work. In this work, we provide a thorough evaluation, in various languages, domains, and tasks, of bilingual embeddings trained with different context windows. The highlight of our findings is that increasing the size of both the source and target window sizes improves the performance of bilingual lexicon induction, especially the performance on frequent nouns.
