Paper Title
UDPipe at EvaLatin 2020: Contextualized Embeddings and Treebank Embeddings
Paper Authors
Paper Abstract
We present our contribution to the EvaLatin shared task, the first evaluation campaign devoted to NLP tools for Latin. We submitted a system based on UDPipe 2.0, one of the winners of the CoNLL 2018 Shared Task, the 2018 Shared Task on Extrinsic Parser Evaluation, and the SIGMORPHON 2019 Shared Task. Our system places first by a wide margin in both lemmatization and POS tagging in the open modality, where additional supervised data is allowed; in that setting we utilize all Universal Dependencies Latin treebanks. In the closed modality, where only the EvaLatin training data is allowed, our system achieves the best performance in lemmatization and in the classical subtask of POS tagging, while reaching second place in the cross-genre and cross-time settings. In ablation experiments, we also evaluate the influence of BERT and XLM-RoBERTa contextualized embeddings, and of treebank encodings of the different flavors of Latin treebanks.