Paper Title
TCE at Qur'an QA 2022: Arabic Language Question Answering Over Holy Qur'an Using a Post-Processed Ensemble of BERT-based Models
Paper Authors
Abstract
In recent years, we have witnessed great progress in many natural language understanding tasks using machine learning. Question answering is one of these tasks; search engines and social media platforms use it to improve the user experience. Arabic is the language of the Holy Qur'an, the sacred text for 1.8 billion people across the world. Due to its complex structure, Arabic is a challenging language for Natural Language Processing (NLP). In this article, we describe our attempts at the OSACT5 Qur'an QA 2022 Shared Task, a question answering challenge on the Holy Qur'an in Arabic. We propose an ensemble learning model based on Arabic variants of BERT models. In addition, we perform post-processing to enhance the model predictions. Our system achieves a Partial Reciprocal Rank (pRR) score of 56.6% on the official test set.
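The reported score uses Partial Reciprocal Rank (pRR). As a rough illustration of how such a score can be computed, here is a minimal Python sketch. It assumes that the degree of match for a predicted answer is its best token-level F1 against the gold answers, and that pRR for one question is the match degree of the first partially matching prediction divided by its rank. This only approximates the official Qur'an QA 2022 scorer; in particular, it applies no Arabic text normalization. All function names are hypothetical.

```python
from collections import Counter
from typing import List


def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 between two whitespace-tokenized strings.

    Assumption: no Arabic normalization (diacritics, alef variants, etc.) is applied.
    """
    pred_toks = pred.split()
    gold_toks = gold.split()
    # Multiset intersection counts tokens shared by prediction and gold answer.
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def partial_reciprocal_rank(ranked_preds: List[str], golds: List[str]) -> float:
    """pRR for one question (sketch): match degree of the first prediction that
    partially matches any gold answer, divided by that prediction's 1-based rank."""
    for rank, pred in enumerate(ranked_preds, start=1):
        match = max((token_f1(pred, g) for g in golds), default=0.0)
        if match > 0:
            return match / rank
    return 0.0  # no prediction overlaps any gold answer


def mean_prr(all_preds: List[List[str]], all_golds: List[List[str]]) -> float:
    """Average pRR over a set of questions."""
    scores = [partial_reciprocal_rank(p, g) for p, g in zip(all_preds, all_golds)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy example with placeholder tokens: the first prediction misses, and the
    # second has F1 = 0.8 against the gold answer, so pRR = 0.8 / 2.
    print(partial_reciprocal_rank(["a b", "c d e"], ["c d"]))
```

Under this sketch, a partially correct answer at rank 1 can still score below a perfect answer at rank 1. A correct answer found lower in the ranked list is discounted by its rank, which is why post-processing that reorders or trims predicted spans can affect pRR directly.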