UIT-ViCoV19QA: A Dataset for COVID-19 Community-based Question Answering on Vietnamese Language

Paper Title

UIT-ViCoV19QA: A Dataset for COVID-19 Community-based Question Answering on Vietnamese Language

Authors

Thai, Triet Minh; Chu, Ngan Ha-Thao; Vo, Anh Tuan; Luu, Son T.

Abstract

Over the two years from 2020 to 2021, COVID-19 broke through disease prevention measures in many countries, including Vietnam, and negatively affected many aspects of human life and society. Misleading information and fake news about the pandemic circulating in the community have also been a serious problem. We therefore present UIT-ViCoV19QA, the first Vietnamese community-based question answering dataset for developing COVID-19 question answering systems. The dataset comprises 4,500 question-answer pairs collected from trusted medical sources, with at least one answer and at most four unique paraphrased answers per question. Along with the dataset, we set up various deep learning models as baselines to assess the quality of the dataset and to establish benchmark results for further research, using commonly used metrics such as BLEU, METEOR, and ROUGE-L. We also illustrate the positive effect of having multiple paraphrased answers in experiments on these models, especially on the Transformer, a dominant architecture in the field of study.
