Paper Title
ViWOZ: A Multi-Domain Task-Oriented Dialogue Systems Dataset For Low-resource Language
Authors
Abstract
Most current task-oriented dialogue (ToD) systems, despite their interesting results, are designed for a handful of languages such as Chinese and English. Their performance in low-resource languages therefore remains a significant problem, owing to the absence of a standard dataset and evaluation policy. To address this problem, we propose ViWOZ, a fully annotated Vietnamese task-oriented dialogue dataset. ViWOZ is the first multi-turn, multi-domain task-oriented dataset in Vietnamese, a low-resource language. The dataset consists of 5,000 dialogues comprising 60,946 fully annotated utterances. Furthermore, we provide a comprehensive benchmark of both modular and end-to-end models in low-resource language scenarios. With these characteristics, the ViWOZ dataset enables future studies on building multilingual task-oriented dialogue systems.