Paper Title
Evaluation of Synthetic Datasets for Conversational Recommender Systems
Paper Authors
Paper Abstract
For researchers leveraging Large Language Models (LLMs) to generate training datasets, especially for conversational recommender systems, the absence of robust evaluation frameworks has been a long-standing problem. The efficiency that LLMs bring to the data generation phase is undermined at evaluation time, since assessing the generated data generally requires human raters to confirm that it is of high quality and sufficiently diverse. Because the quality of training data is critical for downstream applications, it is important to develop metrics that evaluate quality holistically and identify biases. In this paper, we present a framework that takes a multi-faceted approach to evaluating datasets produced by generative models, and we discuss the advantages and limitations of various evaluation methods.
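As an illustration of the kind of automated metric such a framework might include, the sketch below computes a distinct-n diversity score over a batch of generated dialogues. The function name `distinct_n`, the choice of metric, and the sample data are assumptions made for illustration; they are not the paper's actual method.

```python
from collections import Counter
from typing import List

def distinct_n(dialogues: List[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams across generated dialogues.

    A common proxy for lexical diversity: higher values indicate less
    repetitive synthetic data. Illustrative only; not the framework
    proposed in the paper.
    """
    ngrams = Counter()
    for text in dialogues:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

# Example: score a tiny batch of synthetic recommendation dialogues.
sample = [
    "User: I liked Inception. Bot: You may enjoy Interstellar.",
    "User: I liked Inception. Bot: You may enjoy Tenet.",
]
print(f"distinct-2: {distinct_n(sample, n=2):.3f}")
```

A metric like this captures only one facet (lexical diversity); a holistic evaluation of the kind the abstract describes would combine several such automated signals with targeted human review.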