Paper title
TeTIm-Eval: a novel curated evaluation data set for comparing text-to-image models
Paper authors
Paper abstract
Evaluating and comparing text-to-image models is a challenging problem. Significant advances have recently been made in the field, piquing the interest of various industrial sectors. As a consequence, a gold standard in the field should cover a variety of tasks and application contexts. This paper experiments with a novel evaluation approach based on: (i) a curated data set of high-quality royalty-free image-text pairs, divided into ten categories; (ii) a quantitative metric, the CLIP-score; and (iii) a human evaluation task to distinguish, for a given text, the real image from the generated ones. The proposed method has been applied to the most recent models, i.e., DALLE2, Latent Diffusion, Stable Diffusion, GLIDE and Craiyon. Early experimental results show that the accuracy of the human judgement is fully consistent with the CLIP-score. The data set has been made available to the public.
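The abstract does not spell out how the two measurements are computed, so the following is only a minimal sketch under stated assumptions. It assumes the CLIP-score is taken as the cosine similarity between a precomputed CLIP image embedding and a precomputed CLIP text embedding, floored at zero, and that human accuracy is the fraction of trials in which the rater picked the real image. The function names (`cosine_similarity`, `clip_style_score`, `human_accuracy`) and the toy vectors are hypothetical and do not come from the paper; the embedding step itself is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_u * norm_v)


def clip_style_score(image_embedding, text_embedding):
    """Assumed score: image/text cosine similarity, floored at zero.

    The embeddings are assumed to come from a CLIP image encoder and a
    CLIP text encoder; producing them is outside this sketch.
    """
    return max(0.0, cosine_similarity(image_embedding, text_embedding))


def human_accuracy(correct_flags):
    """Fraction of trials in which the rater identified the real image.

    correct_flags is a non-empty list of booleans, one per trial.
    """
    if not correct_flags:
        raise ValueError("at least one trial is required")
    return sum(correct_flags) / len(correct_flags)


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings (made up for illustration).
    print(clip_style_score([0.2, 0.9, 0.1], [0.1, 0.8, 0.3]))
    print(human_accuracy([True, False, True, True]))
```

Under these assumptions, a model whose images score higher against their prompts would be expected to make the human task harder, which is the kind of agreement between the two measurements that the abstract reports.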