Putting GPT-3's Creativity to the (Alternative Uses) Test

Paper title

Putting GPT-3's Creativity to the (Alternative Uses) Test

Authors

Claire Stevenson, Iris Smal, Matthijs Baas, Raoul Grasman, Han van der Maas

Abstract

AI large language models have (co-)produced amazing written works, from newspaper articles to novels and poetry. These works meet the criteria of the standard definition of creativity: they are original and useful, and sometimes even carry the additional element of surprise. But can a large language model designed to predict the next text fragment provide creative, out-of-the-box responses that still solve the problem at hand? We put OpenAI's generative natural language model, GPT-3, to the test. Can it provide creative solutions to one of the most commonly used tests in creativity research? We assessed GPT-3's creativity on Guilford's Alternative Uses Test (AUT) and compared its performance to previously collected human responses on four measures: expert ratings of the originality, usefulness and surprise of responses; the flexibility of each set of ideas; and an automated measure of creativity based on the semantic distance between a response and the AUT object in question. Our results show that, on the whole, humans currently outperform GPT-3 when it comes to creative output. But we believe it is only a matter of time before GPT-3 catches up on this particular task. We discuss what this work reveals about human and AI creativity, creativity testing and our definition of creativity.
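
The abstract does not say how the semantic distance is computed. As a rough illustration only, one common way to define such a measure is one minus the cosine similarity between an embedding of the AUT object and an embedding of the response. The sketch below assumes that definition; the function names and the three-dimensional toy vectors are hypothetical and stand in for real word or sentence embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def semantic_distance(object_vec, response_vec):
    """Assumed definition: 1 - cosine similarity (larger means more remote)."""
    return 1.0 - cosine_similarity(object_vec, response_vec)


# Hypothetical toy embeddings, made up for illustration (not real model output).
TOY_EMBEDDINGS = {
    "brick": [1.0, 0.0, 0.0],
    "build a wall": [0.8, 0.6, 0.0],
    "grind into pigment": [0.0, 0.6, 0.8],
}

if __name__ == "__main__":
    obj = TOY_EMBEDDINGS["brick"]
    for response in ("build a wall", "grind into pigment"):
        d = semantic_distance(obj, TOY_EMBEDDINGS[response])
        print(f"{response}: {d:.2f}")
```

Under this assumed definition, a conventional use such as "build a wall" sits close to the object, while a more remote use such as "grind into pigment" scores a larger distance.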
