Identifying and Manipulating the Personality Traits of Language Models

Paper Title

Identifying and Manipulating the Personality Traits of Language Models

Authors

Graham Caron, Shashank Srivastava

Abstract

Psychology research has long explored aspects of human personality such as extroversion, agreeableness and emotional stability. Categorizations like the 'Big Five' personality traits are commonly used to assess and diagnose personality types. In this work, we explore the question of whether the perceived personality in language models is exhibited consistently in their language generation. For example, is a language model such as GPT2 likely to respond in a consistent way if asked to go out to a party? We also investigate whether such personality traits can be controlled. We show that when provided different types of contexts (such as personality descriptions, or answers to diagnostic questions about personality traits), language models such as BERT and GPT2 can consistently identify and reflect personality markers in those contexts. This behavior illustrates an ability to be manipulated in a highly predictable way, and frames them as tools for identifying personality traits and controlling personas in applications such as dialog systems. We also contribute a crowd-sourced dataset of personality descriptions of human subjects paired with their 'Big Five' personality assessment data, and a dataset of personality descriptions collated from Reddit.
