In conversation with Artificial Intelligence: aligning language models with human values

Paper title

In conversation with Artificial Intelligence: aligning language models with human values

Authors

Atoosa Kasirzadeh, Iason Gabriel

Abstract

Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions. For example, what does it mean to align conversational agents with human norms or values? Which norms or values should they be aligned with? And how can this be accomplished? In this paper, we propose a number of steps that help answer these questions. We start by developing a philosophical analysis of the building blocks of linguistic communication between conversational agents and human interlocutors. We then use this analysis to identify and formulate ideal norms of conversation that can govern successful linguistic communication between humans and conversational agents. Furthermore, we explore how these norms can be used to align conversational agents with human values across a range of different discursive domains. We conclude by discussing the practical implications of our proposal for the design of conversational agents that are aligned with these norms and values.
