Paper Title

Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey

Authors

Kumar, Sachin, Balachandran, Vidhisha, Njoo, Lucille, Anastasopoulos, Antonios, Tsvetkov, Yulia

Abstract

Recent advances in the capacity of large language models to generate human-like text have resulted in their increased adoption in user-facing settings. In parallel, these improvements have prompted a heated discourse around the risks of societal harms they introduce, whether inadvertent or malicious. Several studies have explored these harms and called for their mitigation via development of safer, fairer models. Going beyond enumerating the risks of harms, this work provides a survey of practical methods for addressing potential threats and societal harms from language generation models. We draw on several prior works' taxonomies of language model risks to present a structured overview of strategies for detecting and ameliorating different kinds of risks/harms of language generators. Bridging diverse strands of research, this survey aims to serve as a practical guide for both LM researchers and practitioners, with explanations of different mitigation strategies' motivations, their limitations, and open problems for future research.
