On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning

Paper title

On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning

Authors

Omar Shaikh, Hongxin Zhang, William Held, Michael Bernstein, Diyi Yang

Abstract

Generating a Chain of Thought (CoT) has been shown to consistently improve large language model (LLM) performance on a wide range of NLP tasks. However, prior work has mainly focused on logical reasoning tasks (e.g. arithmetic, commonsense QA); it remains unclear whether improvements hold for more diverse types of reasoning, especially in socially situated contexts. Concretely, we perform a controlled evaluation of zero-shot CoT across two socially sensitive domains: harmful questions and stereotype benchmarks. We find that zero-shot CoT reasoning in sensitive domains significantly increases a model's likelihood to produce harmful or undesirable output, with trends holding across different prompt formats and model variants. Furthermore, we show that harmful CoTs increase with model size, but decrease with improved instruction following. Our work suggests that zero-shot CoT should be used with caution on socially important tasks, especially when marginalized groups or sensitive topics are involved.

Paper title