Paper Title

The Concept of Criticality in AI Safety

Authors

Yitzhak Spielberg, Amos Azaria

Abstract

When AI agents do not align their actions with human values, they may cause serious harm. One way to solve the value alignment problem is to include a human operator who monitors all of the agent's actions. Although this solution guarantees maximal safety, it is very inefficient, since it requires the human operator to devote all of their attention to the agent. In this paper, we propose a much more efficient solution that allows the operator to engage in other activities without neglecting the monitoring task. In our approach, the AI agent requests the operator's permission only for critical actions, that is, potentially harmful actions. We introduce the concept of critical actions with respect to AI safety and discuss how to build a model that measures action criticality. We also discuss how the operator's feedback could be used to make the agent smarter.
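The control flow the abstract describes, where the agent acts autonomously in routine states but defers to the operator when an action is critical, can be sketched as follows. This is a minimal illustration, not the paper's actual model: the `criticality` heuristic here (the spread between the best and worst action values in a state) and the `threshold` parameter are assumptions introduced for the example.

```python
# Hypothetical sketch of criticality-gated action selection.
# Assumption: criticality is approximated by the spread of action values
# in the current state; the paper discusses learning a dedicated model.

def criticality(q_values):
    """Spread between the best and worst action values in a state.

    A large spread suggests that choosing badly in this state could be
    costly, so the chosen action is treated as critical.
    """
    return max(q_values) - min(q_values)

def act(q_values, actions, threshold, ask_operator):
    """Pick the greedy action; request operator permission if critical.

    ask_operator(action) -> bool models the human's approve/veto response.
    Returns the executed action, or None if the operator vetoed it.
    """
    best = max(range(len(actions)), key=lambda i: q_values[i])
    if criticality(q_values) > threshold:
        if not ask_operator(actions[best]):
            return None  # operator vetoed a critical action
    return actions[best]
```

With a high threshold the agent never interrupts the operator; with a low threshold it asks permission for nearly every action, recovering the fully supervised (maximally safe but inefficient) baseline.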
