Paper title
Argumentative Reward Learning: Reasoning About Human Preferences
Paper authors
Paper abstract
We define a novel neuro-symbolic framework, argumentative reward learning, which combines preference-based argumentation with existing approaches to reinforcement learning from human feedback. Our method improves on prior work by generalising human preferences, reducing the burden on the user, and increasing the robustness of the reward model. We demonstrate this with a number of experiments.