Paper Title

Automatic Curriculum Learning With Over-repetition Penalty for Dialogue Policy Learning

Authors

Yangyang Zhao, Zhenyu Wang, Zhenhua Huang

Abstract

Dialogue policy learning based on reinforcement learning is difficult to apply to real users when training dialogue agents from scratch because of the high cost. User simulators, which choose random user goals for the dialogue agent to train on, have been considered an affordable substitute for real users. However, this random sampling method ignores the law of human learning, making the learned dialogue policy inefficient and unstable. We propose a novel framework, the Automatic Curriculum Learning-based Deep Q-Network (ACL-DQN), which replaces the traditional random sampling method with a teacher policy model to realize automatic curriculum learning for dialogue policy. The teacher model arranges a meaningfully ordered curriculum and automatically adjusts it by monitoring the dialogue agent's learning progress and the over-repetition penalty, without requiring any prior knowledge. The learning progress of the dialogue agent reflects the relationship between the agent's ability and the difficulty of the sampled goals, which improves sample efficiency. The over-repetition penalty guarantees sampling diversity. Experiments show that ACL-DQN significantly improves the effectiveness and stability of dialogue tasks by a statistically significant margin. Furthermore, the framework can be further improved by equipping it with different curriculum schedules, which demonstrates that the framework has strong generalizability.
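To make the abstract's idea concrete, here is a minimal, hypothetical sketch of a teacher policy that scores candidate user goals by the agent's learning progress minus an over-repetition penalty. The class name, scoring formula, and penalty weight are illustrative assumptions, not the paper's actual algorithm:

```python
from collections import defaultdict

class TeacherPolicy:
    """Illustrative teacher model: prefer goals where the agent's
    performance is changing fastest (learning progress), but penalize
    goals that have been sampled too often (over-repetition penalty)."""

    def __init__(self, goals, penalty_weight=2.0):
        self.goals = list(goals)
        self.progress = defaultdict(float)  # recent reward change per goal
        self.counts = defaultdict(int)      # how often each goal was sampled
        self.penalty_weight = penalty_weight

    def update(self, goal, old_reward, new_reward):
        # Learning progress: magnitude of the agent's reward change on this goal.
        self.progress[goal] = abs(new_reward - old_reward)

    def sample(self):
        total = sum(self.counts.values()) + 1

        def score(goal):
            # Penalty grows with the goal's share of all samples so far,
            # pushing the teacher toward under-sampled goals.
            penalty = self.penalty_weight * self.counts[goal] / total
            return self.progress[goal] - penalty

        chosen = max(self.goals, key=score)
        self.counts[chosen] += 1
        return chosen
```

For example, a goal with high learning progress is picked first, but once it dominates the sample history the penalty lets other goals through, which is the diversity guarantee the abstract describes:

```python
teacher = TeacherPolicy(["easy_goal", "hard_goal"])
teacher.update("easy_goal", 0.0, 0.8)   # agent is improving fast here
print(teacher.sample())  # easy_goal (highest progress, no penalty yet)
print(teacher.sample())  # hard_goal (easy_goal now carries a repetition penalty)
```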
