Paper Title
Online No-regret Model-Based Meta RL for Personalized Navigation
Paper Authors
Paper Abstract
The interaction between a vehicle navigation system and the driver of the vehicle can be formulated as a model-based reinforcement learning problem, where the navigation system (agent) must quickly adapt to the characteristics of the driver (environmental dynamics) to provide the best sequence of turn-by-turn driving instructions. Most modern-day navigation systems (e.g., Google Maps, Waze, Garmin) are not designed to personalize their low-level interactions for individual users across a wide range of driving styles (e.g., vehicle type, reaction time, level of expertise). Towards the development of personalized navigation systems that adapt to a variety of driving styles, we propose an online no-regret model-based RL method that quickly conforms to the dynamics of the current user. As the user interacts with it, the navigation system quickly builds a user-specific model, from which navigation commands are optimized using model predictive control. By personalizing the policy in this way, our method is able to give well-timed driving instructions that match the user's dynamics. Our theoretical analysis shows that our method is a no-regret algorithm, and we provide the convergence rate in the agnostic setting. Our empirical analysis, using 60+ hours of real-world user data from a driving simulator, shows that our method can reduce the number of collisions by more than 60%.
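The loop described in the abstract (fit a user-specific model online, then plan commands with model predictive control) can be illustrated with a toy sketch. This is not the authors' algorithm: it omits the meta-learning component and the no-regret analysis, and everything in it is an assumption made for illustration, including the scalar dynamics `x' = x + theta * u`, the ridge least-squares fit, the grid of candidate commands, and the names `fit_gain`, `mpc_action`, and `run_episode`.

```python
import random


def fit_gain(transitions, prior=1.0, reg=1e-3):
    """Ridge least-squares estimate of theta in the assumed model x' = x + theta * u.

    With no data the estimate equals `prior` (a stand-in for a meta-learned prior).
    """
    num = reg * prior
    den = reg
    for x, u, x_next in transitions:
        num += u * (x_next - x)
        den += u * u
    return num / den


def mpc_action(x, theta_hat, target, candidates, horizon=5):
    """Pick the constant command whose rollout under the fitted model is cheapest."""
    def cost(u):
        total, state = 0.0, x
        for _ in range(horizon):
            state = state + theta_hat * u          # predicted next state
            total += (state - target) ** 2 + 0.01 * u * u
        return total
    return min(candidates, key=cost)


def run_episode(true_theta, steps=30, target=1.0, seed=0):
    """Alternate between refitting the user model and planning with it."""
    rng = random.Random(seed)
    candidates = [i / 10 for i in range(-10, 11)]  # hypothetical command grid
    transitions, x = [], 0.0
    for _ in range(steps):
        theta_hat = fit_gain(transitions)          # user-specific model so far
        u = mpc_action(x, theta_hat, target, candidates)
        # The "driver" responds with an unknown gain plus small noise.
        x_next = x + true_theta * u + rng.gauss(0.0, 0.01)
        transitions.append((x, u, x_next))
        x = x_next
    return fit_gain(transitions), x


if __name__ == "__main__":
    theta_hat, final_x = run_episode(true_theta=0.4)
    print(f"estimated gain: {theta_hat:.3f}, final state: {final_x:.3f}")
```

In this sketch the planner starts from the prior gain, so its first commands are too cautious for a slow-responding user; as transitions accumulate, the fitted gain moves toward the true one and the planned commands are rescaled accordingly.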