Learning on the Job: Long-Term Behavioural Adaptation in Human-Robot Interactions

Paper Title

Learning on the Job: Long-Term Behavioural Adaptation in Human-Robot Interactions

Authors

Francesco Del Duchetto, Marc Hanheide

Abstract

In this work, we propose a framework for allowing autonomous robots deployed for extended periods of time in public spaces to adapt their own behaviour online from user interactions. The robot behaviour planning is embedded in a Reinforcement Learning (RL) framework, where the objective is maximising the level of overall user engagement during the interactions. We use the Upper-Confidence-Bound Value-Iteration (UCBVI) algorithm, which gives a helpful way of managing the exploration-exploitation trade-off for real-time interactions. An engagement model trained end-to-end generates the reward function in real-time during policy execution. We test this approach in a public museum in Lincoln (UK), where the robot is deployed as a tour guide for the visitors. Results show that after a couple of months of exploration, the robot policy learned to maintain the engagement of users for longer, with an increase of 22.8% over the initial static policy in the number of items visited during the tour and a 30% increase in the probability of completing the tour. This work is a promising step toward behavioural adaptation in long-term scenarios for robotics applications in social settings.
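The abstract names the UCBVI algorithm but gives no implementation details. The sketch below is a minimal tabular version, written under loudly stated assumptions: discrete states and actions, rewards in [0, 1], a fixed episode horizon, and a Hoeffding-style bonus proportional to H·sqrt(log(5SAT/δ)/N(s,a)) whose constant is replaced by a tunable `bonus_scale`. It is not the authors' implementation, and it omits refinements from the original UCBVI analysis (such as taking the minimum with the previous episode's Q-values). In the paper the reward comes from the learned engagement model; here a hypothetical toy environment (`run_toy_tour`, with made-up behaviours 0 and 1) stands in for it.

```python
import math

import numpy as np


class UCBVI:
    """Minimal tabular UCBVI sketch with a Hoeffding-style exploration bonus.

    Assumptions (not from the paper): discrete states and actions, rewards in
    [0, 1], a fixed episode horizon, and a tunable `bonus_scale` in place of
    the constants used in the original UCBVI analysis.
    """

    def __init__(self, n_states, n_actions, horizon, delta=0.1, bonus_scale=1.0):
        self.S, self.A, self.H = n_states, n_actions, horizon
        self.delta = delta
        self.bonus_scale = bonus_scale
        self.counts = np.zeros((n_states, n_actions))                  # N(s, a)
        self.trans_counts = np.zeros((n_states, n_actions, n_states))  # N(s, a, s')
        self.reward_sum = np.zeros((n_states, n_actions))              # summed rewards
        self.total_steps = 0
        # Optimistic initial Q-values: every pair looks maximally rewarding.
        self.Q = np.full((horizon, n_states, n_actions), float(horizon))

    def update(self, s, a, r, s_next):
        """Record one observed transition (s, a, r, s')."""
        self.counts[s, a] += 1
        self.trans_counts[s, a, s_next] += 1
        self.reward_sum[s, a] += r
        self.total_steps += 1

    def plan(self):
        """Backward value iteration on the empirical model plus a bonus."""
        H = self.H
        T = max(self.total_steps, 1)
        log_term = math.log(5 * self.S * self.A * T / self.delta)
        visited = self.counts > 0
        n = np.maximum(self.counts, 1)             # avoid division by zero
        p_hat = self.trans_counts / n[:, :, None]  # empirical P(s' | s, a)
        r_hat = self.reward_sum / n                # empirical mean reward
        bonus = self.bonus_scale * H * np.sqrt(log_term / n)
        v_next = np.zeros(self.S)                  # value beyond the horizon is 0
        for h in reversed(range(H)):
            q = r_hat + p_hat @ v_next + bonus
            # With rewards in [0, 1], no value can exceed the steps remaining.
            q = np.minimum(q, H - h)
            # Unvisited pairs keep the maximal optimistic value.
            q = np.where(visited, q, H - h)
            self.Q[h] = q
            v_next = q.max(axis=1)

    def act(self, h, s):
        """Greedy action w.r.t. the optimistic Q-values (ties -> lowest index)."""
        return int(np.argmax(self.Q[h, s]))


def run_toy_tour(n_episodes=200):
    """Hypothetical toy tour: 3 stops visited in order, 2 candidate behaviours.

    Behaviour 1 always earns reward 1.0 (a stand-in for an engaged visitor),
    behaviour 0 always earns 0.0.
    """
    S, A, H = 3, 2, 3
    agent = UCBVI(S, A, H, bonus_scale=0.1)
    for _ in range(n_episodes):
        agent.plan()  # re-plan once per episode (one tour)
        s = 0
        for h in range(H):
            a = agent.act(h, s)
            r = 1.0 if a == 1 else 0.0
            s_next = (s + 1) % S
            agent.update(s, a, r, s_next)
            s = s_next
    agent.plan()
    return agent
```

Calling `run_toy_tour()` and then `[agent.act(h, h) for h in range(3)]` returns `[1, 1, 1]`: the optimistic bonus makes the agent try behaviour 0 at each stop only twice over 200 tours before committing to behaviour 1. The bonus shrinks as a pair's visit count grows, which is how UCBVI trades exploration against exploitation without a separate exploration schedule.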
