Paper Title
Tractable Reinforcement Learning of Signal Temporal Logic Objectives
Paper Authors
Abstract
Signal temporal logic (STL) is an expressive language for specifying time-bound, real-world robotic tasks and safety specifications. Recently, there has been interest in learning optimal policies that satisfy STL specifications via reinforcement learning (RL). Learning to satisfy an STL specification typically requires a sufficiently long state history to compute the reward and the next action. This need for history causes exponential state-space growth in the learning problem, making it computationally intractable for most real-world applications. In this paper, we propose a compact means of capturing state history in a new augmented state-space representation. We formulate and solve an approximation of the objective (maximizing the probability of satisfaction) in this augmented state space. We derive a performance bound for the approximate solution and compare it against an existing technique in simulation.
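
To make the history-compression idea concrete, here is a minimal Python sketch (an illustration under stated assumptions, not the paper's exact construction). For a simple eventually-style task F_[0,H] ("reach the goal region within H steps"), a single Boolean flag plus the time index can stand in for the full state history in the agent's observation. The Gym-style reset/step interface and the goal_test predicate are assumptions made for this sketch.

class FlagAugmentedEnv:
    """Augments observations with (flag, t) so a memoryless policy can
    act on an eventually-spec F_[0,H] without seeing the full history.

    Illustrative sketch only: for this simple spec, the flag
    "has the goal been reached yet?" together with the time index
    summarizes everything the reward depends on, so the augmented
    state space grows linearly with the horizon instead of
    exponentially with a raw history window.
    """

    def __init__(self, env, goal_test, horizon):
        self.env = env              # assumed Gym-style: reset()/step()
        self.goal_test = goal_test  # assumed predicate: state -> bool
        self.horizon = horizon      # H, the spec's deadline
        self.t = 0
        self.flag = False

    def reset(self):
        state = self.env.reset()
        self.t = 0
        self.flag = self.goal_test(state)
        return (state, self.flag, self.t)

    def step(self, action):
        state, _, done, info = self.env.step(action)
        self.t += 1
        self.flag = self.flag or self.goal_test(state)
        # Sparse terminal reward: 1 iff F_[0,H](goal) held on this episode.
        if self.t >= self.horizon:
            done, reward = True, float(self.flag)
        else:
            reward = 0.0
        return (state, self.flag, self.t), reward, done, info

Any standard RL algorithm run over the augmented tuple (state, flag, t) then maximizes the probability of satisfying the eventuality without keeping the last H raw states. Richer STL formulas require more such summary variables; this is the kind of compact augmentation the abstract refers to.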