A Benchmark Comparison of Learned Control Policies for Agile Quadrotor Flight

Paper Title

A Benchmark Comparison of Learned Control Policies for Agile Quadrotor Flight

Authors

Elia Kaufmann, Leonard Bauersfeld, Davide Scaramuzza

Abstract

Quadrotors are highly nonlinear dynamical systems that require carefully tuned controllers to be pushed to their physical limits. Recently, learning-based control policies have been proposed for quadrotors, as they could potentially learn direct mappings from high-dimensional raw sensory observations to actions. Due to sample inefficiency, training such learned controllers on the real platform is impractical or even impossible. Training in simulation is attractive but requires transferring policies between domains, which demands that trained policies be robust to the resulting domain gap. In this work, we make two contributions: (i) we perform the first benchmark comparison of existing learned control policies for agile quadrotor flight and show that training a control policy that commands body rates and thrust results in more robust sim-to-real transfer than a policy that directly specifies individual rotor thrusts, and (ii) we demonstrate for the first time that such a control policy, trained via deep reinforcement learning, can control a quadrotor in real-world experiments at speeds over 45 km/h.
