Inverted Landing in a Small Aerial Robot via Deep Reinforcement Learning for Triggering and Control of Rotational Maneuvers

Title

Inverted Landing in a Small Aerial Robot via Deep Reinforcement Learning for Triggering and Control of Rotational Maneuvers

Authors

Bryan Habas, Jack W. Langelaan, Bo Cheng

Abstract

Inverted landing in a rapid and robust manner is a challenging feat for aerial robots, especially when it depends entirely on onboard sensing and computation. Nevertheless, this feat is routinely performed by biological fliers such as bats, flies, and bees. Our previous work identified a direct causal connection between a series of onboard visual cues and kinematic actions that allows for reliable execution of this challenging aerobatic maneuver in small aerial robots. In this work, we first used Deep Reinforcement Learning and a physics-based simulation to obtain a general, optimal control policy for robust inverted landing starting from any arbitrary approach condition. This optimized control policy provides a computationally efficient mapping from the system's observation space to its motor command action space, including both triggering and control of rotational maneuvers. This was done by training the system over a large range of approach flight velocities that varied in magnitude and direction. Next, we performed a sim-to-real transfer and experimental validation of the learned policy via domain randomization, by varying the robot's inertial parameters in the simulation. Through experimental trials, we identified several dominant factors that greatly improved landing robustness, as well as the primary mechanisms that determined inverted landing success. We expect that the learning framework developed in this study can be generalized to solve more challenging tasks, such as utilizing noisy onboard sensory data, landing on surfaces of various orientations, or landing on dynamically moving surfaces.
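
The training setup described in the abstract (approach velocities varied in magnitude and direction, plus domain randomization over inertial parameters) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the names `EpisodeConfig` and `sample_episode`, the nominal mass and inertia, and every numeric range are hypothetical placeholders that do not come from the paper.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class EpisodeConfig:
    vx: float       # horizontal approach velocity component, m/s
    vz: float       # vertical approach velocity component, m/s
    mass: float     # randomized robot mass, kg
    inertia: float  # randomized pitch-axis moment of inertia, kg*m^2


def sample_episode(rng, speed_range=(1.0, 4.0), angle_range_deg=(20.0, 90.0),
                   nominal_mass=0.035, nominal_inertia=3.0e-5, spread=0.1):
    """Draw one randomized training episode (all values are assumed, not from the paper)."""
    # Vary the approach velocity in both magnitude and direction.
    speed = rng.uniform(*speed_range)
    angle = math.radians(rng.uniform(*angle_range_deg))
    # Domain randomization: perturb inertial parameters around nominal values.
    mass = nominal_mass * rng.uniform(1.0 - spread, 1.0 + spread)
    inertia = nominal_inertia * rng.uniform(1.0 - spread, 1.0 + spread)
    return EpisodeConfig(vx=speed * math.cos(angle),
                         vz=speed * math.sin(angle),
                         mass=mass,
                         inertia=inertia)


rng = random.Random(0)  # fixed seed so the sampled episodes are reproducible
episodes = [sample_episode(rng) for _ in range(1000)]
```

In a setup like this, each sampled configuration would be used to reset the physics simulation before a policy rollout, so the learned policy sees a spread of approach conditions and robot dynamics rather than a single nominal case.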
