Paper Title

Learning swimming via deep reinforcement learning

Paper Authors

Zhang, Jin; Zhou, Lei; Cao, Bochao

Paper Abstract

For decades, people have been seeking fishlike flapping motions that can realize underwater propulsion at low energy cost. The complexity of the nonstationary flow field around the flapping body makes this problem very difficult. In earlier studies, motion patterns were usually prescribed as certain periodic functions, which constrained the subsequent optimization process to a small subdomain of the whole motion space. In this work, to avoid this motion constraint, a variational autoencoder (VAE) is designed to compress various flapping motions into a simple action vector. We then let a flapping airfoil continuously interact with a water tunnel environment and adjust its action accordingly through a reinforcement learning (RL) framework. Through this automatic closed-loop experiment, we obtain several motion patterns that can result in high hydrodynamic efficiency compared to pure harmonic motions with the same thrust level. We also find that, after numerous trials and errors, RL training runs in the current experiment always converge to motion patterns that are close to harmonic motions. In other words, the current work proves that harmonic motion with appropriate amplitude and frequency is always an optimal choice for efficient underwater propulsion. Furthermore, the RL framework proposed here can also be extended to the study of other complex swimming problems, which might pave the way for the creation of a robotic fish that can swim like a real fish.
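The abstract describes a two-stage pipeline: a VAE compresses flapping kinematics into a low-dimensional action vector, and a closed-loop RL procedure then searches that latent space using hydrodynamic efficiency as the reward. The sketch below only illustrates this idea under loose assumptions; the network sizes, the `WaterTunnelEnv` class, its toy reward, and the random-search stand-in for the paper's RL algorithm are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch: VAE over discretized flapping motions + latent-space search
# driven by a surrogate "efficiency" reward. All components are illustrative.
import torch
import torch.nn as nn


class MotionVAE(nn.Module):
    """Compress a flapping-motion waveform into a small latent action vector."""

    def __init__(self, motion_dim=64, latent_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(motion_dim, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, motion_dim)
        )

    def forward(self, motion):
        h = self.encoder(motion)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decoder(z), mu, logvar


def vae_loss(recon, motion, mu, logvar):
    # Reconstruction error plus KL divergence to a standard normal prior.
    recon_err = nn.functional.mse_loss(recon, motion, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_err + kl


class WaterTunnelEnv:
    """Placeholder for the water-tunnel experiment: a toy reward favoring
    smooth, moderate-amplitude motions stands in for measured efficiency."""

    def evaluate(self, motion):
        amplitude_cost = motion.pow(2).mean()
        smoothness_cost = (motion[1:] - motion[:-1]).pow(2).mean()
        return float(-(amplitude_cost + 10.0 * smoothness_cost))


if __name__ == "__main__":
    torch.manual_seed(0)
    vae = MotionVAE()

    # Stage 1 (sketch): fit the VAE on a small library of periodic motions.
    t = torch.linspace(0, 2 * torch.pi, 64)
    motions = torch.stack(
        [torch.sin(t * f) * a for f in (1.0, 2.0, 3.0) for a in (0.3, 0.6, 1.0)]
    )
    opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
    for _ in range(200):
        recon, mu, logvar = vae(motions)
        loss = vae_loss(recon, motions, mu, logvar)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Stage 2 (sketch): closed-loop search over the latent action vector.
    # Simple random search replaces the paper's RL algorithm here.
    env = WaterTunnelEnv()
    best_z, best_r = None, -float("inf")
    for _ in range(500):
        z = torch.randn(1, 4)
        motion = vae.decoder(z).detach().squeeze(0)
        r = env.evaluate(motion)
        if r > best_r:
            best_z, best_r = z, r
    print("best latent action:", best_z, "reward:", best_r)
```

In the actual study, the evaluation step would be the physical water-tunnel measurement of thrust and power for the decoded flapping motion, and an RL agent rather than random search would update the latent action after each trial.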
