Paper Title
Reinforcement Learning for UAV control with Policy and Reward Shaping
Paper Authors
Paper Abstract
In recent years, technology related to unmanned aerial vehicles (UAVs) has expanded knowledge in the area, bringing to light new problems and challenges that require solutions. Furthermore, because the technology allows processes usually carried out by people to be automated, it is in great demand in industrial sectors. The automation of these vehicles has been addressed in the literature by applying different machine learning strategies. Reinforcement learning (RL) is an automation framework that is frequently used to train autonomous agents. RL is a machine learning paradigm wherein an agent interacts with an environment to solve a given task. However, learning autonomously can be time-consuming and computationally expensive, and may not be practical in highly complex scenarios. Interactive reinforcement learning allows an external trainer to provide advice to an agent while it is learning a task. In this study, we set out to teach an RL agent to control a drone using reward-shaping and policy-shaping techniques simultaneously. Two simulated scenarios were proposed for the training: one without obstacles and one with obstacles. We also studied the influence of each technique. The results show that an agent trained simultaneously with both techniques obtains a lower reward than an agent trained using only a policy-based approach. Nevertheless, the agent achieves lower execution times and less dispersion during training.
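As an illustration of what training with reward shaping and policy shaping simultaneously can look like, the following is a minimal Python sketch under stated assumptions. It is not the authors' implementation: a one-dimensional corridor stands in for the drone simulator, tabular Q-learning stands in for whatever learner the study used, and the names `trainer_action`, `shaped_reward`, `step`, and `train`, as well as the advice probability, bonus size, and all hyperparameters, are hypothetical choices made for this example.

```python
import random

# Toy stand-in for the drone task (assumption): positions 0..5 on a line, goal at 5.
N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = (-1, 1)  # action index 0 moves left, index 1 moves right


def trainer_action(state):
    """Hypothetical external trainer: always advises moving toward the goal."""
    return 1


def shaped_reward(env_reward, action, advised_action, bonus=0.1):
    """Reward shaping: add a bonus if the agent followed the advice, a penalty otherwise."""
    return env_reward + (bonus if action == advised_action else -bonus)


def step(state, action):
    """Environment transition: returns (next_state, reward, done)."""
    nxt = min(max(state + ACTIONS[action], 0), GOAL)
    done = nxt == GOAL
    return nxt, (1.0 if done else -0.01), done


def train(episodes=200, advice_prob=0.3, use_policy_shaping=True,
          use_reward_shaping=True, alpha=0.5, gamma=0.9, epsilon=0.1,
          max_steps=100, seed=0):
    """Tabular Q-learning with optional trainer advice on each step."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    steps_per_episode = []
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < max_steps:
            # The agent's own epsilon-greedy choice.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = max(range(2), key=lambda i: q[state][i])
            advised = rng.random() < advice_prob
            advice = trainer_action(state)
            if advised and use_policy_shaping:
                action = advice  # policy shaping: the trainer overrides the action
            nxt, reward, done = step(state, action)
            if advised and use_reward_shaping:
                reward = shaped_reward(reward, action, advice)
            # Standard Q-learning update on the (possibly shaped) reward.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            steps += 1
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

Switching `use_policy_shaping` or `use_reward_shaping` off gives the single-technique variants, which is one way to compare the influence of each technique on the collected reward and on episode length in this toy setting.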