Paper Title

A Novel Entropy-Maximizing TD3-based Reinforcement Learning for Automatic PID Tuning

Authors

Chowdhury, Myisha A., Lu, Qiugang

Abstract

Proportional-integral-derivative (PID) controllers have been widely used in the process industry. However, the satisfactory control performance of a PID controller depends strongly on the tuning parameters. Conventional PID tuning methods require extensive knowledge of the system model, which is not always known especially in the case of complex dynamical systems. In contrast, reinforcement learning-based PID tuning has gained popularity since it can treat PID tuning as a black-box problem and deliver the optimal PID parameters without requiring explicit process models. In this paper, we present a novel entropy-maximizing twin-delayed deep deterministic policy gradient (EMTD3) method for automating the PID tuning. In the proposed method, an entropy-maximizing stochastic actor is employed at the beginning to encourage the exploration of the action space. Then a deterministic actor is deployed to focus on local exploitation and discover the optimal solution. The incorporation of the entropy-maximizing term can significantly improve the sample efficiency and assist in fast convergence to the global solution. Our proposed method is applied to the PID tuning of a second-order system to verify its effectiveness in improving the sample efficiency and discovering the optimal PID parameters compared to traditional TD3.
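To make the black-box formulation concrete, the sketch below simulates the closed-loop step response of a generic second-order plant under PID control and turns tracking quality into a scalar reward (negative integral squared error), which is the kind of signal an RL agent such as EMTD3 would maximize. This is a minimal illustration, not the paper's implementation: the plant parameters (`wn`, `zeta`), the forward-Euler discretization, and the ISE-based reward are all assumptions for demonstration.

```python
import numpy as np

def pid_step_response(kp, ki, kd, wn=1.0, zeta=0.5, dt=0.01, t_end=10.0):
    """Closed-loop unit-step response of an assumed second-order plant
    G(s) = wn^2 / (s^2 + 2*zeta*wn*s + wn^2) under a PID controller,
    simulated with forward-Euler integration (illustrative only)."""
    n = int(t_end / dt)
    y, ydot = 0.0, 0.0      # plant output and its derivative
    integ = 0.0             # running integral of the tracking error
    prev_e = 1.0            # initial error: setpoint 1, output 0
    ys = np.empty(n)
    for k in range(n):
        e = 1.0 - y                      # tracking error w.r.t. unit setpoint
        integ += e * dt
        deriv = (e - prev_e) / dt        # finite-difference derivative term
        u = kp * e + ki * integ + kd * deriv
        # second-order plant dynamics: y'' = wn^2*(u - y) - 2*zeta*wn*y'
        yddot = wn**2 * (u - y) - 2.0 * zeta * wn * ydot
        ydot += yddot * dt
        y += ydot * dt
        prev_e = e
        ys[k] = y
    return ys

def reward(kp, ki, kd):
    """Negative integral squared error of the step response: the scalar
    an RL agent would maximize when treating PID tuning as a black box."""
    ys = pid_step_response(kp, ki, kd)
    return -float(np.sum((1.0 - ys) ** 2) * 0.01)
```

In this view the RL agent never sees the plant equations: each "action" is a candidate gain triple `(kp, ki, kd)`, and the environment returns only the resulting reward, which is exactly why model-free methods like TD3 and the proposed EMTD3 apply.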
