Paper Title
PFPN: Continuous Control of Physically Simulated Characters using Particle Filtering Policy Network
Paper Authors
Paper Abstract
Data-driven methods for physics-based character control using reinforcement learning have been successfully applied to generate high-quality motions. However, existing approaches typically rely on Gaussian distributions to represent the action policy, which can prematurely commit to suboptimal actions when solving high-dimensional continuous control problems for highly articulated characters. In this paper, to improve the learning performance of physics-based character controllers, we propose a framework that uses a particle-based action policy as a substitute for Gaussian policies. We exploit particle filtering to dynamically explore and discretize the action space, and to track the posterior policy represented as a mixture distribution. The resulting policy can replace the unimodal Gaussian policy, which has been the staple for character control problems, without changing the underlying model architecture of the reinforcement learning algorithm used for policy optimization. We demonstrate the applicability of our approach on various motion capture imitation tasks. Baselines using our particle-based policies achieve better imitation performance and faster convergence than corresponding implementations using Gaussian policies, and are more robust to external perturbations during character control. Related code is available at: https://motion-lab.github.io/PFPN.
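To make the core idea concrete, the following is a minimal, hypothetical PyTorch sketch of a per-dimension particle-based mixture policy head, not the authors' released implementation (see the repository above for that). The network sizes, particle count, observation/action dimensions, and the uniform particle initialization are all illustrative assumptions; only the mixture-distribution head itself reflects what the abstract describes.

```python
import torch
import torch.nn as nn


class ParticlePolicy(nn.Module):
    """Per-dimension particle-based mixture policy (illustrative sketch)."""

    def __init__(self, obs_dim: int, act_dim: int, n_particles: int = 16):
        super().__init__()
        self.act_dim, self.n_particles = act_dim, n_particles
        # Learnable particle locations and (log) widths for every action
        # dimension; the uniform init over [-1, 1] is an assumption.
        self.loc = nn.Parameter(
            torch.linspace(-1.0, 1.0, n_particles).repeat(act_dim, 1))
        self.log_std = nn.Parameter(torch.full((act_dim, n_particles), -1.0))
        # Ordinary MLP trunk: only the output head differs from a Gaussian
        # policy, matching the abstract's claim that the underlying model
        # architecture is unchanged.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.Tanh(),
            nn.Linear(256, act_dim * n_particles))

    def forward(self, obs: torch.Tensor) -> torch.distributions.Distribution:
        # Mixture weights over the particles of each action dimension.
        logits = self.trunk(obs).view(-1, self.act_dim, self.n_particles)
        mix = torch.distributions.Categorical(logits=logits)
        comp = torch.distributions.Normal(
            self.loc.expand_as(logits), self.log_std.exp().expand_as(logits))
        # Each action dimension becomes a mixture of Gaussians centered at
        # its particles: a multimodal replacement for a single Gaussian.
        return torch.distributions.MixtureSameFamily(mix, comp)


# Usage: the head is sampled exactly like a Gaussian policy head.
# The obs_dim/act_dim values here are arbitrary placeholders.
policy = ParticlePolicy(obs_dim=197, act_dim=36)
dist = policy(torch.randn(1, 197))
action = dist.sample()                     # shape: (1, 36)
log_prob = dist.log_prob(action).sum(-1)   # summed over action dimensions
```

In PFPN itself, the particles are additionally tracked and resampled in a particle-filtering fashion during training to dynamically explore the action space; the sketch above omits that adaptation step and shows only how a mixture head can stand in for a unimodal Gaussian without changing the rest of the policy network or the policy-gradient update.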