Paper Title

Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic Algorithm

Paper Authors

Semih Cayci, Niao He, R. Srikant

Paper Abstract


Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large state spaces. In this paper, we present a finite-time analysis of NAC with neural network approximation, and identify the roles of neural networks, regularization and optimization techniques (e.g., gradient clipping and averaging) to achieve provably good performance in terms of sample complexity, iteration complexity and overparametrization bounds for the actor and the critic. In particular, we prove that (i) entropy regularization and averaging ensure stability by providing sufficient exploration to avoid near-deterministic and strictly suboptimal policies and (ii) regularization leads to sharp sample complexity and network width bounds in the regularized MDPs, yielding a favorable bias-variance tradeoff in policy optimization. In the process, we identify the importance of uniform approximation power of the actor neural network to achieve global optimality in policy optimization due to distributional shift.
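For intuition on the entropy regularization the abstract refers to, the sketch below runs entropy-regularized natural policy gradient on a tiny tabular MDP with softmax policies, using the standard closed-form update pi_{t+1}(a|s) ∝ pi_t(a|s)^{1-eta*lam} exp(eta*Q_lam(s,a)). This is a minimal illustration under assumed notation, not the paper's neural-network algorithm; the MDP, step size, and regularization weight are made up for the example. Note how the entropy term keeps the converged policy strictly stochastic (interior to the simplex), which is the stability/exploration effect claimed in result (i).

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions (not from the paper).
gamma, lam, eta = 0.9, 0.1, 0.5        # discount, entropy weight, step size
r = np.array([[1.0, 0.0],              # r[s, a]
              [0.0, 2.0]])
P = np.full((2, 2, 2), 0.5)            # P[s, a, s']: uniform transitions

def soft_eval(pi, iters=500):
    """Fixed-point iteration for the entropy-regularized value of policy pi."""
    V = np.zeros(2)
    for _ in range(iters):
        Q = r + gamma * (P @ V)                             # Q[s, a]
        V = (pi * (Q - lam * np.log(pi))).sum(axis=1)       # soft Bellman backup
    return Q

pi = np.full((2, 2), 0.5)              # uniform initial policy
for _ in range(100):
    Q = soft_eval(pi)
    # Closed-form entropy-regularized NPG step for tabular softmax policies:
    #   pi_{t+1}(a|s) ∝ pi_t(a|s)^(1 - eta*lam) * exp(eta * Q(s,a))
    logits = (1 - eta * lam) * np.log(pi) + eta * Q
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)

print(pi)  # near-greedy in each state, but every probability stays strictly positive
```

The `(1 - eta*lam)` shrinkage of the old logits is exactly where the regularization enters the update: it pulls the policy away from the boundary of the simplex, preventing the near-deterministic, strictly suboptimal policies the abstract warns about.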
