Sub-sampling for Efficient Non-Parametric Bandit Exploration

Paper Title

Sub-sampling for Efficient Non-Parametric Bandit Exploration

Authors

Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard

Abstract

In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions). Unlike Thompson Sampling, which requires specifying a different prior to be optimal in each case, our proposal RB-SDA does not need any distribution-dependent tuning. RB-SDA belongs to the family of Sub-sampling Duelling Algorithms (SDA), which combine the sub-sampling idea first used by the BESA [1] and SSMC [2] algorithms with different sub-sampling schemes. In particular, RB-SDA uses Random Block sampling. We perform an experimental study assessing the flexibility and robustness of this promising novel approach for exploration in bandit models.
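
The abstract names the mechanism (duels decided on a random block of the leader's history) without spelling it out. The following is a simplified sketch of how such a round might look, not the authors' implementation. It assumes that the leader is the arm with the most observations, that ties are broken by empirical mean, and that a challenger wins when its mean is at least the mean of an equally sized contiguous block drawn from the leader's history. The function names (`random_block`, `sda_round`, `run`) are hypothetical, the simulated arms are Bernoulli, and any refinements the paper may add (such as forced exploration) are omitted.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def random_block(history, size, rng):
    """Return `size` consecutive observations starting at a uniformly random offset."""
    start = rng.randrange(len(history) - size + 1)
    return history[start:start + size]


def sda_round(histories, rng):
    """Run one simplified round of duels and return the arms to pull next."""
    # Assumption: the leader is the most-sampled arm, ties broken by higher mean.
    leader = max(
        range(len(histories)),
        key=lambda k: (len(histories[k]), mean(histories[k])),
    )
    winners = []
    for k, hist in enumerate(histories):
        if k == leader:
            continue
        # Compare the challenger's full history with an equally sized
        # random block of the leader's history.
        block = random_block(histories[leader], len(hist), rng)
        if mean(hist) >= mean(block):
            winners.append(k)
    # If no challenger wins its duel, the leader is pulled.
    return winners or [leader]


def run(probs, rounds, seed=0):
    """Simulate Bernoulli arms with success probabilities `probs`; return pull counts."""
    rng = random.Random(seed)
    # One initial pull per arm so every history is non-empty.
    histories = [[float(rng.random() < p)] for p in probs]
    for _ in range(rounds):
        for k in sda_round(histories, rng):
            histories[k].append(float(rng.random() < probs[k]))
    return [len(h) for h in histories]
```

Calling `run([0.2, 0.8], rounds=500)` returns the number of pulls per arm. Note that the duel uses only empirical means and the raw history, so nothing in the sketch depends on the reward distribution, which is the property the abstract emphasises.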
