Paper Title

Sub-sampling for Efficient Non-Parametric Bandit Exploration

Authors

Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard

Abstract

In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions). Unlike Thompson Sampling, which requires specifying a different prior to be optimal in each case, our proposal RB-SDA does not need any distribution-dependent tuning. RB-SDA belongs to the family of Sub-sampling Duelling Algorithms (SDA), which combines the sub-sampling idea first used by the BESA [1] and SSMC [2] algorithms with different sub-sampling schemes. In particular, RB-SDA uses Random Block sampling. We perform an experimental study assessing the flexibility and robustness of this promising novel approach for exploration in bandit models.
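To make the sub-sampling duel concrete, here is a minimal Python sketch of one round with Random Block sampling. The abstract does not spell out the round structure, so the following are assumptions made for illustration: the leader is the arm with the most observations (ties broken by lowest index), each challenger compares its full empirical mean against the leader's mean on an equal-sized block of consecutive observations, and the leader is pulled only when no challenger wins. The names `random_block`, `challenger_wins`, and `sda_round` are hypothetical and do not come from the paper.

```python
import random
from statistics import fmean


def random_block(history, size, rng):
    """Return `size` consecutive observations starting at a uniformly random offset."""
    if not 1 <= size <= len(history):
        raise ValueError("block size must be between 1 and len(history)")
    start = rng.randrange(len(history) - size + 1)
    return history[start:start + size]


def challenger_wins(leader_history, challenger_history, rng):
    """Duel: challenger's full mean against the leader's mean on a random block.

    Assumption: ties go to the challenger.
    """
    block = random_block(leader_history, len(challenger_history), rng)
    return fmean(challenger_history) >= fmean(block)


def sda_round(histories, rng):
    """Return the indices of the arms to pull this round.

    Assumes every arm already has at least one observation.
    """
    arms = range(len(histories))
    # Assumed leader rule: most observations, ties broken by lowest index.
    leader = max(arms, key=lambda k: len(histories[k]))
    winners = [
        k for k in arms
        if k != leader and challenger_wins(histories[leader], histories[k], rng)
    ]
    return winners or [leader]


# Usage: two Bernoulli arms with hypothetical means, one initial pull each.
rng = random.Random(0)
means = [0.3, 0.7]
histories = [[float(rng.random() < m)] for m in means]
for _ in range(500):
    for k in sda_round(histories, rng):
        histories[k].append(float(rng.random() < means[k]))
```

The sketch uses only observed rewards, with no prior or distribution-specific parameter, which is the property the abstract contrasts with Thompson Sampling.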
