Paper Title

Asymptotically Optimal Information-Directed Sampling

Authors

Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

Abstract

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Our analysis sheds light on how IDS balances the trade-off between regret and information, and uncovers a surprising connection between the recently proposed primal-dual methods and the IDS algorithm. We demonstrate empirically that IDS is competitive with UCB in finite time and can be significantly better in the asymptotic regime.
