Asymptotically Optimal Information-Directed Sampling

Paper title

Asymptotically Optimal Information-Directed Sampling

Authors

Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

Abstract

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Our analysis sheds light on how IDS balances the trade-off between regret and information and uncovers a surprising connection between the recently proposed primal-dual methods and the IDS algorithm. We demonstrate empirically that IDS is competitive with UCB in finite time and can be significantly better in the asymptotic regime.
