Paper Title

Bandits with Partially Observable Confounded Data

Authors

Guy Tennenholtz, Uri Shalit, Shie Mannor, Yonathan Efroni

Abstract

We study linear contextual bandits with access to a large, confounded, offline dataset that was sampled from some fixed policy. We show that this problem is closely related to a variant of the bandit problem with side information. We construct a linear bandit algorithm that takes advantage of the projected information, and prove regret bounds for it. In particular, these bounds improve on current bounds by a factor related to the visible dimensionality of the contexts in the data. Our results indicate that confounded offline data can significantly improve online learning algorithms. Finally, we demonstrate various characteristics of our approach through synthetic simulations.
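The abstract does not spell out what the "projected information" is. As a rough illustration only, and not the authors' construction, the sketch below shows one way partially observed offline data can still pin down a linear projection of the unknown reward parameter. Everything in it is a simplifying assumption: a single arm (so action selection by the logging policy is ignored), Gaussian contexts with a known covariance `cov`, and the hypothetical names `partial_ols_constraint`, `M`, and `b_hat`. Regressing rewards on only the first `L` visible coordinates converges to `M @ w`, where `M = [I, cov_oo^{-1} cov_oh]`, which gives `L` linear equations on the `d`-dimensional parameter `w`.

```python
import numpy as np


def partial_ols_constraint(n=200_000, d=5, L=2, seed=0):
    """Illustrative only: OLS on the visible context coordinates.

    Assumptions (not from the paper): one arm, Gaussian contexts with
    known covariance, and only the first L of d coordinates logged.
    """
    rng = np.random.default_rng(seed)

    # Random positive-definite covariance so visible and hidden
    # coordinates are correlated.
    A = rng.normal(size=(d, d))
    cov = A @ A.T / d + np.eye(d)

    w = rng.normal(size=d)  # unknown reward parameter
    x = rng.multivariate_normal(np.zeros(d), cov, size=n)
    r = x @ w + 0.1 * rng.normal(size=n)  # noisy linear rewards

    # The offline dataset only contains the first L coordinates.
    x_obs = x[:, :L]
    b_hat, *_ = np.linalg.lstsq(x_obs, r, rcond=None)

    # Population limit of the partial regression: b = M w with
    # M = [I_L, cov_oo^{-1} cov_oh].
    M = np.hstack([np.eye(L), np.linalg.solve(cov[:L, :L], cov[:L, L:])])
    return b_hat, M @ w


if __name__ == "__main__":
    b_hat, b_limit = partial_ols_constraint()
    print("partial OLS estimate:", b_hat)
    print("projection M @ w:    ", b_limit)
```

Under these assumptions the offline data fixes `L` linear combinations of `w`, so an online learner would only need to explore the remaining `d - L` directions. That is one plausible reading of why the improvement depends on the visible dimensionality of the contexts.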
