Paper Title
Diversity-Preserving K-Armed Bandits, Revisited
Paper Authors
Paper Abstract
We consider the bandit-based framework for diversity-preserving recommendations introduced by Celis et al. (2019), who approached it in the case of a polytope mainly by a reduction to the setting of linear bandits. We design a UCB algorithm using the specific structure of the setting and show that it enjoys a bounded distribution-dependent regret in the natural cases when the optimal mixed actions put some probability mass on all actions (i.e., when diversity is desirable). The regret lower bounds provided show that otherwise, at least when the model is mean-unbounded, a $\ln T$ regret is suffered. We also discuss an example beyond the special case of polytopes.
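To make the setting concrete, below is a minimal, illustrative Python sketch of a UCB-style strategy over mixed actions, where the mixed action is constrained to a lower-bounded simplex (each arm must receive probability mass at least `ell`), one simple instance of the polytope constraints mentioned above. This is not the authors' algorithm: the function `diversity_preserving_ucb`, the parameter `ell`, and the reward interface `rewards_fn` are all hypothetical names introduced here for illustration.

```python
import numpy as np

def diversity_preserving_ucb(rewards_fn, K, T, ell=0.05, seed=0):
    """Illustrative sketch: at each round, play a mixed action (a distribution
    over the K arms) that puts at least `ell` mass on every arm, chosen to
    maximize the upper-confidence estimate of its expected reward."""
    assert K * ell <= 1.0, "the lower-bounded simplex must be nonempty"
    rng = np.random.default_rng(seed)
    counts = np.zeros(K)  # number of pulls per arm
    sums = np.zeros(K)    # cumulative observed reward per arm
    for t in range(1, T + 1):
        if np.any(counts == 0):
            # Force exploration of any arm never pulled so far.
            ucb = np.where(counts == 0, np.inf, 0.0)
        else:
            means = sums / counts
            ucb = means + np.sqrt(2.0 * np.log(t) / counts)
        # Maximizing a linear function of the mixed action over the
        # lower-bounded simplex has a closed form: give every arm its
        # floor `ell`, then put the remaining mass on the highest index.
        p = np.full(K, ell)
        p[int(np.argmax(ucb))] += 1.0 - K * ell
        arm = rng.choice(K, p=p)      # sample an arm from the mixed action
        r = rewards_fn(arm, rng)      # observe the reward of the sampled arm
        counts[arm] += 1
        sums[arm] += r
    return sums.sum()

# Example usage with Bernoulli arms of hypothetical means:
# means = [0.2, 0.5, 0.8]
# total = diversity_preserving_ucb(
#     lambda a, rng: rng.binomial(1, means[a]), K=3, T=10_000)
```

Note that with `ell > 0`, the optimal mixed action already places mass on every arm, which is the regime in which the abstract states that a bounded distribution-dependent regret is achievable.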