论文标题
固定置信度设置中级联匪徒的最佳手臂识别
Best Arm Identification for Cascading Bandits in the Fixed Confidence Setting
论文作者
论文摘要
我们设计和分析Cascadebai是一种在级联匪徒框架内找到最佳$ k $物品(也称为手臂)的算法。 Cascadebai时间复杂性的上限是通过克服关键的分析挑战来得出的,即概率估算每个步骤中可用反馈的量。为此,我们定义了一类新的随机变量(R.V.),我们称其为左侧次高斯r.v.;这些是R.V.的累积生成函数(CGF)只能由CGF的非阳性参数限制。这使得伯恩斯坦型浓度不平等达到足够紧密的应用。我们通过推导下界的时间复杂性来显示Cascadebai的性能在某些实际制度中是最佳的。最后,广泛的数值模拟证实了Cascadebai的功效以及我们上部对其时间复杂性的紧密度。
We design and analyze CascadeBAI, an algorithm for finding the best set of $K$ items, also called an arm, within the framework of cascading bandits. An upper bound on the time complexity of CascadeBAI is derived by overcoming a crucial analytical challenge, namely, that of probabilistically estimating the amount of available feedback at each step. To do so, we define a new class of random variables (r.v.'s) which we term as left-sided sub-Gaussian r.v.'s; these are r.v.'s whose cumulant generating functions (CGFs) can be bounded by a quadratic only for non-positive arguments of the CGFs. This enables the application of a sufficiently tight Bernstein-type concentration inequality. We show, through the derivation of a lower bound on the time complexity, that the performance of CascadeBAI is optimal in some practical regimes. Finally, extensive numerical simulations corroborate the efficacy of CascadeBAI as well as the tightness of our upper bound on its time complexity.