论文标题
几乎具有共享表示的线性匪徒的最小算法
Nearly Minimax Algorithms for Linear Bandits with Shared Representation
论文作者
论文摘要
我们为具有共同表示形式的多任务和终身线性土匪提供新颖的算法。具体来说,我们考虑播放带有尺寸$ d $的$ M $线性土匪的设置,每个$ t $ rounds,这些$ M $ bandit任务共享一个常见的$ k(\ ll d)$尺寸线性表示。对于我们同时播放任务的多任务设置,以及我们依次播放任务的终身设置,我们提出了实现$ \ widetilde {o} \ left的新颖算法缩小现有结果的差距[Yang等,2021]。我们的主要技术包括针对低级别线性提取器的更有效的估计器以及该估计器的随附的新颖分析。
We give novel algorithms for multi-task and lifelong linear bandits with shared representation. Specifically, we consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k(\ll d)$ dimensional linear representation. For both the multi-task setting where we play the tasks concurrently, and the lifelong setting where we play tasks sequentially, we come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which matches the known minimax regret lower bound up to logarithmic factors and closes the gap in existing results [Yang et al., 2021]. Our main technique include a more efficient estimator for the low-rank linear feature extractor and an accompanied novel analysis for this estimator.