Paper Title

Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback

Authors

Tianyi Lin, Aldo Pacchiano, Yaodong Yu, Michael I. Jordan

Abstract

Motivated by applications to online learning in sparse estimation and Bayesian optimization, we consider the problem of online unconstrained nonsubmodular minimization with delayed costs in both full information and bandit feedback settings. In contrast to previous works on online unconstrained submodular minimization, we focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms in static and delayed scenarios. We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded. Key to our approach is the notion of $(α, β)$-regret and the extension of the generic convex relaxation model from~\citet{El-2020-Optimal}, the analysis of which is of independent interest. We conduct and showcase several simulation studies to demonstrate the efficacy of our algorithms.
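
As a rough, self-contained illustration of the delayed-feedback setting the abstract describes (a generic toy sketch, not the algorithm or regret analysis from the paper), the Python snippet below runs plain online projected gradient descent in which the feedback for the decision made at round t only becomes available delays[t] rounds later. The function name delayed_online_gradient_descent, the quadratic toy losses, and all parameters are hypothetical and chosen purely for illustration.

```python
import numpy as np

def delayed_online_gradient_descent(losses, grads, delays, dim, eta=0.1, radius=1.0):
    """Toy sketch: online projected gradient descent with delayed feedback.

    losses[t](x) -> cost of decision x at round t (observed only after delays[t] rounds)
    grads[t](x)  -> gradient of losses[t] at x
    delays[t]    -> number of rounds before round t's feedback arrives
    """
    T = len(losses)
    x = np.zeros(dim)          # current decision
    pending = []               # (arrival_round, round_played, decision_played)
    played, costs = [], []

    for t in range(T):
        played.append(x.copy())
        costs.append(losses[t](x))                    # cost incurred, revealed later
        pending.append((t + delays[t], t, x.copy()))

        # apply all gradient feedback that has arrived by round t
        arrived = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, s, x_s in arrived:
            x = x - eta * grads[s](x_s)               # gradient step with stale feedback
            norm = np.linalg.norm(x)
            if norm > radius:                         # project back onto the ball
                x = x * (radius / norm)

    return played, costs

if __name__ == "__main__":
    # quadratic toy losses f_t(x) = ||x - c_t||^2, each with a fixed delay of 2 rounds
    rng = np.random.default_rng(0)
    centers = [rng.normal(size=3) for _ in range(50)]
    losses = [lambda x, c=c: float(np.sum((x - c) ** 2)) for c in centers]
    grads = [lambda x, c=c: 2.0 * (x - c) for c in centers]
    _, costs = delayed_online_gradient_descent(losses, grads, [2] * 50, dim=3)
    print(f"average cost over 50 rounds: {np.mean(costs):.3f}")
```

The only design point this sketch tries to convey is the buffering of feedback: a decision is committed every round, but gradient steps are applied only when the corresponding (possibly stale) feedback finally arrives, which is the basic difficulty that delayed-cost regret bounds must account for.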
