Online Semi-Supervised Learning with Bandit Feedback

Paper Title

Online Semi-Supervised Learning with Bandit Feedback

Authors

Sohini Upadhyay, Mikhail Yurochkin, Mayank Agarwal, Yasaman Khazaeni, Djallel Bouneffouf

Abstract

We formulate a new problem at the intersection of semi-supervised learning and contextual bandits, motivated by several applications including clinical trials and ad recommendations. We demonstrate how Graph Convolutional Network (GCN), a semi-supervised learning approach, can be adjusted to the new problem formulation. We also propose a variant of the linear contextual bandit with semi-supervised missing rewards imputation. We then take the best of both approaches to develop a multi-GCN embedded contextual bandit. Our algorithms are verified on several real-world datasets.
