AliExpress Learning-To-Rank: Maximizing Online Model Performance without Going Online

Paper Title

AliExpress Learning-To-Rank: Maximizing Online Model Performance without Going Online

Authors

Huzhang, Guangda; Pang, Zhen-Jia; Gao, Yongqing; Liu, Yawen; Shen, Weijie; Zhou, Wen-Ji; Da, Qing; Zeng, An-Xiang; Yu, Han; Yu, Yang; Zhou, Zhi-Hua

Abstract

Learning-to-rank (LTR) has become a key technology in e-commerce applications. Most existing LTR approaches follow a supervised learning paradigm, training on offline labeled data collected from the online system. However, it has been observed that LTR models can perform well on offline validation data yet poorly online, and vice versa, which implies a possibly large inconsistency between offline and online evaluation. In this paper we investigate and confirm that such inconsistency exists and can have a significant impact on AliExpress Search. Reasons for the inconsistency include ignoring item context during learning and the insufficiency of the offline dataset for learning that context. Therefore, this paper proposes an evaluator-generator framework for LTR with item context. The framework consists of an evaluator that generalizes to evaluate recommendations involving the context, a generator that maximizes the evaluator score by reinforcement learning, and a discriminator that ensures the generalization of the evaluator. Extensive experiments in simulation environments and the AliExpress Search online system show that, first, the classic data-based metrics on the offline dataset can show significant inconsistency with online performance and can even be misleading. Second, the proposed evaluator score is significantly more consistent with online performance than common ranking metrics. Finally, as a consequence, our method achieves a significant improvement (>2%) in Conversion Rate (CR) over the industrial-level fine-tuned model in online A/B tests.

Paper Title