Paper Title
BARS: Towards Open Benchmarking for Recommender Systems
Paper Authors
Paper Abstract
The past two decades have witnessed the rapid development of personalized recommendation techniques. Despite significant progress made in both research and practice of recommender systems, to date, there is a lack of a widely-recognized benchmarking standard in this field. Many existing studies perform model evaluations and comparisons in an ad-hoc manner, for example, by employing their own private data splits or using different experimental settings. Such conventions not only increase the difficulty in reproducing existing studies, but also lead to inconsistent experimental results among them. This largely limits the credibility and practical value of research results in this field. To tackle these issues, we present an initiative project (namely BARS) aiming for open benchmarking for recommender systems. In comparison to some earlier attempts towards this goal, we take a further step by setting up a standardized benchmarking pipeline for reproducible research, which integrates all the details about datasets, source code, hyper-parameter settings, running logs, and evaluation results. The benchmark is designed with comprehensiveness and sustainability in mind. It covers both matching and ranking tasks, and also enables researchers to easily follow and contribute to the research in this field. This project will not only reduce the redundant efforts of researchers to re-implement or re-run existing baselines, but also drive more solid and reproducible research on recommender systems. We would like to call upon everyone to use the BARS benchmark for future evaluation, and contribute to the project through the portal at: https://openbenchmark.github.io/BARS.
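To illustrate the "private data splits" problem the abstract describes, the sketch below shows one common way to make a split reproducible across research groups: assign each interaction to train or test by hashing its identifier with a fixed salt, so every machine derives the identical partition without sharing split files. This is a minimal illustration of the general idea, not the actual BARS pipeline; the function name, salt, and record format are hypothetical.

```python
# Illustrative sketch (not the BARS pipeline): a deterministic, shareable
# train/test split. Hashing each interaction ID with a fixed salt yields
# the same partition on every machine, avoiding private ad-hoc splits.
import hashlib

def split_bucket(record_id: str, salt: str = "bars-demo",
                 test_ratio: float = 0.2) -> str:
    """Assign a record to 'train' or 'test' deterministically."""
    digest = hashlib.sha256(f"{salt}:{record_id}".encode()).hexdigest()
    # Map the first 8 hex digits to a float in [0, 1).
    frac = int(digest[:8], 16) / 0x100000000
    return "test" if frac < test_ratio else "train"

# Hypothetical user-item interactions standing in for a real dataset.
interactions = [f"user{u}-item{i}" for u in range(100) for i in range(5)]
train = [r for r in interactions if split_bucket(r) == "train"]
test = [r for r in interactions if split_bucket(r) == "test"]
```

Because the assignment depends only on the record ID and the published salt, reporting those two values fully specifies the evaluation split, which is the kind of detail a standardized benchmarking pipeline records alongside hyper-parameters and logs.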