Paper Title

A Systematic Review and Replicability Study of BERT4Rec for Sequential Recommendation

Paper Authors

Aleksandr Petrov and Craig Macdonald

Paper Abstract

BERT4Rec is an effective model for sequential recommendation based on the Transformer architecture. In the original publication, BERT4Rec claimed superiority over other available sequential recommendation approaches (e.g. SASRec), and it is now frequently used as a state-of-the-art baseline for sequential recommendation. However, not all subsequent publications confirmed this result, and some proposed other models that were shown to outperform BERT4Rec in effectiveness. In this paper, we systematically review all publications that compare BERT4Rec with another popular Transformer-based model, namely SASRec, and show that BERT4Rec results are not consistent across these publications. To understand the reasons behind this inconsistency, we analyse the available implementations of BERT4Rec and show that we fail to reproduce the results of the original BERT4Rec publication when using their default configuration parameters. However, we are able to replicate the reported results with the original code when training for a much longer amount of time (up to 30x) than the default configuration. We also propose our own implementation of BERT4Rec based on the Hugging Face Transformers library, which we demonstrate replicates the originally reported results on 3 out of 4 datasets, while requiring up to 95% less training time to converge. Overall, from our systematic review and detailed experiments, we conclude that BERT4Rec does indeed exhibit state-of-the-art effectiveness for sequential recommendation, but only when trained for a sufficient amount of time. Additionally, we show that our implementation can further benefit from adapting other Transformer architectures that are available in the Hugging Face Transformers library (e.g. using disentangled attention, as provided by DeBERTa, or a larger hidden layer size, cf. ALBERT).
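
The abstract refers to an implementation of BERT4Rec built on the Hugging Face Transformers library. The sketch below is a minimal illustration (not the authors' actual code) of how BERT4Rec-style cloze training over item sequences can be assembled with that library; the catalogue size, model dimensions, and item IDs are placeholder assumptions.

```python
# Minimal sketch of BERT4Rec-style masked-item prediction on top of
# Hugging Face Transformers. Item IDs play the role of the vocabulary and a
# user's interaction history plays the role of a sentence. All sizes are
# illustrative placeholders, not values from the paper.
import torch
from transformers import BertConfig, BertForMaskedLM

NUM_ITEMS = 40_000           # placeholder catalogue size
MASK_ID = NUM_ITEMS + 1      # reserve one ID for the [MASK] token
MAX_SEQ_LEN = 200            # maximum interaction-sequence length

config = BertConfig(
    vocab_size=NUM_ITEMS + 2,          # items + padding (0) + mask token
    hidden_size=256,                    # hidden layer size (cf. ALBERT for larger sizes)
    num_hidden_layers=2,
    num_attention_heads=4,
    max_position_embeddings=MAX_SEQ_LEN,
)
model = BertForMaskedLM(config)

# Toy batch: one interaction sequence with the last position masked,
# mirroring the cloze (masked language modelling) objective BERT4Rec uses.
input_ids = torch.tensor([[12, 7, 345, 9, MASK_ID]])
labels = torch.full_like(input_ids, -100)   # -100 positions are ignored by the loss
labels[0, -1] = 101                         # ground-truth item for the masked slot
outputs = model(input_ids=input_ids, labels=labels)
print(outputs.loss)
```

Because the model is instantiated from a standard Transformers configuration, trying another architecture mentioned in the abstract, such as DeBERTa with its disentangled attention, would mainly amount to swapping the config and model classes (e.g. DebertaConfig / DebertaForMaskedLM); this is an assumption about how such a swap could look, not a description of the authors' code.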
