Paper Title

Continual Learning of Long Topic Sequences in Neural Information Retrieval

Authors

Thomas Gerald, Laure Soulier

Abstract

In information retrieval (IR) systems, trends and users' interests may change over time, altering either the distribution of requests or the contents to be recommended. Since neural ranking approaches heavily depend on their training data, it is crucial to understand the long-term transfer capacity of recent IR approaches to new domains. In this paper, we first propose a dataset based upon the MSMarco corpus, aiming at modeling a long stream of topics as well as IR property-driven controlled settings. We then analyze in depth the ability of recent neural IR models to continually learn from those streams. Our empirical study highlights the particular cases in which catastrophic forgetting occurs (e.g., level of similarity between tasks, peculiarities of text length, and ways of learning models), providing future directions in terms of model design.
