Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval

Paper Title

Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval

Authors

Zhong, Wei; Yang, Jheng-Hong; Xie, Yuqing; Lin, Jimmy

Abstract

With the recent success of dense retrieval methods based on bi-encoders, studies have applied this approach to various downstream retrieval tasks with good efficiency and in-domain effectiveness. Recently, dense retrieval models have also appeared in Math Information Retrieval (MIR) tasks, but the most effective systems remain classic retrieval methods that rely on hand-crafted structure features. In this work, we try to combine the best of both worlds: a well-defined structure search method for effective formula search, and efficient bi-encoder dense retrieval models to capture contextual similarities. Specifically, we evaluate two representative bi-encoder models for token-level and passage-level dense retrieval on recent MIR tasks. Our results show that bi-encoder models are highly complementary to existing structure search methods, and by combining them we are able to advance the state of the art on MIR datasets.
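
The abstract does not spell out the scoring functions or how the two kinds of systems are combined, so the following is only a minimal sketch under stated assumptions. It assumes that "token-level" dense retrieval means late interaction in the style of ColBERT (each query token vector is matched to its most similar passage token vector, and those maxima are summed), that "passage-level" dense retrieval means a single vector per query and per passage scored by inner product, and that structure search scores and dense scores are combined by min-max normalization followed by linear interpolation with a hypothetical weight `alpha`. All function names and the toy vectors and scores are illustrative, not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))


def passage_level_score(q_vec: Vector, p_vec: Vector) -> float:
    # Assumed passage-level bi-encoder: one embedding each for query and passage.
    return dot(q_vec, p_vec)


def token_level_score(q_toks: List[Vector], p_toks: List[Vector]) -> float:
    # Assumed token-level late interaction (MaxSim): every query token keeps
    # its best-matching passage token score, and these maxima are summed.
    return sum(max(dot(q, p) for p in p_toks) for q in q_toks)


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    # Rescale one system's scores to [0, 1] so different systems are comparable.
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(structure: Dict[str, float], dense: Dict[str, float],
         alpha: float = 0.5) -> List[Tuple[str, float]]:
    # Hypothetical hybrid: linear interpolation of normalized scores; a document
    # missing from one result list contributes 0 for that system.
    s, d = min_max(structure), min_max(dense)
    fused = {doc: alpha * s.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0)
             for doc in set(s) | set(d)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    print(token_level_score([[1, 0], [0, 1]], [[1, 0], [0.5, 0.5]]))  # 1.5
    print(passage_level_score([1, 1], [0.5, 0.5]))                    # 1.0
    ranking = fuse({"d1": 10.0, "d2": 6.0, "d3": 2.0},
                   {"d2": 9.0, "d3": 5.0, "d4": 1.0})
    print([doc for doc, _ in ranking])  # ['d2', 'd1', 'd3', 'd4']
```

In the toy fusion run, `d2` is ranked first because both the structure-search list and the dense list score it well, even though structure search alone prefers `d1`; this is one simple way the complementarity described above could show up in a merged ranking. Min-max interpolation is chosen here only for brevity; other fusion schemes (for example reciprocal rank fusion) would fit the same interface.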