Generalization Properties of Retrieval-based Models

Paper Title

Generalization Properties of Retrieval-based Models

Authors

Basu, Soumya; Rawat, Ankit Singh; Zaheer, Manzil

Abstract

Many modern high-performing machine learning models such as GPT-3 primarily rely on scaling up models, e.g., transformer networks. Simultaneously, a parallel line of work aims to improve the model performance by augmenting an input instance with other (labeled) instances during inference. Examples of such augmentations include task-specific prompts and similar examples retrieved from the training data by a nonparametric component. Remarkably, retrieval-based methods have enjoyed success on a wide range of problems, ranging from standard natural language processing and vision tasks to protein folding, as demonstrated by many recent efforts, including WebGPT and AlphaFold. Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored. In this paper, we present a formal treatment of retrieval-based models to characterize their generalization ability. In particular, we focus on two classes of retrieval-based classification approaches: First, we analyze a local learning framework that employs an explicit local empirical risk minimization based on retrieved examples for each input instance. Interestingly, we show that breaking down the underlying learning task into local sub-tasks enables the model to employ a low complexity parametric component to ensure good overall accuracy. The second class of retrieval-based approaches we explore learns a global model using kernel methods to directly map an input instance and retrieved examples to a prediction, without explicitly solving a local learning task.
