Paper Title

Matryoshka Representation Learning

Authors

Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, Ali Farhadi

Abstract

Learned representations are a central component in modern ML systems, serving a multitude of downstream tasks. When training such representations, it is often the case that computational and statistical constraints for each downstream task are unknown. In this context, rigid fixed-capacity representations can be either over- or under-accommodating to the task at hand. This leads us to ask: can we design a flexible representation that can adapt to multiple downstream tasks with varying computational resources? Our main contribution is Matryoshka Representation Learning (MRL), which encodes information at different granularities and allows a single embedding to adapt to the computational constraints of downstream tasks. MRL minimally modifies existing representation learning pipelines and imposes no additional cost during inference and deployment. MRL learns coarse-to-fine representations that are at least as accurate and rich as independently trained low-dimensional representations. The flexibility within the learned Matryoshka Representations offers: (a) up to 14x smaller embedding size for ImageNet-1K classification at the same level of accuracy; (b) up to 14x real-world speed-ups for large-scale retrieval on ImageNet-1K and 4K; and (c) up to 2% accuracy improvements for long-tail few-shot classification, all while being as robust as the original representations. Finally, we show that MRL extends seamlessly to web-scale datasets (ImageNet, JFT) across various modalities -- vision (ViT, ResNet), vision + language (ALIGN) and language (BERT). MRL code and pretrained models are open-sourced at https://github.com/RAIVNLab/MRL.
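The abstract describes a single embedding whose nested prefixes each act as a usable coarse-to-fine representation. Below is a minimal PyTorch sketch of that idea, not the authors' released code (see the repository above); the granularities in NESTED_DIMS, the MatryoshkaHead module, and the uniform summing of per-granularity losses are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative nesting granularities; the abstract only says MRL encodes
# information "at different granularities", so these dims are assumptions.
NESTED_DIMS = [8, 16, 32, 64, 128, 256, 512, 1024, 2048]

class MatryoshkaHead(nn.Module):
    """Hypothetical head: one linear classifier per embedding prefix,
    so the first m dimensions alone form a usable coarse representation."""

    def __init__(self, embed_dim=2048, num_classes=1000, nested_dims=None):
        super().__init__()
        self.nested_dims = nested_dims or NESTED_DIMS
        assert max(self.nested_dims) <= embed_dim
        self.heads = nn.ModuleList(nn.Linear(m, num_classes) for m in self.nested_dims)

    def forward(self, z):
        # z: (batch, embed_dim) embedding from any backbone (ResNet, ViT, ...).
        return [head(z[:, :m]) for m, head in zip(self.nested_dims, self.heads)]

def matryoshka_loss(logits_per_granularity, labels):
    # Train every prefix jointly by summing the task loss over all
    # granularities (uniform weights assumed here).
    return sum(F.cross_entropy(logits, labels) for logits in logits_per_granularity)

if __name__ == "__main__":
    head = MatryoshkaHead()
    z = torch.randn(4, 2048)  # stand-in for backbone features
    loss = matryoshka_loss(head(z), torch.randint(0, 1000, (4,)))
    loss.backward()
```

This structure is consistent with the abstract's claim of zero extra inference cost: a downstream task simply truncates the trained embedding to its budget (e.g. z[:, :64]) without re-running the backbone.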
