Paper Title

General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference

Paper Authors

Jingfei Du, Myle Ott, Haoran Li, Xing Zhou, Veselin Stoyanov

Paper Abstract

The state of the art on many NLP tasks is currently achieved by large pre-trained language models, which require a considerable amount of computation. We explore a setting where many different predictions are made on a single piece of text. In that case, some of the computational cost during inference can be amortized over the different tasks using a shared text encoder. We compare approaches for training such an encoder and show that encoders pre-trained over multiple tasks generalize well to unseen tasks. We also compare ways of extracting fixed- and limited-size representations from this encoder, including different ways of pooling features extracted from multiple layers or positions. Our best approach compares favorably to knowledge distillation, achieving higher accuracy and lower computational cost once the system is handling around 7 tasks. Further, we show that through binary quantization, we can reduce the size of the extracted representations by a factor of 16, making it feasible to store them for later use. The resulting method offers a compelling solution for using large-scale pre-trained models at a fraction of the computational cost when multiple tasks are performed on the same text.
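
The abstract describes a concrete recipe: run a shared pre-trained encoder once per text, pool a fixed-size representation, binarize it for compact storage, and let each task reuse the stored vector through a small head. The sketch below illustrates that pipeline under stated assumptions; it is not the paper's released code. The roberta-base backbone, the last-four-layer mean pooling, and the two linear task heads are illustrative choices, and the roughly 16x storage saving corresponds to packing a float16 vector down to one bit per dimension.

```python
# A minimal sketch of the amortized-inference setup described in the abstract:
# encode a text once with a shared pre-trained encoder, pool a fixed-size
# representation from several layers, binarize it for cheap storage, and reuse
# the stored vector across several lightweight task heads. The backbone name,
# pooling scheme, and task heads below are illustrative assumptions, not the
# paper's exact configuration.

import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "roberta-base"  # assumed backbone

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
encoder.eval()


@torch.no_grad()
def encode(text: str) -> np.ndarray:
    """Run the shared encoder once and pool a fixed-size embedding."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    outputs = encoder(**inputs)
    # One possible pooling: mean over tokens in each of the last 4 layers,
    # then average across those layers (the paper compares several schemes).
    layers = outputs.hidden_states[-4:]
    pooled = torch.stack([h.mean(dim=1) for h in layers]).mean(dim=0)
    return pooled.squeeze(0).numpy()  # shape: (hidden_size,)


def binarize(embedding: np.ndarray) -> np.ndarray:
    """Binary quantization: keep only the sign of each dimension.
    Packed at 1 bit per dimension, a float16 vector shrinks by roughly 16x."""
    return np.packbits((embedding > 0).astype(np.uint8))


def unbinarize(packed: np.ndarray, dim: int) -> np.ndarray:
    """Expand the stored bits back to a {-1, +1} feature vector."""
    bits = np.unpackbits(packed)[:dim].astype(np.float32)
    return bits * 2.0 - 1.0


# The encoder runs once per text; every additional task only costs a small
# head applied to the stored representation.
text = "Pre-trained language models are expensive to run at inference time."
embedding = encode(text)
stored = binarize(embedding)                       # what would be persisted
features = unbinarize(stored, embedding.shape[0])  # reused by every task head

hidden_size = encoder.config.hidden_size
task_heads = {  # hypothetical per-task linear classifiers
    "sentiment": torch.nn.Linear(hidden_size, 2),
    "topic": torch.nn.Linear(hidden_size, 20),
}
x = torch.from_numpy(features)
for task, head in task_heads.items():
    print(task, head(x).argmax().item())
```

The amortization argument is visible in the final loop: the expensive encoder call happens once, while each additional task adds only a cheap head over the stored bits, which is where the cost crossover with per-task distilled models arises.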
