Paper Title
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Paper Authors
Paper Abstract
Decoding methods for large language models often trade off between diversity of outputs and parallelism of computation. Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize. Alternatively, methods such as temperature sampling and its modifications (top-k sampling, nucleus sampling, typical decoding, and others) are embarrassingly parallel, but have no guarantees about duplicate samples. We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model, compatible with common sampling variations, with provable beam diversity under certain conditions, as well as being embarrassingly parallel and providing unbiased and consistent expectations from the original model. We demonstrate the effectiveness of our approach on WMT machine translation, more than halving the standard deviation when estimating expected BLEU score reward, and closing the BLEU score gap between independent sampling and beam search by up to 63%.
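
To make the decoding scheme concrete, below is a minimal Python sketch of sampling from an arithmetic code book: each codepoint in [0, 1) is mapped to a sequence by repeatedly finding the token whose cumulative-probability subinterval contains the code and rescaling the code into that subinterval. The `next_token_probs(prefix)` callback is a hypothetical stand-in for the language model's next-token distribution, and the evenly spaced codes with a shared random offset are one illustrative way to realize the diversity and unbiasedness properties the abstract describes; this is a sketch under those assumptions, not the paper's implementation.

    import numpy as np

    def decode_from_code(next_token_probs, code, eos_id, max_len=50):
        # Map one codepoint in [0, 1) to a token sequence via the
        # arithmetic code book: at each step, pick the token whose
        # cumulative-probability subinterval contains the code, then
        # rescale the code into that subinterval and continue.
        prefix = []
        for _ in range(max_len):
            probs = next_token_probs(prefix)        # shape: (vocab_size,)
            cdf = np.cumsum(probs)
            token = int(np.searchsorted(cdf, code, side="right"))
            token = min(token, len(probs) - 1)      # guard float round-off
            lo = cdf[token - 1] if token > 0 else 0.0
            code = (code - lo) / probs[token]
            code = min(max(code, 0.0), np.nextafter(1.0, 0.0))
            prefix.append(token)
            if token == eos_id:
                break
        return prefix

    def arithmetic_sampling(next_token_probs, n, eos_id, seed=0):
        # Evenly spaced codes with one shared uniform offset keep each
        # individual sample distributed according to the model while
        # spreading the codes over the unit interval. Each decode is
        # independent of the others, so this loop parallelizes trivially.
        offset = np.random.default_rng(seed).uniform()
        codes = (offset + np.arange(n) / n) % 1.0
        return [decode_from_code(next_token_probs, c, eos_id) for c in codes]

    # Toy demo with a fixed bigram-style distribution (purely illustrative).
    vocab, eos = 5, 0
    table = np.random.default_rng(1).dirichlet(np.ones(vocab), size=vocab)
    print(arithmetic_sampling(lambda p: table[p[-1] if p else 0], 4, eos))

Because each codepoint is decoded independently given the model, the n decodes can be farmed out to separate workers with no communication, which is the embarrassingly parallel property contrasted with beam search above.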