Paper Title
REx: Data-Free Residual Quantization Error Expansion
Paper Authors
Paper Abstract
Deep neural networks (DNNs) are ubiquitous in computer vision and natural language processing, but suffer from high inference costs. This problem can be addressed by quantization, which converts floating-point operations into a lower bit-width format. With growing concerns over privacy rights, we focus our efforts on data-free methods. However, such techniques suffer from a lack of adaptability to the target device, as hardware typically supports only specific bit widths. Thus, to adapt to a variety of devices, a quantization method should be flexible enough to find good accuracy vs. speed trade-offs for every bit width and target device. To achieve this, we propose REx, a quantization method that leverages residual error expansion, along with group sparsity and an ensemble approximation for better parallelization. REx is backed by strong theoretical guarantees and achieves superior performance on every benchmarked application (from vision to NLP tasks), architecture (ConvNets, transformers) and bit width (from int8 to ternary quantization).
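To make the core idea concrete, below is a minimal sketch of residual error expansion: a weight tensor is approximated as a sum of quantized terms, where each successive term quantizes the residual error left by the previous ones. This is an illustration under simplifying assumptions (uniform symmetric per-tensor quantization), not the paper's implementation; the helper names `quantize` and `residual_expansion` are hypothetical, and the group-sparsity and ensemble components described in the abstract are omitted.

```python
import numpy as np

def quantize(w, n_bits=8):
    """Uniform symmetric quantization of a tensor to n_bits.

    Returns the de-quantized (simulated) tensor, so the residual
    error w - quantize(w) can be computed in floating point.
    """
    q_max = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / q_max
    if scale == 0:
        return np.zeros_like(w)
    q = np.clip(np.round(w / scale), -q_max, q_max)
    return q * scale

def residual_expansion(w, n_bits=8, order=3):
    """Approximate w as a sum of `order` quantized terms:
    w ~ Q(w) + Q(w - Q(w)) + ...  Each term quantizes the residual
    error of the approximation built from the previous terms.
    """
    terms, residual = [], w.copy()
    for _ in range(order):
        t = quantize(residual, n_bits)
        terms.append(t)
        residual = residual - t
    return terms

# Example: the approximation error shrinks as terms are added,
# which is how the expansion trades extra computation for accuracy.
w = np.random.randn(64, 64).astype(np.float32)
approx = np.zeros_like(w)
for k, t in enumerate(residual_expansion(w, n_bits=4, order=3), start=1):
    approx += t
    print(f"order {k}: max error {np.abs(w - approx).max():.6f}")
```

Each expansion term uses the same low bit width supported by the hardware, so the trade-off between accuracy and speed can be tuned simply by choosing how many terms to keep.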