Paper Title


Learning with Local Gradients at the Edge

Paper Authors

Michael Lomnitz, Zachary Daniels, David Zhang, Michael Piacentino

Paper Abstract


To enable learning on edge devices with fast convergence and low memory, we present a novel backpropagation-free optimization algorithm dubbed Target Projection Stochastic Gradient Descent (tpSGD). tpSGD generalizes direct random target projection to work with arbitrary loss functions and extends target projection to training recurrent neural networks (RNNs) in addition to feedforward networks. tpSGD uses layer-wise stochastic gradient descent (SGD) and local targets generated via random projections of the labels to train the network layer by layer with only forward passes. tpSGD does not require retaining gradients during optimization, greatly reducing memory allocation compared to SGD backpropagation (BP) methods, which keep multiple instances of the entire neural network's weights, inputs/outputs, and intermediate results. Our method performs comparably to BP gradient descent, staying within 5% accuracy on relatively shallow networks of fully connected, convolutional, and recurrent layers. tpSGD also outperforms other state-of-the-art gradient-free algorithms on shallow models consisting of multi-layer perceptrons, convolutional neural networks (CNNs), and RNNs, achieving competitive accuracy with less memory and time. We evaluate the performance of tpSGD in training deep neural networks (e.g., VGG) and extend the approach to multi-layer RNNs. These experiments highlight new research directions related to optimized layer-based adaptor training for domain shift using tpSGD at the edge.
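To make the layer-wise, forward-only scheme concrete, below is a minimal NumPy sketch of target-projection-style training under our own simplifying assumptions (ReLU hidden layers, one-hot labels, MSE as the local loss, fixed Gaussian projection matrices). All names and hyperparameters here are illustrative, not the authors' implementation; the paper's tpSGD supports arbitrary loss functions and recurrent layers beyond this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def train_layerwise_target_projection(X, Y, dims, lr=1e-2, epochs=5):
    """Hypothetical sketch of target-projection-style layer-wise SGD.

    X: (n, d) inputs; Y: (n, c) one-hot labels; dims: hidden layer widths.
    Each hidden layer is fit against a FIXED random projection of the
    labels, so no end-to-end backward pass (or its gradients) is stored.
    """
    n, d = X.shape
    c = Y.shape[1]
    sizes = [d] + list(dims) + [c]
    Ws = [rng.normal(0.0, np.sqrt(2.0 / sizes[i]), (sizes[i], sizes[i + 1]))
          for i in range(len(sizes) - 1)]
    bs = [np.zeros(sizes[i + 1]) for i in range(len(sizes) - 1)]
    # One fixed random label-projection matrix per hidden layer.
    Bs = [rng.normal(0.0, 1.0 / np.sqrt(c), (c, sizes[i + 1]))
          for i in range(len(sizes) - 2)]

    for _ in range(epochs):
        H = X
        for i, (W, b) in enumerate(zip(Ws, bs)):
            last = i == len(Ws) - 1
            Z = H @ W + b
            A = Z if last else relu(Z)
            # Local target: the true labels for the output layer, a random
            # projection of the labels for every hidden layer.
            T = Y if last else Y @ Bs[i]
            # Layer-local MSE gradient; nothing propagates to earlier layers.
            dZ = (A - T) / len(H)
            if not last:
                dZ = dZ * (Z > 0)
            Ws[i] -= lr * (H.T @ dZ)
            bs[i] -= lr * dZ.sum(axis=0)
            H = A  # the detached forward activation feeds the next layer
    return Ws, bs

# Example usage on a tiny synthetic problem (hypothetical shapes):
# X = rng.normal(size=(256, 20))
# Y = np.eye(4)[rng.integers(0, 4, 256)]
# Ws, bs = train_layerwise_target_projection(X, Y, dims=[64, 32])
```

Because each layer's update depends only on its own input, output, and a fixed projection of the labels, peak memory stays near a single layer's working set, which is the memory advantage over BP that the abstract describes.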
