Paper Title

On gradient descent training under data augmentation with on-line noisy copies

Paper Authors

Hagiwara, Katsuyuki

Paper Abstract

In machine learning, data augmentation (DA) is a technique for improving generalization performance. In this paper, we mainly considered gradient descent for linear regression under DA using noisy copies of a dataset, in which noise is injected into the inputs. We analyzed the situation where random noisy copies are newly generated and used at each epoch; i.e., the case of using on-line noisy copies. It can therefore be viewed as an analysis of a method that injects noise into the training process in a DA manner; i.e., an on-line version of DA. We derived the averaged behavior of the training process in three situations: full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We showed that, in all cases, training for DA with on-line copies is approximately equivalent to ridge regularization whose regularization parameter corresponds to the variance of the injected noise. On the other hand, we showed that the learning rate is effectively multiplied by the number of noisy copies plus one in full-batch training under the sum of squared errors and in mini-batch training under the mean squared error; i.e., DA with on-line copies yields an apparent acceleration of training. The apparent acceleration and the regularization effect come from the original part and the noise in a copy, respectively. These results were confirmed in a numerical experiment, in which we found that our result can be applied approximately to the usual off-line DA in the under-parameterized scenario but not in the over-parameterized scenario. Moreover, we experimentally investigated the training process of neural networks under DA with off-line noisy copies and found that our analysis of linear regression can be applied to neural networks.
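
To make the abstract's main claim concrete, the following is a minimal numerical sketch (not the authors' code) of full-batch gradient descent on linear regression under the mean squared error with on-line noisy input copies, compared against a closed-form ridge solution whose regularization strength is tied to the injected noise variance. All names and values here (n_copies, sigma, lr, and the exact scaling of the ridge parameter) are illustrative assumptions, not the paper's notation or derived constants.

```python
# Sketch, assuming an under-parameterized linear regression setting.
# On-line DA: at every epoch, fresh noisy copies of the inputs are drawn and
# appended to the original data before a full-batch gradient step.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n samples, d features (under-parameterized: n >> d).
n, d = 200, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

n_copies = 5      # number of noisy copies generated per epoch (assumption)
sigma = 0.3       # std of the noise injected into the inputs (assumption)
lr = 0.01
epochs = 2000

# Full-batch gradient descent under the mean squared error with on-line DA.
w_da = np.zeros(d)
for _ in range(epochs):
    noisy = [X + sigma * rng.standard_normal((n, d)) for _ in range(n_copies)]
    X_aug = np.vstack([X] + noisy)            # original + fresh noisy copies
    y_aug = np.tile(y, n_copies + 1)          # targets are repeated unchanged
    grad = X_aug.T @ (X_aug @ w_da - y_aug) / len(y_aug)
    w_da -= lr * grad

# Ridge solution with a regularization parameter tied to the noise variance.
# The exact correspondence is derived in the paper; the scaling below is a
# rough expected-gradient calculation for this MSE setting, used only as a check.
lam = sigma ** 2 * n_copies / (n_copies + 1)
w_ridge = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("||w_DA - w_ridge|| =", np.linalg.norm(w_da - w_ridge))
print("||w_DA - w_OLS||   =", np.linalg.norm(w_da - w_ols))
```

In this sketch the DA-trained weights should land much closer to the ridge solution than to ordinary least squares, which is the approximate equivalence the abstract describes; the apparent training acceleration from the copies shows up as an effectively larger step size when the loss is a sum rather than a mean.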
