Paper Title
Existence and Estimation of Critical Batch Size for Training Generative Adversarial Networks with Two Time-Scale Update Rule
Paper Authors
Paper Abstract
Previous results have shown that a two time-scale update rule (TTUR) using different learning rates, such as different constant rates or different decaying rates, is useful for training generative adversarial networks (GANs) in theory and in practice. Moreover, not only the learning rate but also the batch size is important for training GANs with TTURs, and both affect the number of steps needed for training. This paper studies the relationship between batch size and the number of steps needed for training GANs with TTURs based on constant learning rates. We show theoretically that, for a TTUR with constant learning rates, the number of steps needed to find stationary points of the loss functions of both the discriminator and the generator decreases as the batch size increases, and that there exists a critical batch size minimizing the stochastic first-order oracle (SFO) complexity. We then use the Fréchet inception distance (FID) as the performance measure for training and provide numerical results indicating that the number of steps needed to achieve a low FID score decreases as the batch size increases and that the SFO complexity increases once the batch size exceeds the measured critical batch size. Moreover, we show that the measured critical batch sizes are close to the sizes estimated from our theoretical results.
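To make the critical-batch-size claim concrete, the following is a minimal worked sketch of how such a minimizer can arise. The specific form of the bound and the constants A, B, and the accuracy ε are illustrative assumptions for exposition only, not the paper's exact expressions.

```latex
\documentclass{article}
\begin{document}
% Illustrative sketch only: the bound form and the constants $A$, $B$, and the
% accuracy $\epsilon$ are assumptions for exposition, not the paper's exact bounds.
% Suppose the number of steps needed to reach an $\epsilon$-approximate stationary
% point with batch size $b$ satisfies
\[
  K(b) = \frac{A}{\epsilon^{2} - B/b}, \qquad A, B > 0, \quad b > \frac{B}{\epsilon^{2}},
\]
% so $K(b)$ decreases as $b$ increases.  The SFO complexity (total number of
% stochastic gradient computations) is $N(b) = b\,K(b)$:
\[
  N(b) = \frac{A b^{2}}{b\,\epsilon^{2} - B},
\]
% and setting $dN/db = 0$ gives the critical batch size that minimizes $N(b)$:
\[
  b^{\star} = \frac{2B}{\epsilon^{2}}.
\]
\end{document}
```

Under this illustrative form, a batch size larger than b* still reduces the number of steps, but the total gradient cost b·K(b) grows, which mirrors the abstract's observation that the SFO complexity increases once the batch size exceeds the measured critical batch size.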