Paper Title


Mini-Batch Learning Strategies for modeling long term temporal dependencies: A study in environmental applications

Paper Authors

Shaoming Xu, Ankush Khandelwal, Xiang Li, Xiaowei Jia, Licheng Liu, Jared Willard, Rahul Ghosh, Kelly Cutler, Michael Steinbach, Christopher Duffy, John Nieber, Vipin Kumar

Paper Abstract


In many environmental applications, recurrent neural networks (RNNs) are often used to model physical variables with long temporal dependencies. However, due to mini-batch training, temporal relationships between training segments within a batch (intra-batch) as well as between batches (inter-batch) are not considered, which can lead to limited performance. Stateful RNNs aim to address this issue by passing hidden states between batches. Since Stateful RNNs ignore intra-batch temporal dependency, there exists a trade-off between training stability and capturing temporal dependency. In this paper, we provide a quantitative comparison of different Stateful RNN modeling strategies, and propose two strategies to enforce both intra- and inter-batch temporal dependency. First, we extend Stateful RNNs by defining a batch as a temporally ordered set of training segments, which enables intra-batch sharing of temporal information. While this approach significantly improves the performance, it leads to much larger training times due to highly sequential training. To address this issue, we further propose a new strategy which augments a training segment with an initial value of the target variable from the timestep right before the start of the training segment. In other words, we provide an initial value of the target variable as an additional input, so that the network can focus on learning changes relative to that initial value. With this strategy, samples can be passed in any order (mini-batch training), which significantly reduces the training time while maintaining the performance. In demonstrating our approach in hydrological modeling, we observe that the most significant gains in predictive accuracy occur when these methods are applied to state variables whose values change slowly, such as soil water and snowpack, rather than to continuously moving flux variables such as streamflow.
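The initial-value augmentation described in the abstract can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation: the function name, the fixed segment length, and the choice of zero as the initial value for the very first segment are all assumptions made for the sketch.

```python
import numpy as np

def make_augmented_segments(inputs, targets, seg_len):
    """Split a long time series into fixed-length training segments and
    append, as an extra input feature, the target value from the timestep
    just before each segment starts (zero for the first segment, where no
    earlier observation exists).

    inputs:  (T, D) array of driver/input features
    targets: (T,)   array of the target state variable
    Returns a list of (segment_inputs, segment_targets) pairs, where
    segment_inputs has shape (seg_len, D + 1).
    """
    segments = []
    T = len(targets)
    for start in range(0, T - seg_len + 1, seg_len):
        seg_x = inputs[start:start + seg_len]
        seg_y = targets[start:start + seg_len]
        # Initial value of the target from just before this segment;
        # the network can then learn changes relative to this value.
        init = targets[start - 1] if start > 0 else 0.0
        init_col = np.full((seg_len, 1), init)
        segments.append((np.hstack([seg_x, init_col]), seg_y))
    return segments
```

Because each segment now carries its own starting-state information as an input feature, the segments no longer need to be presented in temporal order: they can be shuffled freely into mini-batches, which is what allows this strategy to recover standard mini-batch training speed while preserving the inter-segment temporal signal.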
