Paper title
DRAformer: Differentially Reconstructed Attention Transformer for Time-Series Forecasting
Paper authors
Paper abstract
Time-series forecasting plays an important role in many real-world scenarios, such as equipment life-cycle forecasting, weather forecasting, and traffic flow forecasting. Recent research shows that a variety of transformer-based models achieve remarkable results in time-series forecasting. However, several issues still limit the ability of transformer-based models on time-series forecasting tasks: (i) learning directly on raw data is susceptible to noise because of its complex and unstable feature representation; (ii) the self-attention mechanism pays insufficient attention to changing features and temporal dependencies. To address these two problems, we propose DRAformer, a transformer-based differentially reconstructed attention model. Specifically, DRAformer offers the following innovations: (i) learning on differenced sequences, which preserves clear and stable sequence features through differencing and highlights the changing properties of the sequences; (ii) reconstructed attention: integrated distance attention expresses sequential distance through a learnable Gaussian kernel, distributed difference attention computes distribution differences by mapping the differenced sequence into an adaptive feature space, and the combination of the two effectively focuses on sequences with prominent associations; (iii) a reconstructed decoder input, which extracts sequence features by integrating variation information and temporal correlations, thereby obtaining a more comprehensive sequence representation. Extensive experiments on four large-scale datasets demonstrate that DRAformer outperforms state-of-the-art baselines.
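
As a rough illustration of points (i) and (ii) in the abstract, the sketch below shows first-order differencing followed by scaled dot-product attention biased by a learnable Gaussian kernel over temporal distance. This is a minimal sketch under our own assumptions, not the authors' implementation: the names `difference` and `GaussianDistanceAttention`, the single learnable bandwidth `log_sigma`, and the way the kernel is folded into the attention scores are all hypothetical.

```python
import torch
import torch.nn as nn

def difference(x: torch.Tensor) -> torch.Tensor:
    """First-order differencing along the time axis: x_t - x_{t-1}.

    x: (batch, seq_len, d_model) -> (batch, seq_len - 1, d_model)
    """
    return x[:, 1:, :] - x[:, :-1, :]

class GaussianDistanceAttention(nn.Module):
    """Scaled dot-product attention biased by a learnable Gaussian kernel
    over temporal distance |i - j| (an illustrative reading of
    "integrated distance attention", not the paper's exact formulation)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.log_sigma = nn.Parameter(torch.zeros(1))  # learnable bandwidth

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / d ** 0.5      # (b, t, t)

        # Squared temporal distance between every query/key position pair.
        pos = torch.arange(t, device=x.device, dtype=x.dtype)
        dist2 = (pos[:, None] - pos[None, :]) ** 2       # (t, t)
        sigma = self.log_sigma.exp()

        # Adding the log of the Gaussian kernel down-weights distant pairs.
        attn = torch.softmax(scores - dist2 / (2 * sigma ** 2), dim=-1)
        return attn @ v

# Usage: attend over the differenced sequence rather than the raw one.
x = torch.randn(8, 96, 64)                    # (batch, length, d_model)
layer = GaussianDistanceAttention(64)
out = layer(difference(x))                    # (8, 95, 64)
```

Operating on the differenced series removes the slowly varying level of the signal so attention is computed over the changes themselves, which is the motivation the abstract gives for learning on differenced sequences.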