Paper Title
Rethinking Attention Mechanism in Time Series Classification
Paper Authors
Paper Abstract
Attention-based models have been widely used in many areas, such as computer vision and natural language processing. However, relevant applications in time series classification (TSC) have not yet been explored deeply, so a significant number of TSC algorithms still suffer from general problems of the attention mechanism, such as quadratic complexity. In this paper, we improve the efficiency and performance of the attention mechanism by proposing flexible multi-head linear attention (FMLA), which enhances locality awareness through layer-wise interactions with deformable convolutional blocks and online knowledge distillation. Moreover, we propose a simple but effective mask mechanism that helps reduce the influence of noise in time series and decreases the redundancy of the proposed FMLA by proportionally masking some positions of each given series. To stabilize this mechanism, samples are forwarded through the model with random mask layers several times, and their outputs are aggregated to teach the same model with regular mask layers. We conduct extensive experiments on 85 UCR2018 datasets to compare our algorithm with 11 well-known algorithms, and the results show that our algorithm achieves comparable performance in terms of top-1 accuracy. We also compare our model with three Transformer-based models with respect to floating-point operations per second and the number of parameters, and find that our algorithm achieves significantly better efficiency with lower complexity.
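The abstract only outlines the proportional masking and online self-distillation idea, so the snippet below gives a minimal, hypothetical PyTorch sketch of how such a scheme could look; it is not the authors' released code. All names and choices here (RandomPositionMask, mask_ratio, n_passes, the KL-based distillation loss, and the toy classifier) are assumptions made purely for illustration.

```python
# Minimal sketch (assumed, not the paper's implementation) of
# proportional random position masking plus aggregated-teacher
# online self-distillation for time series classification.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RandomPositionMask(nn.Module):
    """Zero out a fixed proportion of time steps in each series (assumed behaviour)."""

    def __init__(self, mask_ratio: float = 0.2):
        super().__init__()
        self.mask_ratio = mask_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, channels)
        if not self.training or self.mask_ratio <= 0:
            return x
        batch, length, _ = x.shape
        n_masked = int(length * self.mask_ratio)
        if n_masked == 0:
            return x
        keep = torch.ones(batch, length, 1, device=x.device)
        for b in range(batch):
            idx = torch.randperm(length, device=x.device)[:n_masked]
            keep[b, idx, 0] = 0.0  # drop the selected positions
        return x * keep


def self_distillation_step(model, mask_layer, x, y, n_passes=3, alpha=0.5, T=2.0):
    """One hypothetical training step: aggregate several randomly masked
    forward passes into a soft teacher target, then train the regular
    forward pass of the same model against both the labels and that target."""
    # Teacher: average the logits over several randomly masked views of x.
    with torch.no_grad():
        teacher_logits = torch.stack(
            [model(mask_layer(x)) for _ in range(n_passes)]
        ).mean(dim=0)
    # Student: regular forward pass (unmasked here, for simplicity).
    student_logits = model(x)
    ce = F.cross_entropy(student_logits, y)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T
    return alpha * ce + (1 - alpha) * kd


# Toy usage with a stand-in classifier (8 series of length 64, 1 channel, 5 classes).
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(64, 5))
mask = RandomPositionMask(mask_ratio=0.2)
x = torch.randn(8, 64, 1)
y = torch.randint(0, 5, (8,))
loss = self_distillation_step(toy_model, mask, x, y)
loss.backward()
```

The sketch only captures the self-distillation signal described in the abstract; the FMLA blocks, deformable convolutions, and layer-wise interactions of the actual model are not represented here.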