Paper Title


Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning

Paper Authors

Tsvetomila Mihaylova, Vlad Niculae, André F. T. Martins

Paper Abstract


Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, which has null gradient. In this paper, we focus on surrogate gradients, a popular strategy to deal with this problem. We explore latent structure learning through the angle of pulling back the downstream learning objective. In this paradigm, we discover a principled motivation for both the straight-through estimator (STE) as well as the recently proposed SPIGOT, a variant of STE for structured models. Our perspective leads to new algorithms in the same family. We empirically compare the known and the novel pulled-back estimators against the popular alternatives, yielding new insight for practitioners and revealing intriguing failure cases.
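For intuition about the null-gradient problem and the surrogate-gradient fix mentioned in the abstract, below is a minimal PyTorch sketch (not the authors' implementation) of a straight-through argmax over a single categorical variable: the forward pass emits a hard one-hot vector, while the backward pass reuses the softmax gradient as a surrogate for the argmax's zero gradient. The function name and toy loss are illustrative; SPIGOT extends this idea to structured argmax (e.g., over parse trees) with an additional projection step, which is not shown here.

```python
import torch
import torch.nn.functional as F

def straight_through_argmax(scores: torch.Tensor) -> torch.Tensor:
    """Forward: hard one-hot argmax. Backward: softmax gradient as a surrogate (STE-style)."""
    soft = F.softmax(scores, dim=-1)
    hard = torch.zeros_like(soft).scatter_(-1, soft.argmax(dim=-1, keepdim=True), 1.0)
    # Numerically equal to `hard`, but gradients flow back through `soft` only.
    return hard + soft - soft.detach()

# Toy usage: gradients reach `scores` even though the forward output is a hard argmax.
scores = torch.randn(4, 6, requires_grad=True)
one_hot = straight_through_argmax(scores)
loss = one_hot.mul(torch.randn(4, 6)).sum()
loss.backward()
print(scores.grad.shape)  # torch.Size([4, 6])
```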
