论文标题
最佳运输的可区分TOP-K运营商
Differentiable Top-k Operator with Optimal Transport
论文作者
论文摘要
Top-K操作,即从分数集合中找到k最大或最小的元素是重要的模型组件,它广泛用于信息检索,机器学习和数据挖掘。但是,如果以算法方式(例如,使用气泡算法)实现了TOP-K操作,则不能使用普遍的梯度下降算法以端到端的方式训练所得模型。这是因为这些实现通常涉及交换指数,其梯度无法计算。此外,从输入分数到该元素是否属于TOP-K集的指示向量的相应映射本质上是不连续的。为了解决这个问题,我们提出了一个平滑的近似值,即柔软的(可扩展的基于最佳传输的可区分)TOP-K运算符。具体而言,我们的软顶K运算符将TOP-K操作的输出近似为熵最佳传输(EOT)问题的解决方案。然后,根据EOT问题的最佳条件,可以有效地近似软操作员的梯度。我们将提出的操作员应用于K-Nearthears和Beam搜索算法,并证明了性能的提高。
The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining. However, if the top-k operation is implemented in an algorithmic way, e.g., using bubble algorithm, the resulting model cannot be trained in an end-to-end way using prevalent gradient descent algorithms. This is because these implementations typically involve swapping indices, whose gradient cannot be computed. Moreover, the corresponding mapping from the input scores to the indicator vector of whether this element belongs to the top-k set is essentially discontinuous. To address the issue, we propose a smoothed approximation, namely the SOFT (Scalable Optimal transport-based diFferenTiable) top-k operator. Specifically, our SOFT top-k operator approximates the output of the top-k operation as the solution of an Entropic Optimal Transport (EOT) problem. The gradient of the SOFT operator can then be efficiently approximated based on the optimality conditions of EOT problem. We apply the proposed operator to the k-nearest neighbors and beam search algorithms, and demonstrate improved performance.