Title
Computational Complexity of Sub-Linear Convergent Algorithms
Authors
Abstract
Optimizing the machine learning algorithms used to minimize an objective function has been of great interest. Several approaches to accelerating common algorithms, such as gradient descent and stochastic gradient descent, have been explored. One of these approaches reduces gradient variance through adaptive sampling in order to solve large-scale empirical risk minimization (ERM) problems. In this paper, we explore a scheme that starts with a small sample, geometrically increases the sample size, and uses the solution of the previous sample's ERM to warm-start the computation of the new ERM. This solves ERM problems with first-order optimization algorithms of sublinear convergence, but at lower computational complexity. The paper begins with a theoretical analysis of the approach, followed by two experiments on different datasets: one comparing gradient descent with adaptive-sampling gradient descent, and one comparing ADAM with adaptive-sampling ADAM.
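As a rough illustration of the scheme described above, the following is a minimal sketch (not the paper's implementation) of warm-started gradient descent over geometrically growing subsamples, using a noiseless least-squares ERM as a stand-in objective; the sample sizes, growth factor, step count, and learning rate are illustrative assumptions.

```python
import numpy as np

def grad_erm(w, X, y):
    # Gradient of the least-squares ERM objective (1/2n) * ||Xw - y||^2
    n = X.shape[0]
    return X.T @ (X @ w - y) / n

def adaptive_sampling_gd(X, y, n0=32, growth=2.0, steps_per_stage=50, lr=0.1):
    """Gradient descent on ERM subproblems of geometrically increasing size.

    The minimizer of each stage's subsample ERM warm-starts the next stage,
    so later (more expensive) stages need only a few corrective steps.
    """
    n_total = X.shape[0]
    w = np.zeros(X.shape[1])  # initial iterate
    n = n0
    while True:
        n = min(int(n), n_total)
        Xs, ys = X[:n], y[:n]  # current subsample (assumes rows are pre-shuffled)
        for _ in range(steps_per_stage):
            w -= lr * grad_erm(w, Xs, ys)  # plain GD on the subsample ERM
        if n == n_total:
            break
        n *= growth  # geometric sample-size increase; w carries over as warm start
    return w

# Synthetic check: recover a known linear model from noiseless data
rng = np.random.default_rng(0)
X = rng.normal(size=(4096, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true
w_hat = adaptive_sampling_gd(X, y)
```

Because each stage's solution is already close to the next stage's minimizer, most gradient evaluations are performed on small subsamples, which is the source of the computational savings the abstract refers to.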