Paper Title
Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives
Paper Authors
Paper Abstract
We consider a discrete optimization formulation for learning sparse classifiers, where the outcome depends upon a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can be used to solve (to optimality) $\ell_0$-regularized regression problems at scales much larger than what was conventionally considered possible. Despite their usefulness, MIP-based global optimization approaches are significantly slower than the relatively mature algorithms for $\ell_1$-regularization and heuristics for nonconvex regularized problems. We aim to bridge this gap in computation times by developing new MIP-based algorithms for $\ell_0$-regularized classification. We propose two classes of scalable algorithms: an exact algorithm that can handle $p\approx 50,000$ features in a few minutes, and approximate algorithms that can address instances with $p\approx 10^6$ in times comparable to fast $\ell_1$-based algorithms. Our exact algorithm is based on the novel idea of \textsl{integrality generation}, which solves the original problem (with $p$ binary variables) via a sequence of mixed integer programs that involve a small number of binary variables. Our approximate algorithms are based on coordinate descent and local combinatorial search. In addition, we present new estimation error bounds for a class of $\ell_0$-regularized estimators. Experiments on real and synthetic data demonstrate that our approach leads to models with considerably improved statistical performance (especially variable selection) when compared to competing methods.
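To make the setup concrete, the $\ell_0$-regularized classification problem discussed above typically takes the form $\min_{\beta} \sum_{i=1}^{n} \ell(y_i, x_i^\top \beta) + \lambda \|\beta\|_0$, where $\|\beta\|_0$ counts the nonzero coefficients of $\beta$. The sketch below is a minimal illustration (not the authors' implementation) of the coordinate-descent idea mentioned in the abstract, applied to $\ell_0$-regularized logistic regression: each coordinate takes a gradient step under a quadratic upper bound on the loss and is then hard-thresholded. The function name `l0_logistic_cd` and parameters `lam` and `n_epochs` are illustrative assumptions.

```python
# Illustrative sketch of cyclic coordinate descent with an l0
# (hard-thresholding) proximal step for sparse logistic regression.
# Not the paper's algorithm; names and defaults are assumptions.
import numpy as np

def l0_logistic_cd(X, y, lam, n_epochs=50):
    """Approximately minimize  sum_i log(1 + exp(-y_i x_i^T b)) + lam * ||b||_0.
    Labels y must be in {-1, +1}. Returns the coefficient vector b."""
    n, p = X.shape
    b = np.zeros(p)
    margins = np.zeros(n)                 # cached values of x_i^T b
    L = 0.25 * (X ** 2).sum(axis=0)       # per-coordinate Lipschitz bounds (logistic curvature <= 1/4)
    for _ in range(n_epochs):
        for j in range(p):
            if L[j] == 0.0:
                continue
            # gradient of the logistic loss w.r.t. coordinate j
            sig = 1.0 / (1.0 + np.exp(y * margins))    # sigma(-y_i * margin_i)
            g_j = -(y * sig) @ X[:, j]
            z = b[j] - g_j / L[j]                      # unpenalized coordinate step
            # l0 proximal map under the quadratic bound:
            # keep z only if the modeled loss decrease (L_j z^2 / 2) exceeds lam
            new_bj = z if 0.5 * L[j] * z * z > lam else 0.0
            if new_bj != b[j]:
                margins += (new_bj - b[j]) * X[:, j]   # keep margin cache consistent
                b[j] = new_bj
    return b

# Usage on synthetic data: only the first 5 of 100 features are relevant.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 100))
beta_true = np.zeros(100)
beta_true[:5] = 2.0
y = np.sign(X @ beta_true + 0.1 * rng.standard_normal(200))
b_hat = l0_logistic_cd(X, y, lam=2.0)
print("nonzero coordinates recovered:", np.flatnonzero(b_hat))
```

The hard-threshold rule comes from minimizing the coordinate-wise quadratic surrogate $\tfrac{L_j}{2}(t - z)^2 + \lambda \, \mathbb{1}[t \neq 0]$: the nonzero candidate $t = z$ wins exactly when $\tfrac{L_j}{2} z^2 > \lambda$. The paper's approximate algorithms additionally combine such updates with local combinatorial search, which this sketch omits.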