Optimal Regularization for a Data Source

Paper Title

Optimal Regularization for a Data Source

Authors

Oscar Leong, Eliza O'Reilly, Yong Sheng Soh, Venkat Chandrasekaran

Abstract

In optimization-based approaches to inverse problems and to statistical estimation, it is common to augment criteria that enforce data fidelity with a regularizer that promotes desired structural properties in the solution. The choice of a suitable regularizer is typically driven by a combination of prior domain information and computational considerations. Convex regularizers are attractive computationally but they are limited in the types of structure they can promote. On the other hand, nonconvex regularizers are more flexible in the forms of structure they can promote and they have showcased strong empirical performance in some applications, but they come with the computational challenge of solving the associated optimization problems. In this paper, we seek a systematic understanding of the power and the limitations of convex regularization by investigating the following questions: Given a distribution, what is the optimal regularizer for data drawn from the distribution? What properties of a data source govern whether the optimal regularizer is convex? We address these questions for the class of regularizers specified by functionals that are continuous, positively homogeneous, and positive away from the origin. We say that a regularizer is optimal for a data distribution if the Gibbs density with energy given by the regularizer maximizes the population likelihood (or equivalently, minimizes cross-entropy loss) over all regularizer-induced Gibbs densities. As the regularizers we consider are in one-to-one correspondence with star bodies, we leverage dual Brunn-Minkowski theory to show that a radial function derived from a data distribution is akin to a "computational sufficient statistic" as it is the key quantity for identifying optimal regularizers and for assessing the amenability of a data source to convex regularization.
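
The optimality criterion stated in the abstract can be written out as a short sketch. The notation below is assumed for illustration and does not come from the abstract: f is the regularizer, P the data distribution on R^d, Z_f the normalizing constant, and K the star body associated with f. The sketch further assumes that "positively homogeneous" means homogeneous of degree one and that Z_f is finite.

```latex
% Assumed notation (not from the abstract): f, P, Z_f, K, d.
% Gibbs density with energy f, and its population log-likelihood under P:
\begin{align*}
p_f(x) &= \frac{e^{-f(x)}}{Z_f},
\qquad Z_f = \int_{\mathbb{R}^d} e^{-f(x)}\,dx, \\
\mathbb{E}_{x \sim P}\bigl[\log p_f(x)\bigr]
  &= -\,\mathbb{E}_{x \sim P}\bigl[f(x)\bigr] - \log Z_f .
\end{align*}
% Maximizing the likelihood (minimizing cross-entropy) is therefore
\[
\min_{f}\;\; \mathbb{E}_{x \sim P}\bigl[f(x)\bigr] + \log Z_f .
\]
% If f is the gauge of a star body K (degree-one homogeneity assumed):
\[
f(x) = \|x\|_K = \inf\{\,t > 0 : x \in tK\,\},
\qquad
\rho_K(u) = \sup\{\,t \ge 0 : tu \in K\,\} = \frac{1}{\|u\|_K},
\]
% then the normalizer depends on K only through its volume:
\[
Z_f = \int_0^\infty e^{-t}\,\mathrm{vol}(tK)\,dt = d!\,\mathrm{vol}(K).
\]
```

Under these assumptions the objective trades the average regularizer value on the data against the volume of the unit sublevel set K, which is one way to see why the geometry of star bodies and their radial functions enters the analysis.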
