Title
A Moment-Matching Approach to Testable Learning and a New Characterization of Rademacher Complexity
Authors
Abstract
A remarkable recent paper by Rubinfeld and Vasilyan (2022) initiated the study of \emph{testable learning}, where the goal is to replace hard-to-verify distributional assumptions (such as Gaussianity) with efficiently testable ones and to require that the learner succeed whenever the unknown distribution passes the corresponding test. In this model, they gave an efficient algorithm for learning halfspaces under testable assumptions that are provably satisfied by Gaussians. In this paper we give a powerful new approach for developing algorithms for testable learning using tools from moment matching and metric distances in probability. We obtain efficient testable learners for any concept class that admits low-degree \emph{sandwiching polynomials}, capturing most important examples for which we have ordinary agnostic learners. We recover the results of Rubinfeld and Vasilyan as a corollary of our techniques while achieving improved, near-optimal sample complexity bounds for a broad range of concept classes and distributions. Surprisingly, we show that the information-theoretic sample complexity of testable learning is tightly characterized by the Rademacher complexity of the concept class, one of the most well-studied measures in statistical learning theory. In particular, uniform convergence is necessary and sufficient for testable learning. This leads to a fundamental separation from (ordinary) distribution-specific agnostic learning, where uniform convergence is sufficient but not necessary.
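For reference (not part of the original abstract), the two notions at the heart of these results can be stated in their standard forms. A concept class $\mathcal{C}$ admits degree-$d$ $\epsilon$-\emph{sandwiching polynomials} under a distribution $D$ if for every $f \in \mathcal{C}$ there exist degree-$d$ polynomials $p_{\mathrm{down}} \le f \le p_{\mathrm{up}}$ pointwise with $\mathbb{E}_{x \sim D}[p_{\mathrm{up}}(x) - p_{\mathrm{down}}(x)] \le \epsilon$. The \emph{Rademacher complexity} of $\mathcal{C}$ on $m$ samples from $D$, which the paper shows tightly characterizes the sample complexity of testable learning, is

\[
\mathfrak{R}_m(\mathcal{C}) \;=\; \mathbb{E}_{x_1,\dots,x_m \sim D}\;\mathbb{E}_{\sigma \sim \{\pm 1\}^m}\!\left[\,\sup_{f \in \mathcal{C}} \frac{1}{m}\sum_{i=1}^{m} \sigma_i\, f(x_i)\right],
\]

where the $\sigma_i$ are independent uniform random signs.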