Paper Title

Pre-trained Encoders in Self-Supervised Learning Improve Secure and Privacy-preserving Supervised Learning

Authors

Hongbin Liu, Wenjie Qu, Jinyuan Jia, Neil Zhenqiang Gong

Abstract

Classifiers in supervised learning have various security and privacy issues, e.g., 1) data poisoning attacks, backdoor attacks, and adversarial examples on the security side as well as 2) inference attacks and the right to be forgotten for the training data on the privacy side. Various secure and privacy-preserving supervised learning algorithms with formal guarantees have been proposed to address these issues. However, they suffer from various limitations such as accuracy loss, small certified security guarantees, and/or inefficiency. Self-supervised learning is an emerging technique to pre-train encoders using unlabeled data. Given a pre-trained encoder as a feature extractor, supervised learning can train a simple yet accurate classifier using a small amount of labeled training data. In this work, we perform the first systematic, principled measurement study to understand whether and when a pre-trained encoder can address the limitations of secure or privacy-preserving supervised learning algorithms. Our key findings are that a pre-trained encoder substantially improves 1) both accuracy under no attacks and certified security guarantees against data poisoning and backdoor attacks of state-of-the-art secure learning algorithms (i.e., bagging and KNN), 2) certified security guarantees of randomized smoothing against adversarial examples without sacrificing its accuracy under no attacks, 3) accuracy of differentially private classifiers, and 4) accuracy and/or efficiency of exact machine unlearning.
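
To make the pipeline the abstract describes concrete, here is a minimal sketch in PyTorch: a frozen pre-trained encoder serves as a feature extractor, and a simple linear classifier is trained on its features using a small labeled set. The ResNet-18 backbone (standing in for a self-supervised encoder such as one pre-trained with SimCLR), the data shapes, and the training hyperparameters are illustrative assumptions, not the paper's exact setup.

import torch
import torch.nn as nn
import torchvision

# Stand-in encoder: strip the classification head so the model outputs features.
encoder = torchvision.models.resnet18(weights=None)
encoder.fc = nn.Identity()          # encoder now outputs 512-dim features
encoder.eval()                      # disable batch-norm/dropout updates
for p in encoder.parameters():
    p.requires_grad = False         # freeze the encoder entirely

# Simple downstream classifier trained on top of the frozen features.
classifier = nn.Linear(512, 10)     # e.g., 10 classes, CIFAR-10-like
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Placeholder "small labeled training set" (random tensors for illustration).
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))

for epoch in range(5):
    with torch.no_grad():           # encoder is a fixed feature extractor
        features = encoder(images)
    logits = classifier(features)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Because only the linear classifier is trained, the same frozen-feature setup can be dropped into the secure and privacy-preserving algorithms the paper studies (bagging, KNN, randomized smoothing, differentially private training, and exact unlearning) in place of training a deep network from scratch.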
