Paper Title

UPTON: Preventing Authorship Leakage from Public Text Release via Data Poisoning

Paper Authors

Ziyao Wang, Thai Le, Dongwon Lee

Paper Abstract

Consider a scenario where an author (e.g., an activist or whistle-blower) with many public writings wishes to write "anonymously," while attackers may have already built an authorship attribution (AA) model based on public writings, including those of the author. To enable her wish, we ask: "Can one make the publicly released writings, T, unattributable, so that AA models trained on T cannot attribute their authorship well?" Toward this question, we present a novel solution, UPTON, which exploits black-box data-poisoning methods to weaken the authorship features in training samples and make the released texts unlearnable. It differs from previous obfuscation work, e.g., adversarial attacks that modify test samples, or backdoor attacks that change model outputs only when trigger words occur. Using four authorship datasets (IMDb10, IMDb64, Enron, and WJO), we present empirical validation in which UPTON successfully downgrades the accuracy of AA models to an impractically low level (~35%) while keeping the texts readable (semantic similarity > 0.9). UPTON remains effective against AA models that have already been trained on available clean writings of the authors.
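
To make the abstract's core idea concrete, here is a minimal sketch of the black-box "query and substitute" pattern it describes: perturbing an author's training texts so that authorship features are weakened before release. This is not UPTON's actual method; the toy scorer, the synonym table, and the edit-budget parameter below are all hypothetical stand-ins, with the edit budget serving as a crude proxy for keeping semantic similarity high.

```python
# Toy stand-in for a black-box AA model query: pretend these words are
# strong authorship features for the target author. UPTON would instead
# query a real attribution model's output.
SIGNATURE_WORDS = {"moreover", "henceforth", "utterly", "indeed"}


def authorship_score(text: str) -> float:
    """Fraction of tokens that are 'signature' words (toy authorship signal)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in SIGNATURE_WORDS for t in tokens) / len(tokens)


# Hypothetical synonym table for meaning-preserving substitutions.
SYNONYMS = {
    "moreover": "also",
    "henceforth": "from now on",
    "utterly": "completely",
    "indeed": "certainly",
}


def poison(text: str, max_edit_frac: float = 0.2) -> str:
    """Greedily substitute words that reduce the authorship score,
    stopping once an edit budget (a fraction of the tokens) is spent."""
    tokens = text.split()
    budget = max(1, int(max_edit_frac * len(tokens)))
    edits = 0
    for i in range(len(tokens)):
        if edits >= budget:
            break
        key = tokens[i].lower().strip(".,;!?")
        if key in SYNONYMS:
            trial = tokens.copy()
            trial[i] = SYNONYMS[key]
            # Keep the substitution only if the black-box score drops.
            if authorship_score(" ".join(trial)) < authorship_score(" ".join(tokens)):
                tokens = trial
                edits += 1
    return " ".join(tokens)


if __name__ == "__main__":
    sample = "Moreover the committee was utterly convinced, indeed."
    print(poison(sample))  # e.g., "also the committee was utterly convinced, indeed."
```

In the paper itself, the perturbations target features learned by real AA models and semantic similarity is measured explicitly; this sketch only conveys the general black-box poisoning loop.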
