Paper title
MaSS: Multi-attribute Selective Suppression
Paper authors
Paper abstract
Recent rapid advances in machine learning depend largely on the richness of the data available today, in both quantity and content. For example, biometric data such as images and voices can reveal people's attributes such as age, gender, sentiment, and origin, whereas location and motion data can be used to infer people's activity levels, transportation modes, and life habits. Alongside the new services and applications enabled by these advances, various governmental policies have been put in place to regulate such data usage and protect people's privacy and rights. As a result, data owners often opt for simple data obfuscation (e.g., blurring people's faces in images) or withhold data altogether, which severely degrades data quality and greatly limits the data's potential utility. Aiming for a more sophisticated mechanism that gives data owners fine-grained control while retaining as much data utility as possible, we propose Multi-attribute Selective Suppression, or MaSS, a general framework for performing precisely targeted data surgery that simultaneously suppresses any selected set of attributes while preserving the rest for downstream machine learning tasks. MaSS learns a data modifier through adversarial games between two sets of networks: one aims to suppress the selected attributes, and the other ensures that the remaining attributes are retained, via a general contrastive loss as well as explicit classification metrics. We carried out an extensive evaluation of the proposed method on multiple datasets from different domains, including facial images, voice audio, and video clips, and obtained promising results for MaSS's generalizability and its capability to suppress targeted attributes without negatively affecting the data's usability in other downstream ML tasks.
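The abstract names three ingredients of the modifier's training signal (adversarial suppression, a contrastive loss, and explicit classification) but does not give their formulas. The following is a minimal toy sketch of how such terms might be combined, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`modifier_loss`, `info_nce`), the use of an entropy-based suppression term, the InfoNCE form of the contrastive loss, the weights `w_sup`, `w_cls`, `w_con`, and the temperature.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one example, computed stably via log-sum-exp."""
    peak = max(logits)
    lse = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return lse - logits[label]


def entropy(logits):
    """Shannon entropy (in nats) of softmax(logits)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return -sum((e / total) * math.log(e / total) for e in exps if e > 0.0)


def cosine(u, v):
    """Cosine similarity; the small constant avoids division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: anchors[i] should match positives[i]
    better than any other item in the batch."""
    losses = []
    for i, anchor in enumerate(anchors):
        sims = [cosine(anchor, p) / temperature for p in positives]
        losses.append(cross_entropy(sims, i))
    return sum(losses) / len(losses)


def modifier_loss(suppress_logits, preserve_logits, preserve_labels,
                  feats_orig, feats_mod, w_sup=1.0, w_cls=1.0, w_con=1.0):
    """Hypothetical combined objective for the data modifier.

    suppress_logits: adversary outputs on modified data (suppressed attribute)
    preserve_logits: classifier outputs on modified data (preserved attribute)
    feats_orig / feats_mod: feature vectors of original and modified data
    """
    # Suppression (assumed form): gap between the maximum possible entropy
    # and the adversary's entropy; zero when the adversary can only guess.
    sup = sum(math.log(len(l)) - entropy(l) for l in suppress_logits)
    sup /= len(suppress_logits)
    # Preservation, explicit part: preserved labels must stay predictable.
    cls = sum(cross_entropy(l, y)
              for l, y in zip(preserve_logits, preserve_labels))
    cls /= len(preserve_logits)
    # Preservation, general part: modified features stay close to originals.
    con = info_nce(feats_orig, feats_mod)
    return w_sup * sup + w_cls * cls + w_con * con
```

In an adversarial setup of this kind, the adversary networks would typically be updated in alternation with the modifier, each minimising its own cross-entropy on the suppressed attributes; that alternation is omitted here, and the actual schedule used by MaSS is not stated in the abstract.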