Paper title
Asynchronous Collaborative Learning Across Data Silos
Paper authors
Paper abstract
Machine learning algorithms can perform well when trained on large datasets. While large organisations often have considerable data assets, it can be difficult to unify these assets in a way that makes training possible. Data is very often 'siloed' in different parts of the organisation, with little to no access between silos. This fragmentation of data assets is especially prevalent in heavily regulated industries such as financial services and healthcare. In this paper we propose a framework to enable asynchronous collaborative training of machine learning models across data silos. This allows data science teams to collaboratively train a machine learning model without sharing data with one another. Our proposed approach enhances conventional federated learning techniques to make them suitable for asynchronous training in this intra-organisation, cross-silo setting. We validate our proposed approach via extensive experiments.