Paper title
FedQAS: Privacy-aware machine reading comprehension with federated learning
Authors
Abstract
Machine reading comprehension (MRC) of text data is an important task in natural language understanding. It is a complex NLP problem, and much of the ongoing research has been fueled by the release of the Stanford Question Answering Dataset (SQuAD) and Conversational Question Answering (CoQA). The goal is to teach computers to "understand" a text and then answer questions about it using deep learning. Until now, however, large-scale training on private text data and knowledge sharing has been missing for this NLP task. Hence, we present FedQAS, a privacy-preserving machine reading system capable of leveraging large-scale private data without the need to pool those datasets in a central location. The proposed approach combines transformer models and federated learning technologies. The system is developed using the FEDn framework and deployed as a proof-of-concept alliance initiative. FedQAS is flexible, language-agnostic, and allows intuitive participation and execution of local model training. In addition, we present the architecture and implementation of the system and provide a reference evaluation based on the SQuAD dataset to showcase how it overcomes data privacy issues and enables knowledge sharing between alliance members in a federated learning setting.
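
The abstract does not say how the locally trained models are combined. As a minimal sketch, assuming a FedAvg-style weighted average (a common choice in federated learning, not confirmed here as the FedQAS method), the idea of sharing knowledge without pooling data can be illustrated as follows. The function name `federated_average`, the parameter name `qa_head`, and the example values are hypothetical, and the code does not use the FEDn API.

```python
from typing import Dict, List

# Model parameters as a mapping from a parameter name to a flat list of values.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its local example count.

    Only model parameters are exchanged; the private text data stays with
    each alliance member.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one example count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical alliance members, each reporting locally trained parameters.
member_a = {"qa_head": [1.0, 2.0]}   # trained on 100 private examples
member_b = {"qa_head": [3.0, 6.0]}   # trained on 300 private examples
global_weights = federated_average([member_a, member_b], [100, 300])
```

In this sketch the member with more local examples contributes proportionally more to the global model. A real deployment would apply the same averaging to full transformer weight tensors over many training rounds.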