Paper Title

PolicyQA: A Reading Comprehension Dataset for Privacy Policies

Authors

Wasi Uddin Ahmad, Jianfeng Chi, Yuan Tian, Kai-Wei Chang

Abstract

Privacy policy documents are long and verbose. A question answering (QA) system can assist users in finding the information that is relevant and important to them. Prior studies in this domain frame the QA task as retrieving, given a question, the most relevant text segment or a list of sentences from the policy document. In contrast, we argue that providing users with a short text span from the policy document reduces the burden of searching for the target information in a lengthy text segment. In this paper, we present PolicyQA, a dataset that contains 25,017 reading comprehension style examples curated from an existing corpus of 115 website privacy policies. PolicyQA provides 714 human-annotated questions written for a wide range of privacy practices. We evaluate two existing neural QA models and perform a rigorous analysis to reveal the advantages and challenges offered by PolicyQA.
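The abstract frames PolicyQA as extractive reading comprehension: given a question and a policy text segment, the answer is a short span of that segment rather than the whole segment or a list of sentences. The sketch below illustrates that framing under stated assumptions. It assumes a SQuAD-style record (question, context, character offset `answer_start`) and SQuAD-style exact-match and token-level F1 scoring; the names `SpanExample`, `normalize`, `exact_match`, and `token_f1`, as well as the policy sentence in the usage note, are hypothetical illustrations and are not taken from the PolicyQA release or the paper's evaluation code.

```python
from __future__ import annotations

import re
import string
from collections import Counter
from dataclasses import dataclass


@dataclass
class SpanExample:
    """Hypothetical, SQuAD-style record (an assumption, not the official PolicyQA schema).

    The answer is a character span inside `context`, not the whole segment.
    """
    question: str
    context: str        # a privacy-policy text segment
    answer_start: int   # character offset of the answer span in `context`
    answer_text: str

    def __post_init__(self) -> None:
        # Enforce the extractive constraint: the answer must appear verbatim
        # in the context at the stated offset.
        end = self.answer_start + len(self.answer_text)
        if self.context[self.answer_start:end] != self.answer_text:
            raise ValueError("answer_text does not match context at answer_start")


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, gold: str) -> bool:
    """True when the normalized token sequences are identical."""
    return normalize(prediction) == normalize(gold)


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall over normalized tokens."""
    pred, ref = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

As a usage example with an invented policy sentence, `context = "We collect your email address and device identifiers when you register for an account."` and `answer = "your email address and device identifiers"` give a valid record via `SpanExample("What do you collect when I sign up?", context, context.find(answer), answer)`. A prediction of `"email address"` scores an F1 of 0.5 against that answer, which shows why span-level scoring rewards short, precise answers over returning the entire segment.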