Paper Title

Augmented Understanding and Automated Adaptation of Curation Rules

Author

Tabebordbar, Alireza

Abstract

Over the past years, there have been many efforts to curate raw data and increase its added value. Data curation has been defined as the activities and processes an analyst undertakes to transform raw data into contextualized data and knowledge. It enables decision-makers and data analysts to extract value and derive insight from raw data. However, to curate raw data, an analyst needs to carry out various curation tasks, including extraction, linking, classification, and indexing, which are error-prone, tedious, and challenging. In addition, deriving insight requires analysts to spend a long time scanning and analyzing the curation environment. This problem is exacerbated when the curation environment is large and the analyst needs to curate a varied and comprehensive list of data. To address these challenges, in this dissertation we present techniques, algorithms, and systems for augmenting analysts in curation tasks. We propose: (1) a feature-based and automated technique for curating raw data; (2) an autonomic approach for adapting data curation rules; (3) a solution to augment users in formulating their preferences while curating data in large-scale information spaces; and (4) a set of APIs for automating basic curation tasks, including named entity extraction, POS tagging, and classification.
