Paper title
Dimensional Data KNN-Based Imputation
Paper authors
Paper abstract
Data Warehouses (DWs) are core components of Business Intelligence (BI). Missing data in DWs have a great impact on data analyses. Therefore, missing data need to be completed. Unlike other existing data imputation methods mainly adapted for facts, we propose a new imputation method for dimensions. This method contains two steps: 1) a hierarchical imputation and 2) a k-nearest neighbors (KNN) based imputation. Our solution has the advantage of taking into account the DW structure and dependency constraints. Experimental assessments validate our method in terms of effectiveness and efficiency.
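The abstract names the two steps but gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a hypothetical dimension table with a `city` to `country` hierarchy and two descriptive attributes (`segment`, `size`). The row data, the function names `hierarchical_impute` and `knn_impute`, the Hamming distance, and the majority vote are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical dimension rows. Assumption: each city belongs to exactly one
# country, so "city -> country" acts as a hierarchy (functional dependency).
rows = [
    {"city": "Lyon", "country": "France", "segment": "retail", "size": "small"},
    {"city": "Lyon", "country": None, "segment": "retail", "size": "large"},
    {"city": "Nice", "country": None, "segment": "retail", "size": "small"},
    {"city": "Bonn", "country": "Germany", "segment": "wholesale", "size": "large"},
    {"city": "Paris", "country": "France", "segment": "retail", "size": "small"},
]


def hierarchical_impute(rows, child, parent):
    """Step 1 (assumed form): fill a missing parent-level value from any
    other row that has the same child-level value and a known parent."""
    known = {
        r[child]: r[parent]
        for r in rows
        if r[child] is not None and r[parent] is not None
    }
    for r in rows:
        if r[parent] is None and r[child] in known:
            r[parent] = known[r[child]]


def knn_impute(rows, target, features, k=3):
    """Step 2 (assumed form): for rows still missing `target`, take a
    majority vote among the k nearest complete rows, where distance is the
    number of differing feature values (Hamming distance)."""
    complete = [r for r in rows if r[target] is not None]
    for r in rows:
        if r[target] is None and complete:
            nearest = sorted(
                complete,
                key=lambda c: sum(c[f] != r[f] for f in features),
            )[:k]
            votes = Counter(c[target] for c in nearest)
            r[target] = votes.most_common(1)[0][0]


# Run the hierarchy-based step first, then KNN on whatever is still missing.
hierarchical_impute(rows, child="city", parent="country")
knn_impute(rows, target="country", features=["segment", "size"], k=3)

for r in rows:
    print(r["city"], r["country"])
```

Running the hierarchical step first means a value that the hierarchy itself determines is never left to a similarity-based guess. This sketch does not check that KNN results respect any further dependency constraints, which the abstract states the proposed method takes into account.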