A k nearest neighbours classifiers ensemble based on extended neighbourhood rule and features subsets

Paper title

A k nearest neighbours classifiers ensemble based on extended neighbourhood rule and features subsets

Authors

Ali, Amjad; Hamraz, Muhammad; Gul, Naz; Khan, Dost Muhammad; Khan, Zardad; Aldahmani, Saeed

Abstract

kNN-based ensemble methods minimise the effect of outliers by identifying the set of data points in the given feature space that are nearest to an unseen observation and predicting its response by majority voting. Ordinary kNN-based ensembles find the k nearest observations within a region (bounded by a sphere) for a predefined value of k. This approach may fail, however, when the test observation follows a pattern of close data points of the same class that lie along a path not contained in that sphere. This paper proposes a k nearest neighbour ensemble in which the neighbours are determined in k steps. Starting from the observation nearest to the test point, the algorithm at each step identifies the single observation closest to the one selected at the previous step. In each base learner of the ensemble, this search is carried out for k steps on a random bootstrap sample with a random subset of features selected from the feature space. The final predicted class of the test point is determined by a majority vote over the classes predicted by all base models. The new ensemble method is applied to 17 benchmark datasets and compared with other classical methods, including kNN-based models, using classification accuracy, kappa and Brier score as performance metrics. Boxplots are also used to illustrate the differences between the results of the proposed method and those of other state-of-the-art methods. The proposed method outperformed the other classical methods in the majority of cases. The paper also presents a detailed simulation study for further assessment.
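The k-step neighbour search and the bootstrap/feature-subset ensemble described in the abstract can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the function names (`exnrule_neighbours`, `exnrule_ensemble_predict`), the Euclidean distance, the exclusion of already selected observations from later steps, and the default of about sqrt(p) features per base learner are all assumptions made for this example.

```python
import numpy as np
from collections import Counter


def exnrule_neighbours(X, x, k):
    """Return indices of up to k neighbours found as a chain.

    Step 1 picks the training row nearest to the test point x; each later
    step picks the row nearest to the row chosen at the previous step.
    Assumption: rows already in the chain are excluded from later steps.
    """
    available = np.ones(len(X), dtype=bool)
    current = x
    chain = []
    for _ in range(min(k, len(X))):
        dist = np.linalg.norm(X - current, axis=1)  # Euclidean (assumed)
        dist[~available] = np.inf                   # skip rows already used
        idx = int(np.argmin(dist))
        chain.append(idx)
        available[idx] = False
        current = X[idx]
    return chain


def majority(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def exnrule_ensemble_predict(X, y, x, k=3, n_models=25, n_features=None, seed=0):
    """Predict the class of x by majority vote over base learners.

    Each base learner uses a bootstrap sample of the rows and a random
    subset of the columns, then votes with the labels along its chain.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = n_features or max(1, int(np.sqrt(p)))  # subset size (assumed default)
    votes = []
    for _ in range(n_models):
        rows = rng.integers(0, n, size=n)            # bootstrap sample
        cols = rng.choice(p, size=m, replace=False)  # random feature subset
        Xb, yb = X[rows][:, cols], y[rows]
        chain = exnrule_neighbours(Xb, x[cols], k)
        votes.append(majority(yb[chain].tolist()))
    return majority(votes)
```

For the one-dimensional points 1.0, 2.0, 3.0 and -1.5 with a test point at 0.0 and k = 3, an ordinary kNN search would select 1.0, -1.5 and 2.0, whereas the chain above follows 1.0, then 2.0, then 3.0. This is the kind of path-following behaviour the abstract contrasts with a sphere-bounded neighbourhood.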
