Paper Title
DNNR: Differential Nearest Neighbors Regression
Paper Authors
Paper Abstract
K-nearest neighbors (KNN) is one of the earliest and most established algorithms in machine learning. For regression tasks, KNN averages the targets within a neighborhood which poses a number of challenges: the neighborhood definition is crucial for the predictive performance as neighbors might be selected based on uninformative features, and averaging does not account for how the function changes locally. We propose a novel method called Differential Nearest Neighbors Regression (DNNR) that addresses both issues simultaneously: during training, DNNR estimates local gradients to scale the features; during inference, it performs an n-th order Taylor approximation using estimated gradients. In a large-scale evaluation on over 250 datasets, we find that DNNR performs comparably to state-of-the-art gradient boosting methods and MLPs while maintaining the simplicity and transparency of KNN. This allows us to derive theoretical error bounds and inspect failures. In times that call for transparency of ML models, DNNR provides a good balance between performance and interpretability.
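To make the two-step procedure in the abstract concrete, here is a minimal first-order sketch in Python. It is an illustration under assumptions, not the authors' released implementation: the class name `SimpleDNNR`, the hyperparameters `n_neighbors` and `n_derivative_neighbors`, and the omission of the gradient-based feature scaling and higher-order Taylor terms are all simplifications for readability.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


class SimpleDNNR:
    """Minimal first-order DNNR-style regressor (illustrative sketch).

    Instead of averaging raw neighbor targets as in plain KNN regression,
    each neighbor contributes a first-order Taylor expansion of the target
    around its own location, using a locally estimated gradient.
    """

    def __init__(self, n_neighbors=3, n_derivative_neighbors=16):
        self.n_neighbors = n_neighbors                          # neighbors averaged at inference
        self.n_derivative_neighbors = n_derivative_neighbors    # neighbors used to fit each local gradient

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y, dtype=float)
        # +1 because a training point is its own nearest neighbor.
        self.nn_ = NearestNeighbors(
            n_neighbors=self.n_derivative_neighbors + 1
        ).fit(self.X_)
        return self

    def _local_gradient(self, i):
        # Least-squares fit of y(x) ~ y(x_i) + grad . (x - x_i) over x_i's neighborhood.
        _, idx = self.nn_.kneighbors(self.X_[i : i + 1])
        idx = idx[0][1:]                      # drop the point itself
        dX = self.X_[idx] - self.X_[i]
        dy = self.y_[idx] - self.y_[i]
        grad, *_ = np.linalg.lstsq(dX, dy, rcond=None)
        return grad

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        preds = []
        for x in X:
            _, idx = self.nn_.kneighbors(x[None, :], n_neighbors=self.n_neighbors)
            terms = []
            for i in idx[0]:
                grad = self._local_gradient(i)
                # First-order Taylor expansion around neighbor x_i.
                terms.append(self.y_[i] + grad @ (x - self.X_[i]))
            preds.append(np.mean(terms))
        return np.array(preds)


if __name__ == "__main__":
    # Tiny usage example on a smooth synthetic function.
    rng = np.random.default_rng(0)
    X_train = rng.uniform(-2, 2, size=(200, 2))
    y_train = np.sin(X_train[:, 0]) + 0.5 * X_train[:, 1] ** 2
    model = SimpleDNNR().fit(X_train, y_train)
    print(model.predict(np.array([[0.1, -0.3]])))
```

Because each prediction decomposes into a small set of neighbors, their targets, and their estimated local gradients, the per-prediction reasoning can be inspected directly, which is the transparency property the abstract refers to.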