Paper Title
UDRN: Unified Dimensional Reduction Neural Network for Feature Selection and Feature Projection
Paper Authors
Abstract
Dimensional reduction (DR) maps high-dimensional data into a lower-dimensional latent space by minimizing a defined optimization objective. DR methods usually fall into two categories: feature selection (FS) and feature projection (FP). FS selects a critical subset of the input dimensions but risks destroying the data distribution (structure). FP, on the other hand, combines all input features into a lower-dimensional space and aims to preserve the data structure, but it lacks interpretability and sparsity. FS and FP have traditionally been treated as incompatible categories, so they have not been unified in a single framework. We propose that the ideal DR approach combines FS and FP in a unified, end-to-end manifold learning framework that discovers the fundamental features while preserving the intrinsic relationships between data samples in the latent space. In this work, we develop such a framework, the Unified Dimensional Reduction Neural Network (UDRN), which integrates FS and FP in a compatible, end-to-end way. We improve the neural network structure by implementing the FS and FP tasks separately in two stacked sub-networks. In addition, we design data augmentation for the DR process to improve the method's generalization ability on datasets with very large numbers of features, and we design loss functions that work together with the data augmentation. Extensive experimental results on four image datasets and four biological datasets, including very high-dimensional data, demonstrate the advantages of UDRN over existing methods (FS, FP, and FS & FP pipelines), especially in downstream tasks such as classification and visualization.
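The abstract describes two stacked sub-networks, one for FS and one for FP, without giving their internals. The following is a minimal sketch of that stacking idea only, not the authors' implementation. The per-feature sigmoid gate, the 0.5 threshold, the single linear projection layer, and every function name are hypothetical choices made for illustration; training, the data augmentation, and the loss functions are omitted.

```python
import math
import random


def make_udrn_sketch(n_features, n_latent, seed=0):
    """Build hypothetical parameters: one gate logit per feature and one dense projection."""
    rng = random.Random(seed)
    gate_logits = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
    scale = 1.0 / math.sqrt(n_features)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_features)]
               for _ in range(n_latent)]
    return gate_logits, weights


def select_features(x, gate_logits, threshold=0.5):
    """FS stage (assumed form): keep a feature only if its sigmoid gate exceeds the threshold."""
    gates = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    mask = [1.0 if g > threshold else 0.0 for g in gates]
    return [xi * m for xi, m in zip(x, mask)], mask


def project(x_selected, weights):
    """FP stage (assumed form): linear map of the masked input into the latent space."""
    return [sum(w * xi for w, xi in zip(row, x_selected)) for row in weights]


def forward(x, gate_logits, weights):
    """Stack the two stages: select features first, then project what remains."""
    x_selected, mask = select_features(x, gate_logits)
    return project(x_selected, weights), mask


# Usage: embed one 8-feature sample into 2 latent dimensions.
gate_logits, weights = make_udrn_sketch(n_features=8, n_latent=2)
z, mask = forward([1.0] * 8, gate_logits, weights)
selected = [i for i, m in enumerate(mask) if m == 1.0]  # indices of kept features
```

Because the projection only sees the masked input, the latent coordinates depend solely on the selected features, which is what lets a single model report both a sparse feature subset (`selected`) and a low-dimensional embedding (`z`).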