Paper title
Population structure-learned classifier for high-dimension low-sample-size class-imbalanced problem
Authors
Abstract
Classification of high-dimension low-sample-size (HDLSS) data is a challenging problem, and class-imbalanced data are common in most application fields. We term this setting Imbalanced HDLSS (IHDLSS). Recent theoretical results reveal that the classification criterion and tolerance similarity are crucial to HDLSS, a view that emphasizes maximizing the within-class variance on the premise of class separability. Based on this idea, a novel linear binary classifier, termed the Population Structure-learned Classifier (PSC), is proposed. The PSC obtains better generalization performance on IHDLSS by maximizing the sum of the inter-class and intra-class scatter matrices on the premise of class separability, and by assigning different intercept values to the majority and minority classes. The salient features of the proposed approach are: (1) it works well on IHDLSS; (2) the inverse of the high-dimensional matrix can be solved in a low-dimensional space; (3) it is self-adaptive in determining the intercept term for each class; (4) it has the same computational complexity as the SVM. A series of evaluations is conducted on one simulated data set and eight real-world IHDLSS benchmark data sets from gene analysis. Experimental results demonstrate that the PSC is superior to state-of-the-art methods on IHDLSS.
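Feature (2) can be illustrated with a standard linear-algebra identity. The abstract does not say how PSC performs its reduction, so the sketch below is an assumption-laden illustration and not the authors' algorithm: it uses a generic ridge-style system, and the function names, the regularizer `lam`, and the random data are all hypothetical. It shows only that when the sample count n is much smaller than the dimension d, a d x d solve can be replaced by an n x n solve, because (X^T X + lam I)^(-1) X^T = X^T (X X^T + lam I)^(-1).

```python
import numpy as np

def direction_high_dim(X, y, lam):
    """Solve (X^T X + lam * I_d) w = X^T y directly: a d x d system."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def direction_low_dim(X, y, lam):
    """Same w via the push-through identity: only an n x n system is solved."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

# Hypothetical IHDLSS-like toy data: 20 samples, 500 features, 15 vs 5 labels.
rng = np.random.default_rng(0)
n, d = 20, 500
X = rng.standard_normal((n, d))
y = np.where(np.arange(n) < 15, 1.0, -1.0)

w_high = direction_high_dim(X, y, lam=1.0)
w_low = direction_low_dim(X, y, lam=1.0)

# Largest coordinate-wise difference between the two solutions.
max_gap = float(np.max(np.abs(w_high - w_low)))
```

Both routes return the same d-dimensional vector up to floating-point error, but the second factors only a 20 x 20 matrix, which is the kind of saving that matters when d is in the thousands, as in gene-expression data.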