Paper title
Evolvability Degeneration in Multi-Objective Genetic Programming for Symbolic Regression
Paper authors
Paper abstract
Genetic programming (GP) is one of the best approaches today to discover symbolic regression models. To find models that trade off accuracy and complexity, the non-dominated sorting genetic algorithm II (NSGA-II) is widely used. Unfortunately, it has been shown that NSGA-II can be inefficient: in early generations, low-complexity models over-replicate and take over most of the population. Consequently, studies have proposed different approaches to promote diversity. Here, we study the root of this problem, in order to design a superior approach. We find that the over-replication of low-complexity models is due to a lack of evolvability, i.e., the inability to produce offspring with improved accuracy. We therefore extend NSGA-II to track, over time, the evolvability of models of different levels of complexity. With this information, we limit how many models of each complexity level are allowed to survive the generation. We compare this new version of NSGA-II, evoNSGA-II, with the use of seven existing multi-objective GP approaches on ten widely-used data sets, and find that evoNSGA-II is equal or superior to using these approaches in almost all comparisons. Furthermore, our results confirm that evoNSGA-II behaves as intended: models that are more evolvable form the majority of the population.
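
The abstract describes two steps: tracking how evolvable each complexity level is, and capping how many models of each level survive. The Python sketch below illustrates one way such bookkeeping could look. It is not the paper's actual procedure. The class name `EvolvabilityTracker`, the exponential smoothing, the optimistic prior for unseen levels, and the proportional allocation of survival slots are all assumptions made for illustration, since the abstract does not give the exact update or capping rule.

```python
from collections import defaultdict


class EvolvabilityTracker:
    """Hypothetical per-complexity evolvability bookkeeping (illustrative only)."""

    def __init__(self, smoothing=0.5, prior=1.0):
        # smoothing and prior are assumed parameters, not taken from the paper.
        self.smoothing = smoothing
        # Complexity levels not yet observed start with an optimistic estimate.
        self.estimate = defaultdict(lambda: prior)

    def update(self, outcomes):
        """Update estimates from (parent_complexity, parent_error, child_error) triples."""
        attempts = defaultdict(int)
        improved = defaultdict(int)
        for complexity, parent_error, child_error in outcomes:
            attempts[complexity] += 1
            # An offspring counts as an improvement if its error is strictly lower.
            if child_error < parent_error:
                improved[complexity] += 1
        for complexity, n in attempts.items():
            rate = improved[complexity] / n
            old = self.estimate[complexity]
            # Exponential moving average, so the estimate changes gradually over generations.
            self.estimate[complexity] = (1 - self.smoothing) * old + self.smoothing * rate

    def survival_caps(self, complexities, population_size, floor=1):
        """Return a survivor cap per complexity level, proportional to its estimate."""
        levels = sorted(set(complexities))
        total = sum(self.estimate[c] for c in levels)
        if total == 0:
            # No level produced improvements: fall back to an even split.
            return {c: max(floor, population_size // len(levels)) for c in levels}
        return {
            c: max(floor, int(population_size * self.estimate[c] / total))
            for c in levels
        }


tracker = EvolvabilityTracker(smoothing=0.5, prior=1.0)
# Complexity 1 never improves on its parents; complexity 3 always does.
tracker.update([(1, 0.5, 0.5), (1, 0.5, 0.6), (3, 0.5, 0.4), (3, 0.5, 0.3)])
caps = tracker.survival_caps([1, 3], population_size=30)
print(caps)
```

In a full NSGA-II loop, caps of this kind would be applied during survivor selection, so that a complexity level whose offspring rarely improve cannot fill the population. How evoNSGA-II integrates the limit with non-dominated sorting is not specified in the abstract.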