Characterising Genetic Programming Error Through Extended Bias and Variance Decomposition, IEEE Transactions on Evolutionary Computation



IEEE Transactions on Evolutionary Computation (IF 14.3), Pub Date: 2020-12-01, DOI: 10.1109/tevc.2020.2990626
Caitlin A. Owen, Grant Dick, Peter A. Whigham

An error function can be used to select between candidate models but it does not provide a thorough understanding of the behavior of a model. A greater understanding of an algorithm can be obtained by performing a bias-variance decomposition. Splitting the error into bias and variance is effective for understanding a deterministic algorithm such as $k$-nearest neighbor, which provides the same predictions when performed multiple times using the same data. However, simply splitting the error into bias and variance is not sufficient for nondeterministic algorithms, such as genetic programming (GP), which potentially produces a different model each time it is run, even when using the same data. This article presents an extended bias-variance decomposition that decomposes error into bias, external variance (error attributable to limited sampling of the problem), and internal variance (error due to random actions performed in the algorithm itself). This decomposition is applied to GP to expose the three components of error, providing a unique insight into the role of maximum tree depth, number of generations, size/complexity of function set, and data standardization in influencing predictive performance. The proposed tool can be used to inform targeted improvements for reducing specific components of model error.
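The abstract does not state the formulas behind the three components. As a rough illustration only, the sketch below shows one way such a split could be estimated at a single test point, assuming squared error, a noise-free target value, and the same number of runs per training sample. It relies on the law of total variance: the variance of the per-sample mean predictions is treated as external variance, and the average within-sample variance across runs as internal variance. The function name `decompose_error` and the `preds[d][r]` layout are hypothetical and are not taken from the paper.

```python
from statistics import fmean, pvariance


def decompose_error(preds, target):
    """Split squared error at one test point into three parts.

    preds[d][r] is the prediction made by the model trained on training
    sample d during run r (for example, run r uses random seed r).
    target is the noise-free target value at the test point.
    Assumption: every training sample has the same number of runs.
    """
    # Mean prediction for each training sample, averaged over its runs.
    per_sample_means = [fmean(runs) for runs in preds]
    # Overall mean prediction across all samples and runs.
    grand_mean = fmean(per_sample_means)

    # Squared bias: how far the average prediction is from the target.
    bias_sq = (grand_mean - target) ** 2
    # External variance: spread caused by which training sample was drawn.
    external_var = pvariance(per_sample_means)
    # Internal variance: spread across runs on the same training sample.
    internal_var = fmean([pvariance(runs) for runs in preds])
    return bias_sq, external_var, internal_var


# Two training samples, two runs each, target value 3.0.
bias_sq, external_var, internal_var = decompose_error(
    [[1.0, 3.0], [5.0, 7.0]], 3.0
)
print(bias_sq, external_var, internal_var)
```

Under these assumptions the three terms sum to the mean squared error over all samples and runs. For a deterministic learner such as $k$-nearest neighbor, every run on the same training sample returns the same prediction, so the internal term is zero and the split reduces to the usual bias and variance.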


Updated: 2020-12-01
