Optimization of Tree Ensembles
Operations Research (IF 2.7), Pub Date: 2020-09-01, DOI: 10.1287/opre.2019.1928
Velibor V. Mišić

Tree ensemble models such as random forests and boosted trees are among the most widely used and practically successful predictive models in applied machine learning and business analytics. Although such models have been used to make predictions based on exogenous, uncontrollable independent variables, they are increasingly being used to make predictions where the independent variables are controllable and are also decision variables. In this paper, we study the problem of tree ensemble optimization: given a tree ensemble that predicts some dependent variable using controllable independent variables, how should we set these variables so as to maximize the predicted value? We formulate the problem as a mixed-integer optimization problem. We theoretically examine the strength of our formulation, provide a hierarchy of approximate formulations with bounds on approximation quality, and exploit the structure of the problem to develop two large-scale solution methods, one based on Benders decomposition and one based on iteratively generating tree split constraints. We test our methodology on real data sets, including two case studies in drug design and customized pricing, and show that our methodology can efficiently solve large-scale instances to near or full optimality and outperforms solutions obtained by heuristic approaches. In our drug design case, we show how our approach can identify compounds that efficiently trade off predicted performance and novelty with respect to existing, known compounds. In our customized pricing case, we show how our approach can efficiently determine optimal store-level prices under a random forest model that delivers excellent predictive accuracy.
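
To make the problem statement concrete, here is a minimal sketch of tree ensemble optimization on a toy instance. It is not the paper's mixed-integer formulation or either of its decomposition methods. It relies only on the fact that an ensemble's prediction is piecewise constant, so checking one point per cell induced by the split thresholds is enough to find a maximizer. The tree encoding, the function names, and the two hand-built trees are all assumptions made up for illustration.

```python
from itertools import product

# Hypothetical encoding: a tree is either a leaf (a float prediction) or a
# tuple (feature_index, threshold, left_subtree, right_subtree). The left
# branch is taken when x[feature_index] <= threshold.

def predict_tree(tree, x):
    """Walk from the root to a leaf and return the leaf's prediction."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree

def predict_ensemble(trees, weights, x):
    """Weighted sum of the individual tree predictions."""
    return sum(w * predict_tree(t, x) for t, w in zip(trees, weights))

def collect_thresholds(tree, thresholds):
    """Add every split threshold in the tree to thresholds[feature]."""
    if isinstance(tree, tuple):
        feature, threshold, left, right = tree
        thresholds[feature].add(threshold)
        collect_thresholds(left, thresholds)
        collect_thresholds(right, thresholds)

def optimize_by_enumeration(trees, weights, n_features):
    """Exhaustively maximize the ensemble prediction (toy sizes only)."""
    thresholds = [set() for _ in range(n_features)]
    for tree in trees:
        collect_thresholds(tree, thresholds)
    candidates = []
    for feature_thresholds in thresholds:
        ts = sorted(feature_thresholds)
        # Each threshold represents the interval ending at it; one extra
        # point above the largest threshold represents the last interval.
        candidates.append(ts + [ts[-1] + 1.0] if ts else [0.0])
    best_x = max(product(*candidates),
                 key=lambda x: predict_ensemble(trees, weights, x))
    return best_x, predict_ensemble(trees, weights, best_x)

# Toy ensemble of two trees over two controllable variables (made-up values).
tree_a = (0, 2.0, 1.0, (1, 5.0, 4.0, 2.0))
tree_b = (1, 3.0, 0.5, 3.0)
best_x, best_value = optimize_by_enumeration([tree_a, tree_b], [0.5, 0.5], 2)
print(best_x, best_value)
```

The number of candidate points is the product of the per-variable interval counts, so this enumeration grows exponentially with the number of variables and is only usable on very small instances.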

Updated: 2020-09-01