Efficient Approximate Dynamic Programming Based on Design and Analysis of Computer Experiments for Infinite-Horizon Optimization, Computers & Operations Research


Efficient Approximate Dynamic Programming Based on Design and Analysis of Computer Experiments for Infinite-Horizon Optimization
Computers & Operations Research (IF 4.1), Pub Date: 2020-12-01, DOI: 10.1016/j.cor.2020.105032
Ying Chen, Feng Liu, Jay M. Rosenberger, Victoria C.P. Chen, Asama Kulvanitchaiyanunt, Yuan Zhou

Abstract: The approximate dynamic programming (ADP) method based on the design and analysis of computer experiments (DACE) has been shown in the literature to be effective for solving multistage decision-making problems. However, it remains inefficient for infinite-horizon optimization because it requires a large volume of samples from the state space and a way to identify a high-quality value function. We therefore propose a sequential sampling algorithm and embed it in a DACE-based ADP method to obtain a high-quality value function approximation. To address the limitations of the traditional stopping criterion (the Bellman error bound), we further propose a 45-degree line stopping criterion that terminates value iteration early by identifying an optimally equivalent value function. A comparison of the computational results with those of three other existing policies indicates that the proposed sampling algorithm and stopping criterion can determine a high-quality ADP policy. Finally, we discuss the extrapolation issue of the value function approximated by multivariate adaptive regression splines; the results further demonstrate the quality of the ADP policy generated in this study.
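
For readers unfamiliar with the traditional stopping criterion the abstract refers to, the following is a minimal sketch of exact value iteration on a tiny finite Markov decision process, terminated by a sup-norm Bellman error threshold. It is not the authors' DACE-based method: there is no state-space sampling, no regression-spline approximation, and the proposed 45-degree line criterion is not reproduced because the abstract does not specify it. The function name value_iteration, the inputs P and R, the discount factor gamma, and the tolerance eps are all hypothetical names chosen for illustration.

```python
def value_iteration(P, R, gamma=0.9, eps=1e-6, max_iter=10_000):
    """Value iteration for a small finite MDP (reward maximization).

    P[a][s][t] is the probability of moving from state s to state t under
    action a; R[a][s] is the expected one-stage reward. Both are
    hypothetical inputs, not data from the paper.
    """
    n_actions, n_states = len(P), len(P[0])

    def q_values(V):
        # One Bellman backup: Q[a][s] = R[a][s] + gamma * sum_t P[a][s][t] * V[t]
        return [[R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
                 for s in range(n_states)]
                for a in range(n_actions)]

    V = [0.0] * n_states
    # Classical threshold: once successive iterates differ by less than this
    # in sup norm, the greedy policy is within eps of optimal.
    threshold = eps * (1 - gamma) / (2 * gamma)
    iterations = 0
    for iterations in range(1, max_iter + 1):
        Q = q_values(V)
        V_new = [max(Q[a][s] for a in range(n_actions)) for s in range(n_states)]
        bellman_error = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if bellman_error < threshold:
            break

    # Greedy policy with respect to the final value estimate.
    Q = q_values(V)
    policy = [max(range(n_actions), key=lambda a: Q[a][s]) for s in range(n_states)]
    return V, policy, iterations


# Toy problem (an assumption for illustration): action 0 stays in place,
# action 1 swaps states; only staying in state 1 earns a reward of 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 1.0],
     [0.0, 0.0]]
V, policy, n_iter = value_iteration(P, R)
```

In this toy problem the fixed point is V = (9, 10) with the policy "move from state 0, stay in state 1". The iteration count grows as the tolerance tightens or the discount factor approaches 1, which is the kind of cost an early-termination criterion is meant to reduce.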


Updated: 2020-12-01
