New input selection procedure for machine learning methods in estimating daily global solar radiation,Arabian Journal of Geosciences



New input selection procedure for machine learning methods in estimating daily global solar radiation
Arabian Journal of Geosciences (IF 1.827), Pub Date: 2020-06-04, DOI: 10.1007/s12517-020-05437-0
Seyed Mostafa Biazar, Vahid Rahmani, Mohammad Isazadeh, Ozgur Kisi, Yagob Dinpashoh

Selecting optimal model inputs is a challenging problem, particularly for non-linear and dynamic systems. In this study, a new input selection method, Procrustes analysis (PA), was implemented and compared with the gamma test (GT) for estimating daily global solar radiation (Rs). The inputs selected by PA and GT were used to build two types of non-linear model: artificial neural networks (ANNs) and support vector machines (SVMs). Goodness-of-fit of the models was evaluated by the coefficient of correlation (CC), the root-mean-square error (RMSE), and the Nash-Sutcliffe model efficiency coefficient (NS). The uncertainty of the model outputs was quantified using the 95PPU (p-factor) and the d-factor. The following 23 candidate input variables were used: maximum wind speed, mean wind speed, maximum temperature, minimum temperature, mean temperature, maximum sea surface pressure, minimum sea surface pressure, mean sea surface pressure, mean vapor pressure, total rainfall, maximum cloudiness, mean cloudiness, maximum humidity, minimum humidity, mean humidity, sunshine hours, evaporation, mean dew point temperature, mean wet point temperature, maximum air pressure, minimum air pressure, mean air pressure, and mean vapor saturation. GT identified the following as significant input variables in five or more of the eight studied stations: maximum and mean temperature; maximum wind speed; maximum, minimum, and mean sea surface pressure; maximum, minimum, and mean air pressure; mean vapor pressure; mean cloudiness; mean humidity; sunshine hours; mean dew point temperature; mean wet point temperature; and mean vapor saturation pressure. The PA method identified mean air pressure, mean cloudiness, and mean temperature as significant input variables for Rs modeling in more than four stations. The results indicated that although ANN-GT and SVM-GT showed better goodness-of-fit metrics, ANN-PA and SVM-PA had lower uncertainties in estimating Rs.
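The three goodness-of-fit measures named above can be illustrated with a short sketch. The abstract does not give formulas, so the block below assumes the standard textbook definitions of CC (Pearson correlation), RMSE, and NS; the function names are hypothetical and nothing here is taken from the authors' implementation.

```python
import math


def rmse(observed, simulated):
    """Root-mean-square error between observed and simulated values."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def correlation_coefficient(observed, simulated):
    """Pearson coefficient of correlation (CC)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov / math.sqrt(var_o * var_s)


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of squared model error
    to the squared deviation of the observations from their mean."""
    mean_o = sum(observed) / len(observed)
    sq_err = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sq_dev = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - sq_err / sq_dev


if __name__ == "__main__":
    # Made-up Rs values for illustration only; not data from the study.
    obs = [10.0, 12.0, 14.0, 16.0]
    sim = [11.0, 12.0, 13.0, 17.0]
    print("CC  :", round(correlation_coefficient(obs, sim), 4))
    print("RMSE:", round(rmse(obs, sim), 4))
    print("NS  :", round(nash_sutcliffe(obs, sim), 4))
```

Under these definitions a perfect model gives CC = 1, RMSE = 0, and NS = 1, while NS = 0 means the model does no better than predicting the observed mean.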
According to the obtained results, almost all models showed that the higher the bandwidth (95PPU, or p-factor), the greater the d-factor, and the lower the bandwidth, the lower the d-factor. SVM-PA had the lowest uncertainty among the four models. The lowest bandwidth also belonged to the SVM-PA model, for Kiashahr, with a p-factor of 0.8% and a d-factor of 0.06, although Aliabad-E-Katoul had the lowest d-factor, 0.017, with a p-factor of 1%. The highest d-factor belonged to the ANN-GT model for Bandar-E-Torkman, with a d-factor of 0.817 and a p-factor of 76%. One reason for the high uncertainty of this model might be the number of input variables selected by GT. Lower uncertainty is a major criterion for choosing the optimal model for a given problem, which suggests that the results of the SVM-PA model, with its lower uncertainty, are more reliable.
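The abstract does not define the p-factor and d-factor. A minimal sketch follows, assuming the definitions commonly used in 95PPU uncertainty analysis: the p-factor is the share of observations that fall inside the 95% prediction band, and the d-factor is the mean band width divided by the standard deviation of the observations. The function names, the use of the population standard deviation, and the sample values are all assumptions made for illustration.

```python
import statistics


def p_factor(observed, lower, upper):
    """Fraction of observations bracketed by the band [lower, upper]."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return inside / len(observed)


def d_factor(observed, lower, upper):
    """Mean band width relative to the spread of the observations.

    Assumption: population standard deviation (statistics.pstdev);
    a sample standard deviation would give a slightly smaller value.
    """
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(observed)
    return mean_width / statistics.pstdev(observed)


if __name__ == "__main__":
    # Made-up values for illustration only; not data from the study.
    obs = [10.0, 12.0, 14.0, 16.0]
    band_lower = [9.0, 12.5, 13.0, 15.0]   # e.g. 2.5th percentile of model runs
    band_upper = [11.0, 13.5, 15.0, 17.0]  # e.g. 97.5th percentile of model runs
    print("p-factor:", p_factor(obs, band_lower, band_upper))
    print("d-factor:", round(d_factor(obs, band_lower, band_upper), 4))
```

Here `p_factor` returns a fraction, so a value of 0.76 corresponds to the 76% reported in percentage form above.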


Updated: 2020-06-04
