Applied Soft Computing

Volume 99, February 2021, 106878
Optimal uncertainty-guided neural network training

https://doi.org/10.1016/j.asoc.2020.106878

Highlights

  • Proposed a smooth cost function for uncertainty quantification NN training.

  • NN training becomes faster, and the training converges more reliably.

  • The cost function is customizable.

  • Present philosophies of existing direct uncertainty quantification techniques.

  • Present optimal uncertainty quantification of wind power and electricity demand.

Abstract

Neural network (NN)-based direct uncertainty quantification (UQ) methods have achieved state-of-the-art performance since the first of them, the lower–upper-bound estimation (LUBE) method, was introduced. However, currently available cost functions for uncertainty-guided NN training do not always converge, and even converged NNs do not necessarily generate optimal prediction intervals (PIs). In recent years, researchers have proposed different quality criteria for PIs, which raises the question of their relative effectiveness. Most existing cost functions for uncertainty-guided NN training are not customizable, and the convergence of the NN training is uncertain. Therefore, in this paper, we propose a highly customizable smooth cost function for developing NNs that construct optimal PIs. The method computes the optimized average width of PIs, PI-failure distances, and the PI coverage probability (PICP) for the test dataset. We examine the performance of the proposed method on wind power generation, electricity demand, and temperature forecast datasets. Results show that the proposed method reduces variation in the quality of PIs, accelerates the training, and improves convergence probability from 99.2% to 99.8%.

Introduction

All natural quantities carry some uncertainty. The value of a measurement may vary slightly or greatly under the same circumstances. The variance of a quantity under a circumstance indicates the level of uncertainty for that circumstance. The level of uncertainty is heteroscedastic, and thus the predictability of the same quantity can differ across circumstances. Traditional point prediction systems predict the most probable value for the corresponding input combination. The actual value may differ from the prediction slightly or greatly depending on circumstances [1], [2]. For instance, electricity demand at off-peak and peak hours is highly predictable, but the demand during transition times may vary greatly from one day to another [3], [4]. Modeling error or the inherent randomness of the system may cause a difference between the prediction and the actual value [5], [6]. Including inputs such as the current temperature and calendar information may reduce the uncertainty of predictions. However, some portion of the uncertainty can be random and unpredictable from existing features. The uncertainty can also be asymmetrically heteroscedastic. Therefore, a point prediction with a stated error probability cannot provide adequate information to the user about the level of uncertainty [7], [8].

Probabilistic forecasts, such as an uncertainty bound, are also popular in decision-making [9], [10]. However, a single uncertainty bound cannot represent the level of uncertainty; multiple uncertainty bounds can be applied to quantify it. The prediction interval (PI) is a recognized UQ method that applies an upper and a lower bound to quantify the level of uncertainty. PIs with a certain coverage probability are more appropriate for understanding an uncertain condition. Fig. 1 presents the uncertainty captured by PIs with 99% coverage probability. The probability density function and the width of the PI vary from sample to sample based on the corresponding uncertainty [11], [12]. From NN-generated PIs, decision-makers obtain the most probable regions of targets, even for an asymmetric and heteroscedastic system [13], [14].

To the end-user, an NN is a black box. With proper training, NNs can provide state-of-the-art performance in many areas. The designer of an NN chooses the NN size, the activation function, and the initial weights; the initial weights can also be random. Proper training provides an optimal selection of interconnect weights and biases [15]. The NN output is a weighted sum of inputs and functions of inputs. Reward-based NN training optimizes the weights [16], [17], [18], where the reward is calculated from the NN's performance through a cost function in each cycle. Therefore, the cost function must be designed considering the purpose, the quality criteria, critical situations, and the convergence of the optimization [19], [20], [21].

This paper proposes an optimal PI construction technique that considers different aspects of recently proposed NN-based direct PI construction techniques. Direct construction of PIs from an NN results in sharp PIs for any probability distribution, e.g., skewed-Gaussian, log-normal, or multimodal. We discuss several direct PI construction techniques from the literature. Because the relative performance of these techniques is unsettled, we comprehensively investigate their cost functions. An individual algorithm may perform best on a certain dataset at the time of its proposal, but the result may vary with different datasets. Therefore, we discuss the philosophy of developing cost functions and provide a novel one that combines the important ideas of recently proposed NN-based direct PI computation methods. The paper presents a rigorous performance analysis for wind power generation and electricity demand data. Later, we apply the method to a day-ahead temperature forecast. The method is also applicable to the UQ of several other datasets, such as electricity prices and other renewable generations. The improvements in convergence for electricity demand, hydro, and solar generations are also analyzed. Moreover, weather, geographical position, and various human-made events can be inputs of an NN. Although the effect of many events on the output cannot be expressed through mathematical equations, NNs can find these hidden relations with reward-based training.

Table 1 presents popular uncertainty quantification methods, their considerations, the nature of their cost functions, and the convergence probability of the simulation. A point prediction algorithm with an RMSE value indicates an overall error value; however, the error may change with the input combination. Even the delta method does not consider heteroscedastic uncertainty. Other traditional methods consider heteroscedasticity but do not consider a possibly asymmetric probability distribution. Traditional methods are trained without the help of a cost function; therefore, the convergence of their NN training is 100% or very close to 100%.

LUBE is the first cost-function-based direct NN training method for uncertainty quantification. It does not consider the failure distance, and its cost function is discontinuous and not customizable. Later, C. Wan proposed an improved method with a training convergence of 99.0%. The cost function of C. Wan's method is continuous and penalizes both high and low PICP values; however, it is neither smooth nor customizable. L. G. Marin's cost function is smooth and provides a convergence of 99.2%; however, it is not customizable and, instead of the failure distance, considers the deviation from the mid-interval. G. Zhang's method considers the failure distance, but its cost function is discontinuous [1]. Therefore, we propose a cost function that considers heteroscedasticity, asymmetry, and failure distance, and that is also smooth and customizable.
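To make these design criteria concrete, the following is a minimal, illustrative sketch of a smooth, customizable PI cost function, not the paper's exact formula. The target coverage `mu`, sharpness parameters `eta` and `k`, and the weights `w_width`/`w_fail` are assumed names for the customization knobs, and a sigmoid-based soft coverage stands in for PICP so that the whole cost stays smooth in the predicted bounds:

```python
import math

def sigmoid(x):
    # Clamp the argument to avoid overflow in exp for large |x|.
    return 1.0 / (1.0 + math.exp(-max(min(x, 50.0), -50.0)))

def smooth_pi_cost(y, lower, upper, mu=0.95, eta=50.0, k=20.0,
                   w_width=1.0, w_fail=1.0):
    """Illustrative smooth PI cost (an assumed sketch, not the
    paper's formula). Combines a soft, differentiable stand-in for
    PICP, the normalized average width (PINAW), the normalized
    average failure distance (PINAFD), and an exponential penalty
    when soft coverage falls below the target mu."""
    n = len(y)
    r = max(y) - min(y)  # target range, used for normalization
    # Soft per-sample coverage: ~1 well inside the interval, ~0 outside.
    soft_picp = sum(sigmoid(k * (t - l)) * sigmoid(k * (u - t))
                    for t, l, u in zip(y, lower, upper)) / n
    pinaw = sum(u - l for l, u in zip(lower, upper)) / (n * r)
    pinafd = sum(max(l - t, t - u, 0.0)
                 for t, l, u in zip(y, lower, upper)) / (n * r)
    penalty = math.exp(min(eta * (mu - soft_picp), 50.0))
    return w_width * pinaw + w_fail * pinafd + penalty
```

Because every term is smooth in the bounds, narrow intervals that miss targets are penalized continuously rather than through a step change, which is the property the discontinuous cost functions above lack.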

Key contributions of this paper are as follows:

  • 1.

    The paper discusses the philosophies of currently available direct uncertainty quantification techniques.

  • 2.

    This paper proposes a smooth cost function for uncertainty quantification NN training.

  • 3.

    The cost function is customizable.

  • 4.

Training converges more reliably, and the overall NN training time is reduced.

  • 5.

    This paper demonstrates the effectiveness of the proposed cost function on wind power, electricity demand, and city temperature datasets.

These contributions may help future researchers and engineers train more robust NNs and obtain superior performance.

The paper is organized with the following flow of information. Section 2 presents the basics of the uncertainty quantification and the advantage of the NN-based direct uncertainty quantification. Section 3 presents all NN-based direct PI construction methods. Section 4 presents the proposed PI construction method. Section 5 reports the simulation results and performance metrics. Section 6 is the concluding section.

Section snippets

Increased uncertainty in power grid

All real-world events consist of sub-events, among which many are random. However, their combined effect can be interval-predictable or even deterministic. When tossing a fair coin once, the probabilities of heads and tails are equal. However, the outcome of one hundred tosses is quite predictable: there is about a 96.5% chance of getting 40 to 60 heads and a 72.9% chance of getting 45 to 55 heads. Therefore, a 20% region of the output range contains about 96%–97%
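The coin-toss probabilities above follow directly from the binomial distribution and can be checked exactly with a short computation (`prob_heads_between` is a helper name introduced here for illustration):

```python
from math import comb

def prob_heads_between(n, lo, hi):
    """Exact probability that a fair coin tossed n times lands heads
    between lo and hi times, inclusive: sum of C(n, k) / 2^n."""
    return sum(comb(n, k) for k in range(lo, hi + 1)) / 2 ** n

p_40_60 = prob_heads_between(100, 40, 60)  # ≈ 0.965
p_45_55 = prob_heads_between(100, 45, 55)  # ≈ 0.729
```

The narrower 45–55 band already captures almost three quarters of the outcomes, which is the sense in which the aggregate of many random sub-events becomes interval-predictable.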

Direct NN-based PI construction methods

Different groups have proposed several direct NN-based UQ methods. This section discusses the relevant methods; each modifies or proposes a cost function to overcome a limitation.

Proposed method

In this paper, a smooth and customizable cost function is proposed for uncertainty-guided NN training. Uncertainty-guided NNs predict the upper bound and the lower bound; the bounds are computed without any assumption on the distribution.

Studying the motives of all the proposed algorithms, we conclude the following key criteria of a good cost function:

  • 1.

    PICP, PI normalized average width (PINAW), and PI normalized average failure distance (PINAFD) are important parameters of an ideal cost function. The consideration of the
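For reference, one common formulation of these three metrics (exact definitions vary between papers; here PINAFD is averaged over all samples, and both width and failure distance are normalized by the target range) can be sketched as:

```python
def pi_metrics(y, lower, upper):
    """Compute PICP, PINAW, and PINAFD for a set of PIs.

    PICP:   fraction of targets inside their interval.
    PINAW:  average interval width, normalized by the target range.
    PINAFD: average distance of targets outside their interval from
            the nearest bound, normalized by the target range
            (zero when every target is covered)."""
    n = len(y)
    r = max(y) - min(y)  # target range for normalization
    picp = sum(1 for t, l, u in zip(y, lower, upper) if l <= t <= u) / n
    pinaw = sum(u - l for l, u in zip(lower, upper)) / (n * r)
    pinafd = sum(max(l - t, t - u, 0.0)
                 for t, l, u in zip(y, lower, upper)) / (n * r)
    return picp, pinaw, pinafd
```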

Result evaluation

We download wind power generation and electricity demand samples from August 2012 to August 2019 from the UK-grid website.1 Four recent samples and the time of day (TimeDay = hour + minutes/60) on the corresponding day are provided as inputs to the NN. Fig. 5 presents the structure of the NN with input–output combinations. We apply the simulated annealing technique for NN training [52], [53]. The initial optimization temperature is set to 5, the cooling factor
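A minimal simulated-annealing loop of the kind used for such cost-function-guided weight optimization can be sketched as follows. Only the initial temperature of 5 comes from the text; the cooling factor, step count, and perturbation scale are assumed placeholders, since the source truncates before stating them:

```python
import math
import random

def anneal(weights, cost_fn, t0=5.0, cooling=0.95, steps=2000, seed=0):
    """Minimal simulated-annealing sketch for weight optimization.

    t0=5 follows the initial optimization temperature stated in the
    text; cooling=0.95 and steps=2000 are assumed values."""
    rng = random.Random(seed)
    w = list(weights)
    best_w, best_c = list(w), cost_fn(w)
    cur_c, t = best_c, t0
    for _ in range(steps):
        cand = [wi + rng.gauss(0.0, 0.1) for wi in w]  # perturb weights
        c = cost_fn(cand)
        # Always accept improvements; accept worse moves with
        # Boltzmann probability exp(-delta / t).
        if c < cur_c or rng.random() < math.exp((cur_c - c) / t):
            w, cur_c = cand, c
            if c < best_c:
                best_w, best_c = list(cand), c
        t *= cooling  # geometric cooling schedule
    return best_w, best_c
```

In the actual experiments, `cost_fn` would evaluate the PI cost function over the training data for a candidate set of NN weights; here it can be any scalar function of the weight vector.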

Conclusion

Uncertainty is inescapable, and uncertainty-aware decisions bring higher sustainability and profitability. The NN-based LUBE PI construction method has achieved state-of-the-art performance in terms of narrow width and required PICP. However, several issues remain. (i) NNs need to be re-trained for new types of signals, and the non-smooth LUBE cost function often fails to achieve an efficient uncertainty-guided NN. (ii) The NN training needs to consider critical samples instead of considering

CRediT authorship contribution statement

H. M. Dipu Kabir: Conceptualisation, Formalisation of the research, Critical review of the methodology before experiment, Design of experiment, Writing, Participation in experiment, Data collection and analysis. Abbas Khosravi: Internal review, Supervision. Abdollah Kavousi-Fard: Preparing manuscript, Reviewing answers, Supervision. Saeid Nahavandi: Internal review, Supervision. Dipti Srinivasan: Internal review, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research was funded by the Australian Research Council through Discovery Projects funding scheme (DP190102181).

References (58)

  • Magoulas, G.D., et al., Effective backpropagation training with variable stepsize, Neural Netw. (1997)

  • Marcelloni, F., et al., Enabling energy-efficient and lossy-aware data compression in wireless sensor networks by multi-objective evolutionary optimization, Inform. Sci. (2010)

  • Beltrami, M., et al., A grid-quadtree model selection method for support vector machines, Expert Syst. Appl. (2020)

  • Kabir, H.D., et al., Neural network-based uncertainty quantification: A survey of methodologies and applications, IEEE Access (2018)

  • Shrivastava, N.A., et al., Prediction interval estimation of electricity prices using PSO-tuned support vector machines, IEEE Trans. Ind. Inf. (2015)

  • Clements, M.P., Evaluating the Bank of England density forecasts of inflation, Econ. J. (2004)

  • Kendall, A., et al., What uncertainties do we need in Bayesian deep learning for computer vision?

  • Y. Gal, Uncertainty in deep learning, University of...

  • Roy, S., et al., A new design methodology of adaptive sliding mode control for a class of nonlinear systems with state dependent uncertainty bound

  • Laptev, N., et al., Time-series extreme event forecasting with neural networks at Uber

  • H.D. Kabir, A. Khosravi, S. Nahavandi, D. Srinivasan, Neural network training for uncertainty quantification over...

  • M.R.C. Qazani, H. Asadi, S. Khoo, S. Nahavandi, A linear time-varying model predictive control-based motion cueing...

  • Tai, Y., et al., A high-immersive medical training platform using direct intraoperative data, IEEE Access (2018)

  • S.M.J. Jalali, S. Ahmadian, A. Khosravi, S. Mirjalili, M.R. Mahmoudi, S. Nahavandi, Neuroevolution-based autonomous...

  • M.R.C. Qazani, H. Asadi, S. Mohamed, S. Nahavandi, Prepositioning of a land vehicle simulation-based motion platform...

  • Cruz, N., et al., Prediction intervals with LSTM networks trained by joint supervision

  • H.D. Kabir, A. Khosravi, S. Nahavandi, Partial adversarial training for neural network-based uncertainty...

  • Ai, S., et al., Household power demand prediction using evolutionary ensemble neural network pool with multiple network structures, Sensors (2019)

  • Lu, J., et al., Mixed-distribution-based robust stochastic configuration networks for prediction interval construction, IEEE Trans. Ind. Inf. (2019)