Abstract

Forecasting models are becoming increasingly important for revealing nonlinear relationships within large volumes of raw data and chaotic components. This study presents an approach based on a purpose-built artificial neural network (ANN) as an alternative to conventional statistical procedures for improving rainfall estimation performance. A case study is presented for the neighboring provinces of Düzce and Bolu, located on the southern coast of the Black Sea in Turkey. The primary aim is to create an ANN model, specific to the study area, that produces satisfactory results even with limited data. The proposed technique is used both to estimate missing rainfall values and to forecast future precipitation. Daily average rainfall by month from Bolu and a limited number of rainfall records from Düzce were used. The research also addresses ANN computational concepts and develops a neural network for rainfall time series forecasting, with emphasis on a feed-forward backpropagation network. For the missing-data part of the research, a two-layer feed-forward ANN was trained with the Levenberg–Marquardt algorithm (LMA), and the unavailable rainfall values for Düzce were determined for the years 1995 to 2009. For 2010 to 2020, a two-layer feed-forward ANN was trained with the gradient descent algorithm to forecast daily average rainfall by month. The findings offer guidance to researchers interested in applying the ANN forecast model to extended periods of missing rainfall data.

1. Introduction

Data-driven approaches are extensively employed in many fields, including meteorological studies and environmental engineering, and the artificial neural network (ANN) is a particularly favored modeling method [1, 2]. Using mathematical models to obtain findings from limited data advances scientific analysis [3]. Artificial neural networks offer many benefits, such as removing the difficulty of modeling nonlinear systems mathematically [4, 5]. The architecture that yields the best predictions is found by adjusting the number of hidden neurons and the activation functions [6, 7]. According to Rao et al. [8], trial and error is the best method for choosing the layer and neuron configuration of an artificial neural network: the hidden layer and the number of neurons in it are varied, and the network’s accuracy is evaluated for each configuration.

Advances in artificial neural network (ANN) methodology have made accurate forecasts possible in many scientific disciplines [9, 10]. The methodology has numerous applications, including engineering, environment, meteorology, space and aviation, communication systems, health, informatics, and research and development. Forecasting models are built using linear and nonlinear statistical techniques such as regression analysis and ANN [11]. Prediction can be simplified even when there are several variables and a vast volume of input data. Many environmental studies make use of the ANN methodology. Chattopadhyay et al. [12] used measured independent parameters, namely nitrogen oxides (NOx), temperature, sulfur dioxide (SO2), and particulate matter (PM10), to predict surface ozone (O3) concentrations in Gangetic West Bengal. Al Omar et al. [13] used a multilayer perceptron (MLP) based predictive model to forecast surface ozone (O3) concentrations several hours ahead in Ontario, Canada.

If the observed input and target data are not correlated, the ANN analysis may fail to produce the desired results. Regression analysis has been used in a variety of other studies and disciplines. The primary goal of a regression model is to find the trend line that best represents the data set. Park et al. [14] implemented regression analysis and ANN models to forecast the ambient air concentration of particulate matter smaller than 10 µm (PM10). Gualtieri et al. [15] also created regression and ANN models to predict fine particulate matter concentration and traffic-related nitrogen oxides (NOx). Compared with regression analysis, ANN analysis produced more successful outcomes in these studies [14, 15].

Many studies have been conducted to predict missing and future hydrological data values in environmental and civil engineering [16]. Pakdaman et al. [17] and Kashiwao et al. [18] stated that short-term rain forecasts are required for specific aspects of environmental water resources, such as evaluating the potential for flash flooding and managing urban runoff systems in real time. Hossain et al. [9] carried out research in Western Australia to forecast long-term seasonal rainfall. Rainfall, air temperature, specific humidity, relative vorticity, the vertical component of the wind, and moisture divergence flux were chosen as input data for the trained ANN to generate rainfall forecasts [9]. The only output was rainfall in Western Australia, and good results were obtained with the selected input data. Akıner and Akıner [19] simulated water quality in Lake Sapanca with an ANN technique to identify the main threats to surface water quality.

It is well known that the ANN methodology produces excellent results with a large number of data points, but it is unknown whether it yields acceptable, reliable results when projecting the past and the future from limited observed data. The main purpose of this study is to determine missing temporal data from the past and then make numerical predictions for the future using the learning ability of the ANN technique, even with a limited amount of rainfall data. The ANN model’s performance was assessed using measured rainfall data from a neighboring area, Bolu, which has a more complete temporal rainfall record, together with Düzce’s measured rainfall values. The results show that satisfactory results can be obtained if the ANN architecture is configured correctly. For this purpose, it was assumed that monthly observed data for Düzce are available only from January 2008 to November 2009, and the network was trained using data from the neighboring city of Bolu. Finally, the results of the ANN methodology were compared with the observed rainfall data accurately recorded for Düzce between 2009 and 2020.

2. Materials and Methods

2.1. Research Area

This research was carried out for the Black Sea cities of Düzce and Bolu in Turkey. Düzce is a neighboring city of Bolu. Düzce’s meteorological station is located at an elevation of 150 meters, at coordinates 40° 50′ N and 31° 8′ E. The meteorological station in Bolu is situated at an elevation of 740 meters, at coordinates 40° 44′ N and 31° 36′ E. The two weather stations are 41 kilometers apart. Figure 1 depicts the map of the research area, including the Düzce and Bolu meteorological stations.
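As a quick arithmetic check on the station separation quoted above, the great-circle distance can be computed from the two coordinate pairs. The sketch below assumes a spherical Earth with a mean radius of 6371 km and ignores the elevation difference between the stations; the helper names are illustrative and are not part of the study.

```python
import math


def dms_to_deg(degrees, minutes):
    """Convert degrees and arc-minutes to decimal degrees."""
    return degrees + minutes / 60.0


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees.

    Assumes a spherical Earth (radius_km is an assumed mean radius).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Station coordinates as quoted in the text (latitude N, longitude E).
duzce = (dms_to_deg(40, 50), dms_to_deg(31, 8))
bolu = (dms_to_deg(40, 44), dms_to_deg(31, 36))

distance_km = haversine_km(*duzce, *bolu)
print(f"Station separation: {distance_km:.1f} km")
```

Under these assumptions the computed separation comes out close to the 41 kilometers stated in the text.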

This analysis’s primary goal is to generate regional quantitative forecasts of daily average rainfall per month using the ANN approach to approximate missing rainfall values. Similarly, potential rainfall quantities for the same city can be predicted using the same technique.

2.2. Data Collection

The Turkish State Meteorological Service (TSMS) provided rainfall data for January 1995 to December 2020 [20]. Measurements were made daily, and the weighted average of the measured values for each month was used in the analysis as the daily average rainfall per month, in mm/day. However, the meteorological data contain missing values, especially for Düzce. Bolu, on the other hand, has far better measurement data and far fewer missing records than Düzce. The ANN model was developed using extensive monthly data from the Bolu meteorological station from January 1995 to December 2009 and limited monthly data from the Düzce meteorological station from January 2008 to November 2009. Unfortunately, no measurements for Düzce were taken or recorded before 2008. This study focuses on Düzce as the research area and assesses whether the ANN method is advantageous for supplying Düzce’s missing data.
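The aggregation from daily measurements to a daily average rainfall per month can be sketched as follows. The text does not specify how the average is weighted, so this minimal example assumes a plain mean over the days that have a measurement and treats missing days as `None`; the function name and the sample values are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def monthly_daily_average(daily_records):
    """Average daily rainfall (mm/day) per (year, month).

    daily_records: iterable of (date, rainfall_mm or None).
    Days with no measurement (None) are skipped (an assumption, since the
    weighting used in the study is not described).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, rain_mm in daily_records:
        if rain_mm is None:
            continue
        key = (day.year, day.month)
        totals[key] += rain_mm
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Made-up sample values, not measured data.
sample = [
    (date(2008, 1, 1), 2.0),
    (date(2008, 1, 2), None),   # missing measurement
    (date(2008, 1, 3), 4.0),
    (date(2008, 2, 1), 1.5),
]
monthly = monthly_daily_average(sample)
print(monthly)
```

A month with no valid measurement simply does not appear in the result, which mirrors the gaps that the ANN model is later asked to fill.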

2.3. Scenario-Based Approach

Typical applications train the network with large data sets and then make a forward projection. Is the artificial neural network still a reliable methodology when the data sets are limited? The primary goal of the study is to answer this question. Under the scheme devised, the network was trained using small data sets from the past, and the ANN was used to forecast the future. To this end, a scenario was developed in which it was assumed that the present is still 2009 and that Düzce requires 26 years of rainfall data, from 1995 to 2020.

The question is then whether this 26-year rainfall data set can still be obtained with minimal error. The goal is to achieve a high correlation with the created ANN model using only 23 months of rainfall data for Düzce. For this purpose, the rainfall data available from the neighboring city of Bolu between 1995 and 2009 were used in the training, validation, and test phases of the ANN methodology. Because Bolu is an older and far more urbanized city than Düzce, older meteorological data are available for it. The meteorological station in Bolu provided fifteen years of monthly data, from January 1995 to December 2009. For Düzce, however, the meteorological data available up to December 2009 cover only the 23 months between January 2008 and November 2009.

Indeed, the idea of using data from Bolu to create an extensive temporal meteorological data set for Düzce was motivated by the acceptable correlation obtained from the linear regression analysis of the two cities’ rainfall data. Nevertheless, rainfall values from Bolu cannot simply be used in place of Düzce rainfall data. Furthermore, the function resulting from the linear regression analysis, fitted to only 23 months of data, does not permit the creation of a 26-year rainfall data set without the use of some other independent variable. Düzce and Bolu’s rainfall data are not interchangeable, but the rainfall characteristics of these two cities, which are 41 kilometers apart, are similar. Thus, the ANN methodology is regarded as a technique that can generate the Düzce rainfall data set from January 1995 to December 2020.

Missing rainfall values for Düzce were calculated with the artificial neural network (ANN) model for the period 1995 to 2007 and for December 2009. Furthermore, the future rainfall values for Düzce from 2009 to 2020 were projected using the same method. The ANN model was built using the available daily average by month rainfall data from Düzce and the corresponding temporal data from neighboring Bolu. Before the network is trained and the ANN analysis is conducted, regression analysis should show a significant relationship between the input and target data. Hence, both data sets were statistically tested to see whether the correlation is sufficient to conduct the ANN analysis.
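The preliminary regression check described above can be sketched as an ordinary least-squares fit between the two stations’ overlapping records. This is a minimal illustration only: the 23 paired values are synthetic placeholders generated with an arbitrary slope and noise level, not TSMS measurements, and the function name is hypothetical.

```python
import math
import random


def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept, plus Pearson r and r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r, r * r


# Synthetic stand-ins for 23 months of paired rainfall (mm/day).
# The slope 1.2, offset 0.3, and noise level 0.5 are arbitrary assumptions.
random.seed(0)
bolu = [random.uniform(0.0, 4.0) for _ in range(23)]
duzce = [1.2 * b + 0.3 + random.gauss(0.0, 0.5) for b in bolu]

slope, intercept, r, r2 = linear_fit(bolu, duzce)
print(f"slope={slope:.2f} intercept={intercept:.2f} R={r:.2f} R^2={r2:.2f}")
```

If the coefficient of determination from such a check were low, the text’s own criterion suggests that training an ANN on the pair of series would be unlikely to give the desired results.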

2.4. Artificial Neural Network (ANN) Model Configuration

The network was set up once the data had been selected and prepared for the artificial neural network. A multilayer perceptron (MLP), a class of feed-forward neural network, can solve a variety of data-based engineering problems. An MLP is an ANN with input, hidden, and output layers and is frequently used in time series forecasting [21, 22]. The weights and biases of a feed-forward network must be initialized to small random values before training [23, 24]. Training on the training data set reduces the error of the network output. All of the training algorithms rely on backpropagation: the inputs are fed forward, the output error is propagated backward, and the weight and bias values are updated [25]. The Levenberg–Marquardt algorithm (LMA) combines the steepest descent and Gauss–Newton algorithms. LMA was implemented to train a two-layer feed-forward ANN. LMA is popular because it has a high success rate among approaches based on first-order derivatives, and it is widely used in artificial neural networks with backpropagation architecture [26].

The two-layer feed-forward neural network for the future rainfall forecasting phase was trained using a gradient descent algorithm. Standard backpropagation is a gradient descent algorithm, whereas the Levenberg–Marquardt algorithm (LMA) blends the speed of Newton’s method with the stability of gradient descent. At each iteration, the LM learning algorithm approximates the error surface by a parabola, and the update for that iteration is taken at the minimum of the parabola. Gradient descent can be implemented in two ways: incremental mode and batch mode. During ANN training, the weights and biases were adjusted to seek the global minimum of the network error.
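The difference between the two update rules can be illustrated on a toy problem. The sketch below is not the study’s MATLAB network: it fits a single tanh neuron with two parameters (w, b) to noise-free, made-up targets. The gradient descent step moves along the negative gradient J^T e, while the Levenberg–Marquardt step solves (J^T J + mu I) delta = -J^T e, with the damping factor mu raised after a rejected step and lowered after an accepted one. The learning rate, initial mu, and factor of 10 are arbitrary assumptions.

```python
import math


def residuals_and_jacobian(params, xs, ys):
    """Residuals e_i = tanh(w*x_i + b) - y_i and their derivatives w.r.t. (w, b)."""
    w, b = params
    res, jac = [], []
    for x, y in zip(xs, ys):
        out = math.tanh(w * x + b)
        slope = 1.0 - out * out            # derivative of tanh at this point
        res.append(out - y)
        jac.append((slope * x, slope))
    return res, jac


def sse(params, xs, ys):
    """Sum of squared errors for the current parameters."""
    res, _ = residuals_and_jacobian(params, xs, ys)
    return sum(e * e for e in res)


def gradient(res, jac):
    """J^T e, the gradient of 0.5 * sum(e^2)."""
    gw = sum(j[0] * e for j, e in zip(jac, res))
    gb = sum(j[1] * e for j, e in zip(jac, res))
    return gw, gb


def gradient_descent_step(params, xs, ys, lr=0.01):
    """One batch gradient descent update with an assumed learning rate."""
    res, jac = residuals_and_jacobian(params, xs, ys)
    gw, gb = gradient(res, jac)
    return (params[0] - lr * gw, params[1] - lr * gb)


def levenberg_marquardt_step(params, xs, ys, mu):
    """Solve the 2x2 system (J^T J + mu*I) delta = -J^T e and apply delta."""
    res, jac = residuals_and_jacobian(params, xs, ys)
    gw, gb = gradient(res, jac)
    a = sum(j[0] * j[0] for j in jac) + mu
    b = sum(j[0] * j[1] for j in jac)
    d = sum(j[1] * j[1] for j in jac) + mu
    det = a * d - b * b
    dw = -(d * gw - b * gb) / det
    db = -(a * gb - b * gw) / det
    return (params[0] + dw, params[1] + db)


# Made-up, noise-free targets generated from w = 0.8, b = -0.3.
xs = [-2.0 + 0.4 * i for i in range(11)]
ys = [math.tanh(0.8 * x - 0.3) for x in xs]
start = (0.0, 0.0)

gd = start
for _ in range(50):
    gd = gradient_descent_step(gd, xs, ys)

lm, mu = start, 0.01
for _ in range(50):
    trial = levenberg_marquardt_step(lm, xs, ys, mu)
    if sse(trial, xs, ys) < sse(lm, xs, ys):
        lm, mu = trial, mu / 10.0   # accept: behave more like Gauss-Newton
    else:
        mu *= 10.0                  # reject: behave more like gradient descent

print("start SSE:", sse(start, xs, ys))
print("gradient descent SSE:", sse(gd, xs, ys))
print("Levenberg-Marquardt SSE:", sse(lm, xs, ys))
```

Which algorithm performs better depends on the problem; this toy case only shows the mechanics of the two updates and makes no claim about their relative performance on the rainfall data.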

In this study, the ANN architecture was chosen as the one that performed best in reducing the error. The network’s optimal size was determined by adding and removing hidden layer neurons until the number of neurons satisfied the target training error tolerance. Previous researchers devised an equation for deciding the number of neurons in the hidden layer. For an input neuron number (n) and an output neuron number (m), the proposed number of neurons in the hidden layer varies between (2n + 1) and (2√n + m) [27, 28]. The best network architecture and the optimal neuron number were determined through a comprehensive trial and error stage. It is vital to choose an appropriate activation function so that the neurons in the neural network achieve the desired effect. The type of data used as input and the purpose of the neural network should be considered when selecting activation functions. When solving a nonlinear problem, nonlinear activation functions produce better results [29]. The nonlinear model was used to forecast future rainfall. The neurons’ nonlinear transfer functions are sigmoidal functions, which increase monotonically and are continuously differentiable.
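The rule of thumb for the hidden layer size can be turned into a list of candidate neuron counts for the trial and error stage. In this minimal sketch the function names are illustrative, and the example values n = 5 and m = 1 are hypothetical rather than the configuration reported in Table 1.

```python
import math


def hidden_neuron_bounds(n_inputs, n_outputs):
    """Return the two rule-of-thumb values, 2*sqrt(n) + m and 2*n + 1, as (low, high)."""
    a = 2.0 * math.sqrt(n_inputs) + n_outputs
    b = 2.0 * n_inputs + 1.0
    return min(a, b), max(a, b)


def candidate_hidden_sizes(n_inputs, n_outputs):
    """Integer neuron counts inside the rule-of-thumb range, to be tried one by one."""
    low, high = hidden_neuron_bounds(n_inputs, n_outputs)
    return list(range(math.ceil(low), math.floor(high) + 1))


# Hypothetical example: five inputs and one output.
print(hidden_neuron_bounds(5, 1))
print(candidate_hidden_sizes(5, 1))
```

Each candidate size would then be trained and compared against the target error tolerance, which is the trial and error procedure the text describes.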

For the missing records, on the other hand, the linear transfer function was used. The weights were adjusted iteratively on the training set to minimize the error between the network output and the observed values. The intrinsic nonlinearity of ANN explains dynamic meteorological phenomena better than linear methods [30]. Overfitting, in which an excessive number of parameters or weights lets the model fit the training data too closely, can coexist with satisfactory performance on the training data set [31]. The validation set, a second data set, monitors the learning stage, and a third, independent test set provides an unbiased estimate of the generalization error, which guards against overfitting [32, 33]. Increasing the number of hidden neurons lets the fitted function fluctuate more, allowing the model to cope with the data’s volatility. Rainfall patterns are frequently influenced by seasonal variation. The learning rate is vital in MLP network training because it controls the size of the weight changes at each iteration. According to Adamuthe and Vhatkar [34], a learning rate of 0.05 to 0.5 produces satisfactory results. On this basis, the structure shown in Table 1 was chosen for the neural network used in this study.

Figure 2 depicts a scatterplot with a trend line connecting the predicted and observed values of missing rainfall for the train, validation, and test data sets used during ANN training.

The MLP network’s input matrix for future rainfall forecasting consists of five vectors of twelve elements each, corresponding to five years and twelve months (see Table 2). For example, R05 in the matrix represents the rainfall data from 2005; the subscript denotes the year. During the preliminary stage, observed rainfall data from 2009 (R09) were set aside as the target output for network training. At each subsequent step, the input matrix was shifted forward by one year, and the output vector produced at the previous step was placed in the next input matrix.
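The shifting input matrix described above amounts to a recursive, sliding-window forecast. The sketch below assumes that each forecast year is predicted from the five preceding years and that every forecast is fed back as an input for later years. The predictor used here is a deliberately trivial placeholder (a month-wise mean), not the trained ANN, and all names and values are hypothetical.

```python
def rolling_forecast(history, predict_next_year, first_year, last_year, window=5):
    """Forecast year by year, feeding each forecast back into the input window.

    history: dict mapping year -> list of 12 monthly values (mm/day).
    predict_next_year: callable taking a list of `window` yearly vectors and
        returning one 12-element vector (stands in for a trained network).
    """
    series = dict(history)
    forecasts = {}
    for year in range(first_year, last_year + 1):
        inputs = [series[y] for y in range(year - window, year)]
        prediction = predict_next_year(inputs)
        forecasts[year] = prediction
        series[year] = prediction       # the forecast becomes a later input
    return forecasts


def mean_of_window(inputs):
    """Placeholder predictor: month-wise mean of the input years (NOT the paper's ANN)."""
    return [sum(year[m] for year in inputs) / len(inputs) for m in range(12)]


# Made-up history for 2005 to 2009: constant values 1.0, 2.0, ..., 5.0 mm/day.
history = {year: [float(year - 2004)] * 12 for year in range(2005, 2010)}
forecasts = rolling_forecast(history, mean_of_window, 2010, 2020)
print(sorted(forecasts))
print(forecasts[2010])
```

Because forecasts re-enter the window, errors can accumulate over the horizon, which is one reason to evaluate such a scheme against observations for every forecast year.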

Rather than using a single network covering all of the data from 2010 to 2020, smaller time series samples were used to obtain more accurate neural networks [35]. Eleven networks were used, corresponding to the eleven forecast years, each working with small samples of 12 monthly elements. In this forecasting phase, the gradient descent algorithm performed far better than the Levenberg–Marquardt algorithm (LMA). Both linear and nonlinear transfer functions were used for ANN training in incremental mode (see Table 3). Because its outputs lie between −1 and 1, the hyperbolic tangent function speeds up weight learning in the MLP network more than the logistic function does [36].
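The output ranges of the two sigmoidal transfer functions mentioned above can be compared directly. The short sketch below evaluates both at a few arbitrary sample points and uses the standard identity tanh(x) = 2·logistic(2x) − 1 to show that the hyperbolic tangent is a rescaled, zero-centered logistic function; the helper names are illustrative.

```python
import math


def logistic(x):
    """Logistic sigmoid, with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_from_logistic(x):
    """Hyperbolic tangent written as a rescaled logistic, with outputs in (-1, 1)."""
    return 2.0 * logistic(2.0 * x) - 1.0


samples = [-3.0, -1.0, 0.0, 1.0, 3.0]   # arbitrary sample points
for x in samples:
    print(f"x={x:+.1f}  logistic={logistic(x):.4f}  tanh={math.tanh(x):+.4f}")
```

The zero-centered output of the hyperbolic tangent is commonly cited as the reason it can speed up weight learning relative to the logistic function.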

3. Results and Discussion

Figure 3 depicts a reliable correlation (R² = 0.72) between the limited 23-month data sets of the two neighboring cities, covering January 2008 to November 2009. Real-time rainfall forecasting was done using the established ANN model.

Since the linear fit value of 0.72 (R² in Figure 3) is reasonably close to 1, the ANN model produces reliable predictions with a high correlation even from such limited data. According to several studies in the literature, the method remains applicable, compared with conventional statistical approaches, even in more extreme cases where the rainfall amount is much higher and varies more from month to month [37, 38]. In the examined area of Düzce, Turkey, the rainfall amount is moderate and varies reasonably from month to month, which may favor the success of the ANN analysis. The main objective of this study, however, is to create an artificial neural network (ANN) model that is specific to the research area and gives successful results with limited data, so that it can be used for rainfall forecasting and for predicting future rainfall conditions. Compared with traditional statistical techniques, the ANN methodology also provides more reliable numerical results for Düzce, where the precipitation rate is moderate and rainfall varies little from month to month. Under these conditions, the main objective is expected to be achievable, but it remains vital to determine an appropriate network architecture and training algorithm for the ANN model, as in previous successful works [39].

MATLAB R2018b Deep Learning Toolbox [40] was used for the neural network analysis, and graphical results are presented for physical interpretation and discussion. Two distinct ANN configurations were used in this study: one for predicting missing records and the other for forecasting future values. In some cases, the network’s inputs are overburdened, and the training process takes a long time. By removing data that do not contribute to network training, principal component analysis (PCA) can reduce the input data [41, 42]. As a result, correlation among the input data is avoided.

However, results did not change with or without PCA, and PCA was not required during this study. Without using PCA, the best possible outcome could be obtained. The ANN analysis reveals a linear relationship between rainfall amounts in Düzce and Bolu. When estimated Düzce rainfall records and observed Bolu rainfall records are scatter plotted, the fit to generated data is a linear polynomial, and there is a high correlation between them, as shown in Figure 4.

In the first part of this study, the missing monthly rainfall amounts for Düzce from January 1995 to December 2009 were predicted using the ANN model, and the results are shown in Figure 5. The stars represent the model’s forecasts, and the circles represent Düzce’s daily average rainfall records by month. Both are expressed in millimeters per day. During the ANN study, linear and nonlinear transfer functions were evaluated together, and the linear transfer function was favored on the basis of the mean square error (MSE). As a result, the linear function was deemed preferable for estimating missing rainfall data using ANN. In the second phase of this research, the ANN model was used to predict future rainfall records for Düzce from 2010 to 2020.

Predicted values from the first phase of this research were used for this purpose. Figure 6 depicts Düzce’s observed and ANN-predicted daily average by month rainfall records between December 2009 and December 2020. Figure 7 illustrates the correlation between the ANN model outputs and the observed Düzce rainfall records. The coefficient of determination (R²) and the correlation coefficient (R) were calculated to be 0.62 and 0.79, respectively.
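The two reported statistics can be cross-checked against each other. Assuming the coefficient of determination is the square of the Pearson correlation coefficient, as it is for a simple linear fit between predicted and observed values:

```latex
R^2 = (0.79)^2 \approx 0.62
```

This agrees with the reported pair of values.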

The model’s performance in estimating missing values can be used to gauge the study’s success. The correlation coefficients for the training, validation, and test results are 0.87, 0.92, and 0.93, respectively. In addition, the mean square error was calculated to be 0.053 mm² d⁻². How satisfactory the applied model is can be judged by comparing these results with other studies [43, 44]. Results from similar papers [45, 46] were examined, and it is clear that the model used in this study performs reliably given the minimal data.
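For reference, the mean square error quoted above is presumably the usual average of squared differences. In the notation below, which is introduced here only for illustration, P_i is the ANN output and O_i is the observed daily average rainfall for month i, over N months:

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( P_i - O_i \right)^2
```

Because P_i and O_i are both in mm/day, the MSE carries units of mm² d⁻², consistent with the reported value.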

4. Conclusions

ANN has been used to simulate dynamic hydrological processes as an essential alternative method and is commonly used for forecasting. This research’s aim has two components. The first step is to create an ANN model specifically for predicting missing rainfall records in Düzce using rainfall data from the neighboring city of Bolu. Before beginning ANN analysis, a relationship between meteorological events in both cities should be discovered. Statistical approaches such as regression analysis are the most straightforward and widely used method for this purpose. The ANN model may not produce the desired results if there is no relationship between the dependent and independent variables. A strong correlation between rainfall records from Düzce and Bolu was discovered using regression analysis. The best network architecture for missing value estimation was formed after a lengthy trial and error stage.

The second phase of this research, forecasting rainfall data from 2010 to 2020, could not begin until the first phase was complete, since the data produced in the first part serve as input for the second part. The input parameters were five consecutive years of observed and forecasted daily average by month (mm/day) rainfall data, and the expected output was the year following those five years. Projections were made using both linear and nonlinear transfer function models. In the first phase, the linear model yielded better results.

Because combining linear and nonlinear transfer functions helps the training avoid local minima, using the two together prevents the projected data from becoming trapped at local extreme values. According to the findings, Düzce’s monthly average rainfall ranges from 0 to 4.5 mm/day. This study demonstrates that ANN is an effective method for estimating long-term rainfall data even with few measurements.

The ANN model’s successful application for rainfall forecasting indicates that the methodology used in this study can also be used for future studies on extreme rainfall events and flood analysis predictions. People can be better prepared for potential meteorological extreme events in this manner. This paper’s findings suggest that it can be a valuable guide for evaluating rainfall prediction’s efficacy and reliability using the appropriately developed ANN models concerning network architecture and the implemented case-specific algorithms.

Data Availability

The data are available from the Turkish State Meteorological Service (TSMS), Meteorological Data Information Sales and Presentation System (MEVBIS), Meteorological Data for Düzce and Bolu, January 1995 to December 2020, Ankara, Turkey, 2021 (online) (available at https://mevbis.mgm.gov.tr/mevbis/ui/index.html#/Workspace).

Conflicts of Interest

The author declares that he has no conflicts of interest.