Abstract

The main aim of this study is to introduce a 2-layered artificial neural network (ANN) for solving the Black–Scholes partial differential equation (PDE) of either fractional or ordinary order. First, a discretization method is employed to convert the model into a sequence of ordinary differential equations (ODEs). Subsequently, each of these ODEs is solved with the aid of an ANN. Adam optimization is employed as the learning paradigm since it adaptively slows the optimization process as it approaches the optimum solution. The model also takes advantage of fine-tuning to speed up the process and of domain mapping to confront the infinite-domain issue. Finally, the accuracy, speed, and convergence of the method for solving several types of the Black–Scholes model are reported.

1. Introduction

PDEs of both ordinary and fractional order play an important role in the pricing of financial derivatives. PDEs of ordinary order are the basis of various models proposed for pricing different types of options. At the same time, financial markets show fractal behavior [14], and fractional PDEs (FPDEs), which can better reflect this reality, have gained a lot of interest recently. Hence, finding an accurate and efficient approach for solving both types is a critical issue in pricing.

The most famous PDE in finance is the Black–Scholes (B-S) model, which is broadly adopted for option pricing. So far, studies have presented different approaches for finding the numerical solution of this model and its variations when an exact form does not exist [5–13]. By using tick-by-tick data, Cartea and del-Castillo-Negrete [14] found that the value of European-style options satisfies a FPDE containing a nonlocal operator in time-to-maturity known as the Caputo fractional derivative. Furthermore, in order to depict the fractal structure of the financial market, the classical B-S equation has been generalized by means of fractional derivatives, which have the property of self-similarity and better model long-range dependency [15–17]. Although the fractional B-S model is a powerful tool for explaining the hereditary and memory characteristics of the financial market, it is considerably difficult to obtain an accurate solution for it, due to the memory trait of fractional derivatives [18]. As a result, numerous researchers have tried techniques for approximating such problems. Among the analytical methods presented so far to solve the time-fractional B-S equation, the most cited articles employ integral transforms [15, 19–23], wavelet-based hybrid methods [24], the separation of variables [25], the homotopy analysis and homotopy perturbation methods [26–28], and the Fourier–Laplace transform [29]. However, due to the high computational complexity of these solutions, numerical methods are often better alternatives for solving such mathematical models.

Cartea and del Castillo-Negrete used a backward difference technique together with a shifted Grünwald–Letnikov scheme to solve the spatial-fractional FMLS process [14]. A convergence analysis and comparison of the solutions of three space-fractional B-S models were provided by Marom and Momoniat [30]. Finite difference methods were used to provide a solution for the time-fractional B-S model in 2013 by Song and Wang [31]. Zhang et al. also proposed a second-order finite difference method in the following year [32]. Moreover, in 2017, a weighted finite difference method was utilized to find the numerical solution of the model [33]. Chen et al. investigated predictor-corrector approaches for the solution of American options in 2015 [34]. Khan and Ansari [35] were the first to use the Sumudu transform to solve the fractional model of European options. In the same year, Zhang provided an unconditionally stable implicit numerical scheme for the model by changing the Riemann–Liouville derivatives to the Caputo derivative [32]. As methods based on radial basis functions (RBF) are widely used for approximating ordinary PDEs, a meshless method was proposed for finding the solution of the time-fractional model as well [18]. In 2020, the fractional model was numerically approximated by using quintic and sixth-order B-spline functions as the basis for a collocation method, providing high accuracy for the generalized B-S model [36, 37]. In order to achieve sixth-order accuracy in solving the generalized Black–Scholes model, Roul and Goura used both the Crank–Nicolson scheme and a sextic B-spline collocation method.

Regarding machine learning, and ANNs in particular, ANNs have traditionally been used for predicting option prices [38–43]; however, to the best of our knowledge, apart from a few research studies [44, 45], there is no available literature on solving the B-S differential equation by ANN. Even these few studies are limited to integer-order PDEs, and there is no study on the solution of the fractional-order B-S model. In this paper, we construct a 2-layered artificial neural network (ANN) to solve the Black–Scholes model of either time-fractional or ordinary order. The Adaptive Moment Estimation (Adam) optimizer, which has been specifically created for neural networks, acts as the learning paradigm and adaptively slows the updates when getting close to the optimal solution. To make the process faster and increase the efficiency of the method, fine-tuning is applied to the model. Also, to overcome the problem of the infinite problem domain, domain mapping is used to map the whole problem onto a finite interval. The rest of the paper is organized as follows: Section 2 is dedicated to problem formulation, Section 3 explains the methodology, Section 4 presents the numerical results, and finally, the conclusion is provided in Section 5.

2. Problem Formulation

The underlying asset S of a put option is said to follow a Geometric Brownian Motion (GBM), where σ, r, and W_t are the volatility, interest rate, and Brownian motion, respectively, if it obeys the following stochastic differential equation:

dS_t = r S_t dt + σ S_t dW_t. (1)

Using equation (1) and the risk-neutral valuation formula together with the classic Feynman–Kac formula, the Black–Scholes operator is formed as below:

∂V/∂t + (1/2)σ²S² ∂²V/∂S² + rS ∂V/∂S − rV = 0, (2)

where V = V(S, t) is the unknown function which determines the option price (Robbins and Monro [46]). It has been specified that this option will have a certain payoff at a certain date in the future, depending on the value(s) taken by the stock up to that date.

It is well known that a time-fractional Black–Scholes equation with a derivative of real order α can be obtained to describe the price of an option under several circumstances, such as when the change in the underlying asset is assumed to follow a fractal transmission system. Fractional derivatives, as they are called, were introduced into option pricing in a bid to take advantage of their memory properties to capture both major jumps over small periods of time and long-range dependencies in markets. Therefore, the fractional Black–Scholes model can be formulated as equation (3), where ∂^α/∂t^α denotes the fractional time derivative, α is a real number, and the remaining coefficients are functions of σ, r, and S, written using these notations for simplicity. It can be shown that the time derivative appearing in equation (3) equals the α-order Caputo fractional derivative (Chen et al. [47]).

Based on the type of the option, the corresponding condition set is as follows:

In some cases, e.g., European options, the underlying price S moves toward infinity; thus, the problem domain is semi-infinite. Some possible strategies to overcome this obstacle when pricing are described in Section 3.4. The corresponding parameters and their definitions are listed in Table 1.

3. Methodology

In this section, four main concepts employed in the present approach are explained. Then, the necessary steps to be taken for finding the solution are combined to form the proposed method.

3.1. Time Discretization

Solving multivariable equations increases the time complexity and the risk of producing inconsistent answers in computational software. In this respect, time discretization methods are useful tools for converting such models into a series of ordinary differential equations (ODEs). This approach results from applying finite difference methods along one dimension of an equation to approximately calculate the derivatives with respect to that dimension.

Since fractional equations are also investigated here, both ordinary and fractional time discretization approaches are discussed. Suppose that the PDE to be solved is defined on the time interval [0, T]. Based on this method, the approximation at each time step is defined as V_n(S) ≈ V(S, t_n), where V is the answer, t_n = nΔt with Δt = T/N, and N is the number of time steps.

3.1.1. Ordinary Time Discretization

Consider the PDE to be solved in the general form ∂V/∂t = F(V), where V is the answer and F is a linear or nonlinear (spatial) operator. Instead of solving this problem in two dimensions, it can be converted into a series of dependent ODEs.

By using the defined time grid, the following θ-weighted scheme is constructed:

(V_{n+1} − V_n)/Δt = θ F(V_{n+1}) + (1 − θ) F(V_n), θ ∈ [0, 1].

The method is implicit if θ = 1; in this case, only the posterior time step is used. The method is called explicit if θ = 0, where only the current time step is utilized to approximate the solution. If θ = 1/2, the method is the common Crank–Nicolson method, which is unconditionally stable and of second order in time, and it uses both the posterior and current time steps for approximating the solution of the model. Due to the nonsmoothness of the payoff function and of the activation functions in our ANN, the Crank–Nicolson scheme cannot reach its second-order convergence; it can also cause extra inconsistencies for the same reason. Hence, from this point on, θ = 1 is considered.
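To make this time stepping concrete, the following sketch (written in time-to-maturity form; the names bs_residual, V_new, and V_old are ours, not the paper's) shows the backward Euler residual, with θ = 1, that the network of Section 3.2 is trained to drive to zero at each training point, using automatic differentiation for the spatial derivatives:

from autograd import grad

def bs_residual(V_new, V_old, S, dt, r=0.05, sigma=0.2):
    # Backward Euler residual of dV/dtau = 0.5*sigma^2*S^2*V_SS + r*S*V_S - r*V,
    # where V_new is the network approximation at the current time step and
    # V_old is the already-trained approximation from the previous step.
    dV = grad(V_new)       # first derivative with respect to S
    d2V = grad(dV)         # second derivative with respect to S
    time_part = (V_new(S) - V_old(S)) / dt
    space_part = 0.5 * sigma**2 * S**2 * d2V(S) + r * S * dV(S) - r * V_new(S)
    return time_part - space_part   # should vanish at the training point S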

3.1.2. Fractional Time Discretization

Suppose that the time-fractional PDE (FPDE) to be solved takes the form D_t^α V = F(V), where D_t^α denotes the Caputo derivative of the unknown function and 0 < α < 1. As the first step for such FPDEs, the Caputo derivative should be discretized (readers are advised to see Hadian Rasanan et al. [48] for the preliminaries and thorough information about this type of derivative). Consider the following theorem.

Theorem 1. Suppose that the time interval [0, T] is divided into N parts with step size Δt = T/N and grid points t_n = nΔt, where n = 0, 1, …, N. Then, the following holds on this interval:

D_t^α u(t_{n+1}) ≈ (Δt^{−α}/Γ(2 − α)) Σ_{j=0}^{n} b_j [u(t_{n+1−j}) − u(t_{n−j})],

where b_j = (j + 1)^{1−α} − j^{1−α}.

Proof. See the proof in the work of Sun and Wu [49].
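As an illustration of Theorem 1, the L1 weights can be evaluated as in the following sketch (variable names are ours; the indexing follows the standard L1 scheme attributed above to Sun and Wu [49]):

import math

def l1_caputo(u_vals, dt, alpha):
    # L1 approximation of the Caputo derivative of order 0 < alpha < 1 at the
    # last grid point, given the history u_vals = [u(t_0), ..., u(t_n)].
    n = len(u_vals) - 1
    coeff = dt ** (-alpha) / math.gamma(2.0 - alpha)
    total = 0.0
    for j in range(n):
        b_j = (j + 1) ** (1 - alpha) - j ** (1 - alpha)   # L1 weights b_j
        total += b_j * (u_vals[n - j] - u_vals[n - j - 1])
    return coeff * total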

According to the above theorem, equation (3) can be discretized in the following form:

Now, the unknown function V should be approximated at each time step such that it satisfies equation (3) together with its initial and boundary conditions in equation (4). In this regard, the initial condition can be satisfied by using it directly in the computations; the boundary conditions, on the other hand, are satisfied through the sum of least square errors method used in Section 4.

The remaining step is satisfying equation (3); for this purpose, the cost function is chosen as the sum of the squared residuals of the discretized equation, where S_i denotes the ith training data point. To find the optimum weights for the network, this cost function should be minimized subject to the boundary conditions, so the following nonlinear least squares problem is obtained:

It is noteworthy that when α = 1, the ordinary time discretization method is used.

It should be noted that the method is based on the discretization of Caputo-type derivatives; results based on other types of fractional derivatives are only possible if a similar discretization exists for them.
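A sketch of how the cost of a single time step could be assembled, combining the squared residuals of the discretized equation at the training points with a squared-error penalty for the boundary values (the names net and residual, and the equal weighting of the two terms, are assumptions, not the paper's exact formulation):

def step_cost(weights, net, residual, S_train, S_boundary, V_boundary):
    # Sum of squared PDE residuals at the interior training points ...
    interior = sum(residual(weights, s) ** 2 for s in S_train)
    # ... plus a penalty enforcing the boundary values at the boundary points.
    boundary = sum((net(weights, s) - v) ** 2
                   for s, v in zip(S_boundary, V_boundary))
    return interior + boundary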

3.2. Function Approximation

According to the universal approximation theorem, every continuous function can be approximated by a feedforward neural network [50]. This theorem states that any linear function can be approximated by an ANN without any hidden layer, but for more general nonlinear functions, the approximation is well established if the ANN has at least one hidden layer. To calculate the price of an option based on equations (2) and (3), the value of the option at each time step is approximated using a 2-layered network of the form

V(S) ≈ ψ(Σ_{i=1}^{m} w_i σ_i(a_i S + b_i) + c),

where m is the number of neurons in the hidden layer, a_i, b_i, w_i, and c are the network weights and biases, σ_i are the activation functions in the hidden layer, and ψ is the activation function for the output layer. The above formula can be seen in Figure 1.

The nodes connected by the edges b_i and c in this network are called the biases; their inputs are fixed at 1, and their weights are additional parameters that serve as a means of adjusting the output of the next layer. In other words, they help the network fit the given data as well as possible.
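In code, a minimal sketch of this 2-layered approximator could look as follows (the parameter packing and the identity output activation are assumptions consistent with the description above):

import autograd.numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def net(params, S):
    # params packs the hidden weights a, hidden biases b, output weights w,
    # and the output bias c; the output activation is taken to be the identity.
    a, b, w, c = params
    hidden = sigmoid(a * S + b)        # hidden-layer activations, one per neuron
    return np.dot(w, hidden) + c       # scalar approximation of the option price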

The most famous activation function in deep learning and neural networks is the sigmoid. The two main reasons for the broad application of the sigmoid function are the straightforward calculation of its derivative and its bounded output. The sigmoid function is defined as follows, and its values lie in (0, 1):

σ(x) = 1/(1 + e^{−x}).

These features make it a perfect candidate for problems that produce probabilities and for the Black–Scholes model: the values of options are nonnegative, and the effect of other parameters does not blow up the calculations, as they are multiplied by the small values produced by the sigmoid. Moreover, the derivative of this well-known activator, i.e., its slope between any two arbitrary points, is easily calculable:

σ′(x) = σ(x)(1 − σ(x)).
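For completeness, a small sketch of this slope calculation, matching the expression above:

import math

def sigmoid_slope(x):
    # Slope of the sigmoid at x: sigma'(x) = sigma(x) * (1 - sigma(x)).
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)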

Although linear functions such as the identity are not favorable for hidden layers, as they take away the network's ability to generalize and adapt, it is possible to use them as the activator of the output layer, since the hidden layers are present and interact directly with the inputs.

3.3. Fine-Tuning

One of the key factors in the present study is the possibility of applying fine-tuning during the training process. Fine-tuning means employing a previously trained neural network to find the solution of a new, similar task. This process is normally applied to datasets of images and audio. However, following the same approach, it is possible to increase the accuracy and speed of the network in this work.

Building and validating an ANN from scratch can be a huge task in its own right, depending on what data are trained on it, and many parameters, such as the number of layers, the number of nodes in the hidden layers, the proper activation functions, and the learning rate, must be found through trial and error. If a trained model that already does one task well exists, and that task is similar to ours in at least some remote way, then everything the model has already learned can be taken advantage of and applied to the new specific task. If the tasks are essentially identical, as is the case when solving the problem at different time steps, the exact weights can be used as the initial values. If the tasks are only somewhat similar, some knowledge stored in the previous network can still be reused, which notably speeds up the process of building/modifying and training the network for the new task. The only job that remains for the network is then learning the new features and properties that were not present in the former task. Here, once the network is trained for the first time step, the obtained parameters, weights, and biases can be effectively reemployed for training the data fed to the network at the other time steps, as sketched below. The approximated solution and its convergence rate are compared in Section 5.
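The following sketch shows how such a warm start can be organized across the time steps (the function names, the injected adam routine, and the reduced epoch count for later steps are illustrative assumptions; only the 5000 epochs of the first step are taken from Section 4):

def solve_all_steps(adam, grad_cost, initial_weights, num_time_steps,
                    first_epochs=5000, later_epochs=500):
    # Train the network time step by time step, reusing the weights of step n
    # as the initial guess for step n + 1 so that later steps need fewer epochs.
    solutions = []
    weights = initial_weights
    for n in range(num_time_steps):
        epochs = first_epochs if n == 0 else later_epochs
        # grad_cost(w, i, n) is assumed to return the gradient of the cost of
        # time step n with respect to the network weights w at iteration i.
        weights = adam(lambda w, i, n=n: grad_cost(w, i, n), weights,
                       num_iters=epochs)
        solutions.append(weights)
    return solutions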

3.4. Domain Mapping

Considering the vulnerability of neural networks in calculations on infinite domains, owing to the bounded output range of their activation functions, the domain mapping approach is utilized to shift the problem from its semi-infinite domain to a finite interval. This helps to prevent the error caused by common workarounds such as truncation of the domain.

On finite domains, the problem can be treated directly, but on semi-infinite domains, transformation formulas should be used to shift the problem to the desired finite one. Here, an algebraic mapping is used to shift the problem's domain from its semi-infinite form to a finite interval, in which L is the characteristic length of the mapping (Boyd [51]).

Here, L is chosen in such a way that 60% of all training points stand before the mapped strike price, because the price differs significantly from zero in that region (Rad et al. [52]). That is, taking 0.6 as this fraction and in view of the fact that the points are equidistant, L is computed as follows:

First, let us introduce the following notations:

Hence, the derivatives needed for the calculations according to Section 2 are

By substituting equation (17) into equations (3) and (2), the obtained Black–Scholes model is defined on the finite interval.

Since the transformation is applied to the whole problem, the payoff function and boundary conditions of equation (4) should be changed as well.
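For illustration, an algebraic map of the kind described by Boyd [51] and its chain-rule factor can be coded as below; the exact transformation used in the paper is not reproduced here, so both the form of the map and the mapped interval [0, 1) are assumptions:

def to_xi(S, L):
    # Map the semi-infinite price domain [0, inf) onto [0, 1): xi = S / (S + L).
    return S / (S + L)

def to_S(xi, L):
    # Inverse map back to the original price domain: S = L * xi / (1 - xi).
    return L * xi / (1.0 - xi)

def dxi_dS(xi, L):
    # Chain-rule factor for rewriting d/dS in terms of d/dxi:
    # dxi/dS = (1 - xi)^2 / L.
    return (1.0 - xi) ** 2 / L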

3.4.1. Discussion

Using domain mapping helps to approximate the answer on the whole interval of the problem. Moreover, the number of training data points needed for solving the model on the smaller interval is significantly lower, as can be seen in Section 5. However, it can decrease the accuracy, since the whole domain is compacted into one small interval and loss of information may occur. In other words, truncating the domain yields an excellent approximation on a subdomain of the problem, whereas, by accepting a slight loss in accuracy, domain mapping covers the whole domain. So, there lies a trade-off between these two methods.

3.5. Adaptive Moment Estimation Learning

Regression modeling is used to determine the coefficients of mathematical functions based on empirical data. The method of least squares determines the coefficients such that the sum of the squares of the deviations between the data and the fitted curve is minimized. Finding a satisfactory solution to nonlinear least squares problems is one of the famous topics among scientists who work on nonlinear systems of equations. The goal is to minimize a vector of residual functions with respect to a predefined set of parameters, that is to say, to find the parameters for which the sum of the squared residuals becomes as small as possible:

Several methods have been introduced so far for solving this nonlinear least squares model. The same residual vector can be constructed when solving differential equations; similarly, for the Black–Scholes models, in each time step, the final goal is finding the proper network weights for solving the following system of equations:

The best possible solution to this system of equations is obtained when the sum of the squares of these equations becomes as small as possible. Therefore, the problem is converted into an optimization problem: by minimizing the following cost, the appropriate weights and biases for our network are found:

Gradient descent (GD) is the most famous iterative algorithm employed as a learning paradigm to solve regression problems. In GD, after initializing the weights, the gradient ∇C of the cost function is calculated. The cost function is the sum of squared errors of the output with respect to the desired output for each member of the training dataset. Then, based on the gradient, the weights are updated as w ← w − η∇C(w). This process is repeated, considering the new values as the initial ones, until the cost function is desirably minimized. Here, η is known as the learning parameter and is used to balance the rate of increase or decrease in each iteration. It should be noted that the only constraint on this value is that it be positive. Various methods and theorems have been introduced to find bounds for this value according to the problem. Some suggest using a big value and decreasing it as the result approaches the correct value. In contrast, others suggest choosing a very small value and then increasing it exponentially in time once the correct direction is found. But generally, they all prefer making this parameter a function of time, while none of them proposes a single formula to calculate the exact amount of it for the best approximation [46, 53, 54].
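A minimal sketch of this update loop (grad_cost, the learning-rate value, and the stopping rule are placeholders for illustration):

import numpy as np

def gradient_descent(grad_cost, w0, eta=0.01, num_iters=1000, tol=1e-8):
    # Repeatedly step against the gradient of the cost: w <- w - eta * grad(C)(w).
    w = np.asarray(w0, dtype=float)
    for _ in range(num_iters):
        g = grad_cost(w)
        w = w - eta * g
        if np.linalg.norm(g) < tol:   # stop once the gradient is negligible
            break
    return w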

The GD family has different optimizers, such as Stochastic Gradient Descent (SGD), Adaptive Moment Estimation (Adam), Root Mean Square Propagation (RMSprop), and Nesterov Accelerated Gradient (NAG), which are mostly used in deep learning because of their speed and robustness with respect to various control parameters, such as the size of the training dataset and the pattern in which the training data are scattered.

In GD, for updating only one parameter, all available samples in the dataset must be visited; in SGD [55], however, minibatches, which are small subsets of the whole dataset, are used to update a single parameter. For relatively large datasets, this causes the algorithm to converge faster. GD is an exact optimizer that computes the true gradients, while SGD, as explained, only approximates the gradients rather than their precise values. Since SGD fluctuates a lot, due to frequent updates with high variance, it shows a paradoxical behavior: it can explore new and potentially better directions toward the minimum, but, at the same time, this behavior puts the network in danger of completely missing the local or global minimum.

A solution was proposed by the father of backpropagation, Geoffrey Hinton (Tieleman and Hinton [56]). This thorough work, which has not been academically submitted or published, has gained a lot of attention. The proposed algorithm fights the possibility of vanishing or exploding gradients. In other words, RMSprop normalizes the gradient using a moving average of squared gradients. This normalization balances the step size, reducing it for large gradients to avoid explosion and increasing it for small ones to avoid vanishing. Since this approach uses an exponentially decaying average, it is dominated by the most recent gradients, so past gradients do not play a great role in updating the parameters. This leads to slow changes in the learning rate; however, it is relatively faster than GD.
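A sketch of the update just described, using typical default hyperparameters rather than values taken from the paper:

import numpy as np

def rmsprop(grad_cost, w0, eta=0.001, decay=0.9, eps=1e-8, num_iters=1000):
    # Normalize each step by a moving average of squared gradients.
    w = np.asarray(w0, dtype=float)
    avg_sq = np.zeros_like(w)
    for _ in range(num_iters):
        g = grad_cost(w)
        avg_sq = decay * avg_sq + (1.0 - decay) * g ** 2   # moving average of g^2
        w = w - eta * g / (np.sqrt(avg_sq) + eps)          # balanced step size
    return w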

So far, it can be seen that RMSprop and SGD are the best options. Adam is an adaptive algorithm which is generally considered a combination of these two paradigms with momentum. This methodology has been specifically created to be used by neural networks.

Like RMSprop, Adam employs squared gradients to modify the learning rate, and, like SGD with momentum, it utilizes a moving average of the gradient for the first and second moments. However, a specific learning rate is calculated for each network parameter (weight) using two hyperparameters. Here, a summary of how Adam optimization works for the present model is provided; for the complete explanation, readers are encouraged to study Kingma and Ba [57]. The convergence of the method has been described in several papers, and ultimately, all of these studies confirm the convergence proof provided in the original paper [58].

All computations are done using the autograd package of Python. In this package, Adam optimization is implemented as follows:

import autograd.numpy as np

def adam(grad, x, callback=None, num_iters=100,
         step_size=0.001, b1=0.9, b2=0.999, eps=10**-8):
    m = np.zeros(len(x))
    v = np.zeros(len(x))
    for i in range(num_iters):
        g = grad(x, i)
        if callback: callback(x, i, g)
        m = (1 - b1) * g + b1 * m           # First moment estimate.
        v = (1 - b2) * (g**2) + b2 * v      # Second moment estimate.
        mhat = m / (1 - b1**(i + 1))        # Bias correction.
        vhat = v / (1 - b2**(i + 1))
        x = x - step_size * mhat / (np.sqrt(vhat) + eps)
    return x
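As a brief usage illustration (the quadratic objective below is a stand-in, not the paper's cost function), the routine is called with autograd's gradient of the cost and the initial weights:

import autograd.numpy as np
from autograd import grad
from autograd.misc.optimizers import adam

def cost(w, i):
    # Stand-in objective; in our setting this would be the cost of one time step.
    return np.sum((w - 3.0) ** 2)

w0 = np.zeros(5)                                             # initial weights
w_opt = adam(grad(cost), w0, num_iters=500, step_size=0.01)  # approaches 3.0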

As the name of the method describes, it is derived from adaptive moment estimation. The nth moment of a random variable is defined as the expected value of that variable raised to the power of n, m_n = E[X^n], where m_n denotes the nth moment and X is a random variable. The gradient of the cost function of the neural network can be considered a random variable since it is usually evaluated on some small random batch of data. The first moment is the mean, and the second moment is the uncentered variance. To estimate the moments, Adam utilizes exponentially decaying moving averages m_t and v_t computed on the gradient evaluated on the current minibatch:

m_t = β1 m_{t−1} + (1 − β1) g_t,
v_t = β2 v_{t−1} + (1 − β2) g_t²,

where m_t and v_t denote the moving averages and g_t is the gradient on the current data presented to the network. According to Kingma and Ba [57], which is also reflected in the above snippet from the autograd package, the hyperparameters β1 and β2 have default values of 0.9 and 0.999, respectively. While the authors did not discuss the process of choosing these two values, all studies reported very promising and, in most cases, perfect estimations using these two defaults (see also [59]). The vectors of moving averages are initialized with zeros at the first iteration.

The remaining problem with these moment estimates is that they are biased towards zero, since m and v are initialized as vectors of 0's. In other words, especially during the initial epochs and when the decay rates are small (i.e., β1 and β2 are close to 1), the values of m_t and v_t change only slightly or not at all. So, the authors proposed the following bias corrections in order to surmount this obstacle:

m̂_t = m_t/(1 − β1^t),  v̂_t = v_t/(1 − β2^t).

Now, for each of the parameters (weights), a specific updating rule can be created:

w_t = w_{t−1} − η m̂_t/(√(v̂_t) + ε),

where ε is a control parameter preventing the fraction from producing a division-by-zero error.

Different scientific studies have shown that Adam outperforms other methods, and empirical practice confirms its better performance and accuracy; this is also discussed in Section 5. One issue raised by many studies is the convergence of the method. Kingma and Ba [57] provided the analysis for convex problems, while other papers questioned the convergence of the method on a few nonconvex problems; with some modifications, they finally agreed on its usability.

In [60], a full analysis of the convexity of the Black–Scholes model is provided. Accounting for differences between real markets and theory, such as the failure of put-call parity, that paper proves that American options preserve their convexity in bubbled markets as well as in nonbubbled ones. It also shows that European options are convexity preserving only for bounded payoffs; thus, in this respect, the prices of American options are more robust than their European counterparts. In the same study, it is shown that models for bubbles are convexity preserving for bounded contracts. More precisely, consider two option prices whose corresponding volatilities are nonnegative and ordered, as in the following theorem.

Theorem 2. Assume that is concave. Then, is concave in for any . Moreover, the option price is decreasing in the volatility, that is, for all . Similarly, if is convex and bounded, then is convex in x for any . Moreover, the option price is increasing in the volatility, that is, for all .

The full proof is available in [60]. As they mentioned, the proof is valid under the assumption of the uniqueness of the result for such an option, which is proved in their thorough study on properties of Black–Scholes models in more realistic markets as well.

4. Numerical Results and Discussion

In this section, three test examples with exact solutions, chosen in line with previous works, are used to examine the accuracy and efficiency of the proposed ANN. According to what is mentioned in Section 3.5, this approach is applicable to other kinds of options, such as barrier and American options. All computations are performed using Python on a CPU machine. Only one hidden layer with 20 hidden neurons is used in all of the samples, and the initial weights are scattered randomly. The number of epochs for the first time step is 5000, and for the rest of the steps, thanks to fine-tuning, decreasing this number substantially still provides very promising results. A range of values for the first time-step iterations and for the subsequent steps does not lead to overfitting/underfitting, but the best results in our experiments were achieved using the stated values. If the learning rate is low, training is more reliable, but optimization takes a long time because the steps towards the minimum of the loss function are small; on the contrary, if the learning rate is high, training may not converge or may even diverge: weight changes can be so big that the optimizer overshoots the minimum and makes the loss worse. In this research, the best values for the learning rate are found using a genetic algorithm.

Example 1. Let us consider a European call option whose interest rate, volatility, and strike price are 0.05, 0.2, and 10, respectively. The governing equation is similar to equation (31), together with the corresponding boundary condition set. With a maturity of 1 year, the approximate price is shown in Figures 2(a) and 2(b). The exact solution of this call option can be obtained using the well-known analytical Black–Scholes formula for European calls. When truncating the domain at 15, the cardinality of the training dataset is 150, containing equidistant points scattered across the truncated domain. The number of time steps, N, is 20, and the average calculation time per epoch on the abovementioned configuration is . Figure 2(a) demonstrates that when the problem domain is truncated, the approximate price behaves well until it reaches the truncation point, which is 15 in these examples.

To solve this issue, the problem is mapped onto a finite interval using the approach of Section 3.4; the solution is then computed using the proposed ANN with sigmoid activation functions and finally reverted to the original model's domain using the inverse mapping function. Only 10 equidistant points are used as training points in this case, and the logarithmic absolute errors obtained from the two approaches are compared in Figure 3. It should be noted that, after the truncation point, the error increases rapidly for the first approach, whereas, with domain mapping, the overall error is higher at the beginning of the interval but remains steady and even falls at the end of the domain. Since the mapping function tends to infinity at the right endpoint of the mapped interval, numerical software such as Python cannot perform the calculation exactly there, so these comparisons are done using a very large value instead. Figure 4 confirms that the Adam optimizer performs better than the other two optimizers, as it starts to converge and move towards the answer in earlier epochs for the first time step. The average calculation times for SGD and RMSprop are, respectively, and per epoch. It is noteworthy that RMSprop crashed due to overflow, and the depicted curve is given only for comparison, with a learning rate of 0.01 instead of 0.03, which might make the comparison somewhat unreliable; still, the point stands that this method fails in comparison with the other methods for solving this type of Black–Scholes model.
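For reference, the closed-form price of the European call used in the error comparison above can be evaluated as in the following sketch (scipy.stats is assumed to be available for the normal cumulative distribution function; the formula is valid for t < T):

import numpy as np
from scipy.stats import norm

def bs_call_exact(S, t, K=10.0, r=0.05, sigma=0.2, T=1.0):
    # Standard Black-Scholes closed-form price of a European call
    # (parameter values taken from Example 1).
    tau = T - t                                   # time to maturity
    d1 = (np.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return S * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)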

Figures 5(a) and 5(b) illustrate the incredible influence of fine-tuning on the objective convergence.

Example 2. Consider the following fractional model of a European option with homogeneous boundary conditions: the interest rate and volatility are 0.05 and 0.25, respectively, and the other coefficients are calculated based on these two variables. The fractional order of the equation is , and the exact solution for this equation is formulated as below. The number of time steps, N, is 10, the number of hidden neurons in this example is 6, and the average calculation time per epoch on the abovementioned configuration is . The approximate solution calculated using the Adam neural network is plotted in Figure 6. Smaller numbers of training data points, 20 and 40, still produce acceptable errors, and 60 equidistant points are used as training points in this case to reach the logarithmic absolute errors illustrated in Figure 7(a), giving us the freedom to increase the accuracy even more. Because the number of epochs in this example is very small, the absolute errors obtained from SGD, RMSprop, and Adam are compared in Figure 7. Here, it can be seen that, even with a small number of neurons and training points, the accuracy of the model is more promising than with the other optimizers. The average calculation times for SGD and RMSprop are, respectively, and per epoch.

Figure 8 illustrates the incredible influence of fine-tuning on the objective convergence. When fine-tuning is used, the objective function starts with a very small value and, hence, converges rapidly even for very small values of the cost.

Example 3. Let us consider a European put option whose interest rate, volatility, and strike price are 0.05, 0.2, and 10, respectively. With a maturity of 1 year, the achieved result is shown in Figure 9. The exact solution of this put option can be obtained using the well-known analytical Black–Scholes formula for European puts. Since the problem domain is unbounded according to the boundary conditions, when truncating the domain at 15, the cardinality of the training dataset is 110, containing equidistant points scattered across the truncated domain. The number of time steps, N, is 10, and the average calculation time per epoch on the abovementioned configuration is . Increasing the number of data points would slightly increase the accuracy for this configuration (other parameters might need to be adjusted as well); however, we preferred this size to reduce complexity and memory usage.
In Figure 9(a), it is shown that, when the problem domain is truncated, unlike in Example 1, the approximate price behaves well throughout the whole unbounded interval. However, this does not mean that this is the expected behavior of the option, considering that the boundary condition makes the option price move towards zero. To make it clearer, the errors of the truncated and mapped approximate solutions are compared in Figure 10. In Figure 10(a), the absolute error before the truncation point is relatively better than in Figure 10(b). However, after the truncation point, it starts to increase and then flattens out again, which is predictable according to the boundary condition of this specific option. In other words, this behavior cannot be generalized to other options, because the farther points lie outside the training dataset and the network cannot learn their values. So, the preferred way is employing the mapped domain, with lower accuracy but more stable behavior. Besides, only 10 equidistant points are used as training points in this case.
Figure 11 shows the superiority of the Adam optimizer, as it starts to converge and move towards the answer in earlier epochs for the first time step. The average calculation times for SGD and RMSprop are, respectively, and per epoch. Figures 12(a) and 12(b) illustrate the influence of fine-tuning on the objective convergence.

Remark for future work: this paper did not cover the application of the method to space-fractional or time-space-fractional equations [61]. As the sigmoid functions are not fractional, the end result would lack the proper accuracy; therefore, readers are encouraged to use other basis functions, such as fractional Chebyshev functions, to approximate space-fractional models.

5. Conclusion

This study investigates neural networks with the famous Adam optimizer for solving financial Black–Scholes equations. Converting the PDE into a series of time-dependent ODEs using the backward Euler finite difference method and then solving each of these equations with the proposed model confirms the satisfactory accuracy and fast calculation of the method. The speed of the method stems from the parallel computations of the neural network over its independent neurons, the straightforward calculations of the sigmoid activation functions, which do not add to the complexity of the model, and the small number of training points and hidden neurons needed to achieve very promising accuracy. The neural network also outperforms other methodologies regarding consistency and accuracy in the face of machine or calculation errors, owing to its fault tolerance. Fine-tuning plays a significant role in this method by reducing the building, validation, and calculation time; it also helps the method converge faster by finding the appropriate direction for the gradients, as depicted in the three examples in Section 4. Domain mapping, which, to the best of our knowledge, has not been used with ANNs before, is employed to make calculations possible on larger or infinite problem domains. As a result of combining these approaches into one single ANN, reliable, fast, and accurate results were obtained. The methodology is applicable to other types of options priced by either ordinary or fractional models, as well as to partial differential equations in any other field of study that can be solved using this network.

Data Availability

The data used to support the findings of the study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors thank Prof. John P. Boyd (University of Michigan) for his thorough guidance and answering our questions about the methods borrowed from his articles to apply on the problems of the present work.