Accelerated version of NUBEAM capabilities in DIII-D using neural networks

https://doi.org/10.1016/j.fusengdes.2020.112125

Highlights

  • Neural network version of NUBEAM, referred to as NubeamNet, was trained for DIII-D.

  • NubeamNet reproduces NUBEAM's predictions with a high level of accuracy.

  • NubeamNet exhibits execution time orders of magnitude faster than that of NUBEAM.

  • NubeamNet enables use of NUBEAM in control applications demanding fast calculation.

  • Potential control applications include scenario optimization and real-time control.

Abstract

A neural network model of the effects of neutral beam injection on DIII-D has been developed. The training and testing data used by the model have been generated by the NUBEAM module of TRANSP for experimental discharges from the 2018 DIII-D campaign. Using principal component analysis to reduce the dimensionality of profile data, the model has been shown to reproduce the results of the Monte Carlo code NUBEAM with a high level of accuracy and an execution time orders of magnitude faster than that of NUBEAM. This makes the neural network model uniquely suited to applications in model-based scenario planning (off-line) and active control (on-line), where a large number of simulation runs are required by the associated optimization tasks that need to be performed before and during the discharge.
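
As a rough illustration of the dimensionality reduction mentioned above, the following sketch applies principal component analysis to a set of synthetic radial profiles. It is not the authors' implementation: the profile shapes, the 50-point radial grid, the choice of five components, and the function names `fit_pca`, `compress`, and `reconstruct` are all assumptions made for this example.

```python
import numpy as np

def fit_pca(profiles, n_components):
    """Return the mean profile and the leading principal directions."""
    mean = profiles.mean(axis=0)
    centered = profiles - mean
    # Rows of vt are orthonormal directions ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def compress(profiles, mean, components):
    """Project each profile onto the principal directions."""
    return (profiles - mean) @ components.T

def reconstruct(coeffs, mean, components):
    """Map PCA coefficients back to full-resolution profiles."""
    return coeffs @ components + mean

# Synthetic stand-in for profile data (hypothetical, not DIII-D data):
# 100 Gaussian-shaped profiles on a 50-point normalized radial grid.
rng = np.random.default_rng(0)
rho = np.linspace(0.0, 1.0, 50)
amplitude = rng.uniform(0.5, 2.0, size=(100, 1))
width = rng.uniform(0.2, 0.6, size=(100, 1))
profiles = amplitude * np.exp(-(rho / width) ** 2)

mean, components = fit_pca(profiles, n_components=5)
coeffs = compress(profiles, mean, components)    # shape (100, 5)
approx = reconstruct(coeffs, mean, components)   # shape (100, 50)
max_error = float(np.max(np.abs(approx - profiles)))
```

In a setup like this, a network would predict the few PCA coefficients rather than every radial point, and `reconstruct` would recover the full profile afterwards.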

Introduction

In order to maintain stability and maximize performance in tokamak plasmas, the spatial distributions of densities, temperatures, current, and momentum, among other factors, must be carefully controlled. The evolution of these profiles is described by a system of nonlinear partial differential equations, which in many applications cannot be modeled from first principles due to the extreme computational power that would be required for their solution. Reduced physics-based models such as TRANSP [1], [2] are commonly used instead in both analytic and predictive capacities. However, this type of physics-oriented model is still too time-consuming to be useful for certain applications such as between-shots discharge planning and real-time control. Because of this, a different set of control-oriented models with significantly faster calculation times is needed. Fast models of the plasma response to different types of actuation, such as neutral beam injection, are especially necessary for model-based control design. One approach to developing control-oriented models is to use empirical scaling laws [3]. These empirical models can easily achieve a calculation time fast enough to be useful for control applications, but may only be valid for specific plasma scenarios, which can lead to a considerable decrease in accuracy as plasma operation deviates from the reference scenario.

Another approach to building control-oriented models is to train machine learning algorithms such as neural networks [4] to reproduce the function of interest. Neural networks are mathematical models that operate in a way loosely analogous to the nervous system. They have been shown to be capable of approximating any continuous function, no matter how complicated or nonlinear, given adequate training data and at least one hidden layer with enough neurons [5]. Each neuron in the network receives input signals from other neurons, processes them, and outputs a different signal. In a multi-layer perceptron (MLP) neural network, as illustrated in Fig. 1, multiple neurons are arranged in layers, and the signals move in a single direction from the inputs to the outputs. In this diagram, x1, x2, and x3 make up the input layer, f1 and f2 make up the output layer, and in between there is one hidden layer. Each neuron in the hidden and output layers has a value determined by its inputs, the weights of each connection learned during training, and its activation function. Activation functions introduce a nonlinear component to the neural network calculation, which allows the network to learn nonlinear functions. For a regression problem, a typical choice is the rectified linear unit (ReLU), which is equal to zero for negative inputs and equal to the input for positive inputs, for neurons in the hidden layers, and a linear activation function for neurons in the output layer.

When a neural network is initially created, the weights associated with each connection between neurons are randomized. During the training process, the network is given a set of inputs with known outputs. For each input in the training set, the network predicts an output and compares it to the correct output using a loss function, chosen by the developer, which quantifies the error between the prediction and the true output. Loss functions can be as simple as a mean squared error between the predicted and true outputs, or can be significantly more complicated and tuned to the specific system. A gradient descent algorithm is then used to update the weights with the goal of minimizing the loss function. The network tracks the trend in the loss function between each update of the weights to determine if the training is moving in the right direction. Training ends when the value of the loss function is satisfactorily low.
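
The training procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the model developed in this work: it uses three inputs and two outputs as in Fig. 1, while the hidden-layer size, learning rate, stopping tolerance, weight-initialization scaling, and the toy target function are all assumptions chosen for the example.

```python
import numpy as np

def init_params(n_in, n_hidden, n_out, rng):
    """Randomize the connection weights; biases start at zero."""
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    """Forward pass: ReLU hidden layer followed by a linear output layer."""
    z1 = x @ params["W1"] + params["b1"]
    h = np.maximum(z1, 0.0)                  # ReLU activation
    y = h @ params["W2"] + params["b2"]      # linear activation
    return y, (z1, h)

def mse(y_pred, y_true):
    """Mean squared error loss."""
    return float(np.mean((y_pred - y_true) ** 2))

def gradient_step(params, x, y_true, lr):
    """One gradient descent update; returns new weights and the pre-update loss."""
    y_pred, (z1, h) = forward(params, x)
    dy = 2.0 * (y_pred - y_true) / y_true.size   # dLoss/dy for the MSE
    dh = dy @ params["W2"].T
    dz1 = dh * (z1 > 0)                          # derivative of ReLU
    grads = {
        "W1": x.T @ dz1, "b1": dz1.sum(axis=0),
        "W2": h.T @ dy, "b2": dy.sum(axis=0),
    }
    new_params = {k: params[k] - lr * grads[k] for k in params}
    return new_params, mse(y_pred, y_true)

def train(x, y, n_hidden=16, lr=0.05, n_steps=500, tol=1e-4, seed=0):
    """Train until the loss falls below tol or n_steps is reached."""
    rng = np.random.default_rng(seed)
    params = init_params(x.shape[1], n_hidden, y.shape[1], rng)
    history = []
    for _ in range(n_steps):
        params, loss = gradient_step(params, x, y, lr)
        history.append(loss)                     # track the loss trend
        if loss < tol:
            break
    return params, history

# Toy regression problem (hypothetical): 3 inputs, 2 nonlinear outputs.
rng = np.random.default_rng(1)
x_train = rng.uniform(-1.0, 1.0, size=(200, 3))
y_train = np.column_stack([
    np.sin(x_train[:, 0]) + x_train[:, 1] ** 2,
    x_train[:, 1] * x_train[:, 2],
])
params, history = train(x_train, y_train)
```

The `history` list plays the role of the loss trend mentioned above: comparing its first and last entries shows whether training moved in the right direction.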

Recently, neural networks have been created to replicate the results of a number of physics-based plasma models, achieving significantly reduced calculation time without major sacrifices in accuracy [7], [8]. These neural-network models can then be integrated into predictive codes [9] to evaluate the state of the plasma much faster than could be done using the original models. Specifically, a neural network version of the TRANSP Monte Carlo neutral beam module NUBEAM [10], [11], referred to as NubeamNet, was developed for the National Spherical Torus eXperiment Upgrade (NSTX-U) [12]. The neural-network model can predict the effects of neutral beam injection on the plasma in a fraction of the time NUBEAM requires, with similar levels of accuracy. This work aims to continue along these lines by creating a real-time capable version of NUBEAM which is valid for DIII-D. The DIII-D neutral beam system contains eight independently modulated neutral beam sources that can provide a combined power of up to approximately 20 MW. In TRANSP, each one of the powers associated with the two 150° off-axis beams in DIII-D is divided into four different components in order to account for different tilting configurations. Because of this, NUBEAM uses fourteen NBI power inputs (six for the remaining sources plus 2 × 4 for the two off-axis beams) instead of the eight physical neutral beam powers. Therefore, the neural network developed in this work also uses these fourteen powers as inputs. It is anticipated that this model will aid in optimal scenario planning and iterative control algorithm design, as well as other control applications including real-time control, estimation and forecasting in DIII-D [13], [14], [15].

Reduced analytical models [16] do exist as an alternative to this neural network model. However, reduced analytical models may require more significant assumptions to be made about the system, such as taking into account finite-orbit-width effects using an orbit average of the beam deposition. The neural network approach does not require any simplifying assumptions to be made beyond those used in NUBEAM. Instead it relies entirely on data, and is only valid for inputs within the range of its training data. Moreover, available analytical models do not calculate torque, which would be needed for certain model-based control applications such as rotation control.

This paper is organized as follows. In Section 2, the development of the dataset used to train and evaluate the model is described. In Section 3, the method used to determine the topology of the model is explained. In Section 4, predictions are shown using data that was not used in the training or parameter tuning stages. In Section 5, conclusions and plans for future work are discussed.

Section snippets

Dataset development

One of the advantages of using neural networks to model a system as opposed to highly reduced physics-based models is that neural networks make no assumptions about the structure of the model. Given sufficient training data, they are able to learn and reproduce highly nonlinear relationships. However, due to their complete reliance on data, they are unable to extrapolate for inputs outside of the range seen during the training process. While neural networks are incredibly capable of producing

Determination of model architecture

The neural networks in this work use a fully connected structure, meaning that each node feeds into every node in the next layer. As stated in Section 1, when building a neural network, the weights of each connection between nodes are initially randomized, and adjusted through the training process. In order to account for the uncertainty caused by the initial randomization, five separate neural networks were trained in parallel with different initial weights as illustrated in Fig. 4. When

Model evaluation

Results from the model as described in Section 3 are shown in this section. Fig. 9 shows predictions of (a) current drive, (b) fast ion pressure, (c) beam heating to electrons, and (d) neutron rate plotted vs. time. These results are from TRANSP run 175282T01. This run is from the testing dataset, so the neural network did not see it during training. In the plots, the red dashed line shows the NUBEAM data, the dark blue solid line shows the average prediction across all five trained neural
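
The ensemble of five networks described in Section 3, and the averaged prediction referred to here, can be illustrated with a short sketch. The member predictions below are synthetic placeholders, not NubeamNet output, and the use of the standard deviation as a measure of spread between members is an assumption; the snippet above does not state which spread measure, if any, is plotted.

```python
import numpy as np

def ensemble_statistics(predictions):
    """Combine member predictions of shape (n_models, n_times).

    Returns the ensemble mean and the standard deviation across members
    (the latter is an assumed, illustrative measure of spread).
    """
    predictions = np.asarray(predictions, dtype=float)
    return predictions.mean(axis=0), predictions.std(axis=0)

# Synthetic stand-in for five networks evaluated on one time trace.
rng = np.random.default_rng(0)
time = np.linspace(0.0, 5.0, 50)
reference = np.sin(time)                       # hypothetical "true" signal
member_predictions = reference + 0.05 * rng.standard_normal((5, 50))

mean_prediction, spread = ensemble_statistics(member_predictions)
```

Averaging over members trained from different initial weights reduces the sensitivity of the final prediction to any single random initialization.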

Conclusions

A neural network model for calculating the effects of neutral beam injection in DIII-D has been developed. The model was trained on simulation data from NUBEAM derived from TRANSP runs for shots from the DIII-D 2018 campaign. The speed of predictions makes it useful for applications demanding fast simulations such as optimal scenario planning between discharges. The calculation time also suggests that the neural-network model may be useful for real-time control applications such as feedback

Authors’ contributions

Shira Morosohk: Methodology, Software, Investigation, Writing – Original Draft

Dan Boyer: Conceptualization, Methodology, Software

Eugenio Schuster: Conceptualization, Writing – Review and Editing, Supervision

Conflict of interest

The authors declare no conflict of interest.

Declaration of Competing Interest

The authors report no declarations of interest.

Acknowledgment

This work has been supported by the U.S. Department of Energy, Office of Science, Office of Fusion Energy Sciences, under Award DE-SC0010661, and by the National Science Foundation Graduate Research Fellowship Program under Grant No. 1842163.
