Physics-informed reinforcement learning optimization of nuclear assembly design
Introduction
The sustainability of the existing light water reactor fleet is one of the main missions of the U.S. nuclear industry and the Department of Energy. The existing fleet provides roughly half of all carbon-free electricity in the United States. However, the number of reactors online has declined in recent years, driven mainly by cost. Reducing nuclear fuel cost is one way to improve fleet efficiency. Nuclear fuel designers dictate the number and attributes of the fuel rods in an assembly, namely their enrichment and burnable poison loading. In effect, fuel designers attempt to solve a “combinatorial optimization” problem by combining expert judgement, nuclear design principles, and physics-based tools. Combinatorial optimization (Korte et al., 2012) in nuclear reactor design and operation is a well-known problem that aims to find an optimal pattern from a finite set of patterns (Kropaczek and Turinsky, 1991). Indeed, the search space of a combinatorial optimization problem is finite by definition, so an optimal solution always exists.
Nuclear fuel design involves two common problems: (1) core optimization and (2) assembly optimization. Core optimization aims at finding the best loading pattern of all assemblies in the core such that reactor operation is economic and meets safety constraints (Kropaczek and Turinsky, 1991). Assembly optimization (the focus of this work) aims at finding the optimal material composition and location of all fuel rods in the assembly such that, when the assembly is introduced into the core, economic and safety constraints are satisfied (Francois et al., 2003). A review of related literature on optimization techniques is included in Section 3.1.
For assembly optimization, boiling water reactor (BWR) designs, unlike pressurized water reactors (PWR), feature a more heterogeneous radial fuel enrichment distribution (Fensin, 2004), as will also be seen in this work. For PWRs, some utilities adopt optimization tools to find the most economic core design more rapidly, for example ROSA (Reload Optimization with Simulated Annealing) (Verhagen et al., 1997). However, such stochastic optimization (SO) based frameworks, while fast for individual pattern evaluation, are often computationally expensive when searching for high-performing solutions, so they have found little commercial adoption for more complex problems such as BWRs. It is also worth mentioning that SO code packages such as ROSA leverage surrogate models to reduce the computational burden and thus do not rely on licensed methodologies; consequently, when the best design option is found by SO, manual tuning must still be performed with the licensed codes. Aside from classical SO, to the authors’ knowledge, there have been very limited attempts so far to investigate the performance of modern reinforcement learning (RL) algorithms (e.g., deep Q learning, proximal policy optimization) in supporting nuclear engineering decisions on fuel assembly optimization, for either PWRs or BWRs. RL algorithms would prove effective if they demonstrate promising performance in embedding domain or expert knowledge through reward shaping, in exploring the search space effectively, and in finding a global optimum more reliably than standard SO in a problem with many local optima. Accordingly, we explore the ability to train an intelligent system by RL that learns from interactions with physics-based environments and from prior expert knowledge, such that it can take proper actions in a short amount of time to efficiently arrange and optimize nuclear fuel within the assembly.
RL is compared to SO algorithms (i.e., genetic algorithms and simulated annealing), which act as baselines that have been widely investigated in the literature.
In this work, we provide important definitions about the design of the nuclear assemblies of interest in Section 2. The methodology is described in Section 3, which starts with a literature review of related work, followed by the optimization strategy and the process of building physics-based environments to facilitate RL and SO. Next, the mathematical foundation of the RL and SO algorithms and their connection to the physics-based environment are described, followed by the code deployment. The results of this paper are presented in two case studies in Section 4 and Section 5, respectively. The first case study highlights a small, low-dimensional nuclear assembly (BWR 6 × 6) whose global optima are known beforehand from brute-force search, where the RL/SO algorithms are assessed and compared to each other. Next, RL is compared to SO on a larger, high-dimensional nuclear assembly (BWR 10 × 10), which is also constrained by expensive simulation costs. Finally, the conclusions of this work are presented in Section 6.
Nuclear fuel assembly design
The system optimized in this work is the nuclear fuel assembly; a top view of two BWR assembly designs of interest to this work are sketched in Fig. 1. Assembly optimization is seen as a permutation with repetition problem with cost proportional to m^n, where m is the number of fuel types (i.e., choices) to pick from, while n is the number of fuel rod locations to optimize (i.e., number of times to choose). To reduce the search space, researchers tend to take advantage of problem symmetry to
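The m^n search-space arithmetic above can be sketched in a few lines. This is an illustrative calculation, not taken from the paper: the number of fuel types m is hypothetical, and the symmetry reduction shown (keeping only the 21 locations on or below the diagonal of a 6 × 6 lattice) is just one example of exploiting problem symmetry.

```python
# Illustrative sketch: size of the assembly design space with and
# without a half-diagonal symmetry reduction (assumed values).
def full_space(m: int, n: int) -> int:
    """Permutations with repetition: m fuel types over n rod locations."""
    return m ** n

m = 6                      # hypothetical number of fuel types
n_full = 36                # 6 x 6 lattice: all rod locations
n_sym = 6 * 7 // 2         # 21 independent locations under diagonal symmetry

print(full_space(m, n_full))   # full search space: 6**36
print(full_space(m, n_sym))    # symmetry-reduced search space: 6**21
```

Even with symmetry, the reduced space remains far too large to enumerate once simulation costs per pattern are non-trivial, which motivates the optimization algorithms compared in this work.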
Methodology
Before moving into the methodology details, it is worth defining the basic concept of RL, which is illustrated in Fig. 2, and can be summarized in three main steps:
- 1. The agent: the optimizer in our study. The agent is controlled by the RL algorithm, which trains it to take proper actions. The algorithm takes the current state and the current reward as inputs and decides the next action to take as output. This sequence is the core of deep reinforcement learning, described in
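The agent–environment interaction described above can be sketched as a minimal Gym-style loop. The `AssemblyEnv` class below is a toy stand-in, not the paper's actual environment: in the real setting, the end-of-episode reward comes from a physics-code evaluation of the finished rod pattern, whereas here it is a random placeholder.

```python
# Minimal sketch of the agent-environment loop (hypothetical names).
import random

class AssemblyEnv:
    """Toy stand-in: state = index of the next rod to fill,
    action = fuel type assigned to that rod."""
    def __init__(self, n_rods=36, n_types=6):
        self.n_rods, self.n_types = n_rods, n_types
    def reset(self):
        self.pattern, self.i = [], 0
        return self.i
    def step(self, action):
        self.pattern.append(action)
        self.i += 1
        done = self.i == self.n_rods
        # Real reward: physics-code evaluation of the completed pattern.
        reward = random.random() if done else 0.0
        return self.i, reward, done

env = AssemblyEnv()
state, done = env.reset(), False
while not done:
    action = random.randrange(env.n_types)  # a trained agent would act here
    state, reward, done = env.step(action)
```

A trained RL agent replaces the random action choice with a learned policy mapping the current state (and reward signal) to the next action.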
Case study 1: BWR 6 × 6 fuel assembly
The first case study forms a 6 × 6 BWR assembly with a total of 36 rods. The environment setup and reward shaping are described in the first subsection, while the results are presented and discussed in the second subsection.
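The reward shaping mentioned above can be illustrated with a hedged sketch: several physics objectives are collapsed into one scalar reward for the agent. The metric names, targets, and weights below are assumptions for illustration only, not the paper's actual values.

```python
# Hedged sketch of reward shaping for assembly optimization.
# Targets, limits, and weights below are illustrative assumptions.
def shaped_reward(k_inf, ppf, k_target=1.25, ppf_limit=1.35,
                  w_k=1.0, w_p=1.0):
    """Reward grows as k-infinity approaches its target while the
    pin peaking factor (ppf) stays below its safety limit."""
    r = -w_k * abs(k_inf - k_target)    # pull k-inf toward the target
    if ppf > ppf_limit:                 # penalize constraint violations
        r -= w_p * (ppf - ppf_limit)
    return r
```

Terms of this shape let the designer embed expert knowledge (targets and constraint limits) directly into the signal the agent maximizes.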
Case study 2: BWR 10 × 10 fuel assembly
In this section, the optimization strategy is described first, including detailed reward/fitness shaping. Next, the RL and SO performances in optimizing the BWR 10 × 10 assembly are evaluated and discussed.
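For context on the SO baselines used in the comparison, a compact simulated-annealing loop of the kind RL is measured against can be sketched as follows. The fitness function here is a cheap stand-in; in practice each evaluation is an expensive physics-code run, which is exactly what limits SO on the larger 10 × 10 problem.

```python
# Generic simulated-annealing baseline sketch (not the paper's code).
import math, random

def anneal(n_rods=21, n_types=6, steps=2000, t0=1.0, alpha=0.995, seed=0):
    rng = random.Random(seed)
    # Placeholder fitness: in practice, a physics-code evaluation.
    fitness = lambda p: -sum((x - n_types // 2) ** 2 for x in p)
    pattern = [rng.randrange(n_types) for _ in range(n_rods)]
    f = best_f = fitness(pattern)
    best, t = list(pattern), t0
    for _ in range(steps):
        cand = list(pattern)
        cand[rng.randrange(n_rods)] = rng.randrange(n_types)  # perturb one rod
        fc = fitness(cand)
        # Metropolis criterion: always accept improvements, sometimes worse moves
        if fc > f or rng.random() < math.exp((fc - f) / t):
            pattern, f = cand, fc
            if f > best_f:
                best, best_f = list(pattern), f
        t *= alpha                                            # cool down
    return best, best_f
```

Each candidate requires a full fitness evaluation, so the per-sample cost of the physics code dominates SO runtime; the RL approach amortizes that cost into training.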
Closing remarks
The potential efficiency gains in nuclear fuel cost encourage fuel designers to solve high-dimensional and expensive combinatorial optimization problems. Fuel optimization is still mainly tackled by expert judgement and classical stochastic optimization (SO) algorithms. In this work, we propose a physics-informed optimization methodology based on deep reinforcement learning (RL) to improve upon SO performance under a robust and licensed nuclear code. The methodology utilizes deep Q learning
CRediT authorship contribution statement
Majdi I. Radaideh: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing - original draft, Validation, Visualization. Isaac Wolverton: Software, Writing - review & editing. Joshua Joseph: Software, Writing - review & editing. James J. Tusar: Data curation, Validation, Writing - review & editing. Uuganbayar Otgonbaatar: Project administration, Funding acquisition, Writing - review & editing. Nicholas Roy: Resources, Supervision, Writing - review & editing. Benoit
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work is sponsored by Exelon Corporation, a nuclear electric power generation company, under the award (40008739). Joshua Joseph was supported by the MIT Quest for Intelligence and their support is gratefully acknowledged.
References (55)
- et al., Optimization of fuel core loading pattern design in a VVER nuclear power reactors using particle swarm optimization (PSO), Annals of Nuclear Energy (2009)
- et al., Application of differential evolution algorithms to multi-objective optimization problems in mixed-oxide fuel assembly design, Annals of Nuclear Energy (2019)
- et al., A practical optimization procedure for radial BWR fuel lattice design using tabu search with a multiobjective function, Annals of Nuclear Energy (2003)
- et al., The Ant-Q algorithm applied to the nuclear reload problem, Annals of Nuclear Energy (2002)
- et al., SFCOMPO-2.0: An OECD NEA database of spent nuclear fuel isotopic assays, reactor design specifications, and operating data, Annals of Nuclear Energy (2017)
- et al., Evolution algorithms in combinatorial optimization, Parallel Computing (1988)
- et al., Using a multi-state recurrent neural network to optimize loading patterns in BWRs, Annals of Nuclear Energy (2004)
- et al., Continuous firefly algorithm applied to PWR core pattern enhancement, Nuclear Engineering and Design (2013)
- Incorporating sensitivity and uncertainty analysis to a lattice physics code with application to CASMO-4, Annals of Nuclear Energy (2012)
- et al., Optimization of PWR fuel assembly radial enrichment and burnable poison location based on adaptive simulated annealing, Nuclear Engineering and Design (2009)
- Using Hopfield neural network to optimize fuel rod loading patterns in VVER/1000 reactor by applying axial variation of enrichment distribution, Applied Soft Computing
- Fractional-order particle swarm based multi-objective PWR core loading pattern optimization, Annals of Nuclear Energy
- Post-irradiation analysis of the Gundremmingen BWR spent fuel, Tech. Rep. EUR-6301, Commission of the European Communities
- A unified tabu search heuristic for vehicle routing problems with time windows, Journal of the Operational Research Society
- AXIAL: A system for boiling water reactor fuel assembly axial optimization using genetic algorithms, Annals of Nuclear Energy
- Optimum boiling water reactor fuel design strategies to enhance reactor shutdown by the standby liquid control system
- Greedy randomized adaptive search procedures, Journal of Global Optimization
- DEAP: Evolutionary algorithms made easy, Journal of Machine Learning Research
- Metaheuristics in combinatorial optimization, Annals of Operations Research