Physics-informed reinforcement learning optimization of nuclear assembly design

https://doi.org/10.1016/j.nucengdes.2020.110966

Abstract

Optimization of nuclear fuel assemblies, if performed effectively, will lead to fuel efficiency improvement, cost reduction, and safety assurance. However, assembly optimization involves solving high-dimensional and computationally expensive combinatorial problems. As such, fuel designers’ expert judgement has commonly prevailed over the use of stochastic optimization (SO) algorithms such as genetic algorithms and simulated annealing. To improve the state of the art, we explore a class of artificial intelligence (AI) algorithms, namely reinforcement learning (RL), in this work. We propose a physics-informed AI optimization methodology that uses reward shaping to connect RL to the tactics fuel designers follow in practice, namely moving fuel rods in the assembly to meet specific constraints and objectives. The methodology utilizes the RL algorithms deep Q learning and proximal policy optimization, and compares their performance to SO algorithms. The methodology is applied to two boiling water reactor assemblies of low-dimensional (2 × 10⁶ combinations) and high-dimensional (10³¹ combinations) nature. The results demonstrate that RL is more effective than SO in solving the high-dimensional problem, i.e., the 10 × 10 assembly, by embedding expert knowledge in the form of game rules and by exploring the search space effectively. For computational resources and timeframes relevant to fuel designers, the RL algorithms outperformed SO by finding 4–5 times more feasible patterns and by increasing search speed, as indicated by their outstanding computational efficiency. These results clearly demonstrate the effectiveness of RL as another decision-support tool for nuclear fuel assembly optimization.

Introduction

The sustainability of the existing light water reactor fleet is one of the main missions of the U.S. nuclear industry and the Department of Energy. The existing fleet provides roughly half of all carbon-free electricity in the United States. However, the number of reactors online has declined in recent years, mainly driven by cost. Reducing the nuclear fuel cost is one way to improve fleet efficiency. Nuclear fuel designers dictate the number of fuel rods in an assembly and their attributes in terms of enrichment and burnable poison loading. In effect, fuel designers attempt to solve a “combinatorial optimization” problem by utilizing expert judgement, nuclear design principles, and physics-based tools. Combinatorial optimization (Korte et al., 2012) in nuclear reactor design and operation is a known problem that aims to find an optimal pattern from a finite set of patterns (Kropaczek and Turinsky, 1991). Indeed, the search space for combinatorial optimization is finite by definition, and thus an optimal solution always exists.

Nuclear fuel design involves two common problems: (1) core optimization and (2) assembly optimization. Core optimization aims at finding the best loading pattern of all assemblies in the core such that the reactor operation is economic and meets safety constraints (Kropaczek and Turinsky, 1991). Assembly optimization (the focus of this work) aims at finding the optimal material composition and location of all fuel rods in the assembly such that, when the assembly is introduced in the core, economic and safety constraints are satisfied (Francois et al., 2003). A review of related literature on optimization techniques is included in Section 3.1.

For assembly optimization, unlike pressurized water reactors (PWR), boiling water reactor (BWR) designs feature a more heterogeneous radial fuel enrichment distribution (Fensin, 2004), as will also be seen in this work. For PWRs, some utilities adopt optimization tools to find the most economic core design more rapidly, for example ROSA (Reload Optimization with Simulated Annealing) (Verhagen et al., 1997). However, such stochastic optimization (SO) based frameworks, while fast for individual pattern evaluation, are often computationally expensive when searching for high-performing solutions, and thus have not found much commercial adoption for more complex problems such as BWRs. It is also worth mentioning that SO code packages such as ROSA leverage surrogate models to reduce the computational burden and thus do not rely on licensed methodologies. Therefore, when the best design option is found by SO, manual tuning still needs to be performed with the licensed codes. Aside from classical SO, to the authors’ knowledge, there have been very limited attempts so far to investigate the performance of modern reinforcement learning (RL) algorithms (e.g., deep Q learning, proximal policy optimization) in supporting nuclear engineering decisions regarding fuel assembly optimization, for either PWRs or BWRs. RL algorithms would prove effective if they demonstrate promising performance in embedding domain or expert knowledge through reward shaping, in exploring the search space effectively, and in finding a global optimum more effectively than standard SO in a problem with many local optima. Accordingly, we explore the ability to train an intelligent system by RL that learns from interactions with physics-based environments and from prior expert knowledge, such that it can take proper actions in a short amount of time to efficiently arrange and optimize nuclear fuel within the assembly.
RL is compared to SO algorithms (i.e., genetic algorithms, simulated annealing), which act as baselines that have been widely investigated in the literature.
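To make the idea of embedding expert knowledge through reward shaping concrete, the sketch below shows one generic way a shaped reward can be built. The specific quantities (a target multiplication factor `k_target`, a pin peaking limit, and the weights) are illustrative assumptions, not the paper's actual rules, which are defined in the case studies:

```python
# Hypothetical reward-shaping sketch (NOT the paper's actual rules or
# weights): reward closeness to an objective and penalize each violated
# safety constraint, so the agent receives graded feedback rather than
# a single pass/fail signal.
def shaped_reward(k_inf, peaking, k_target=1.10, peaking_limit=1.5,
                  w_obj=1.0, w_pen=10.0):
    objective = -abs(k_inf - k_target)           # closer to target is better
    penalty = max(0.0, peaking - peaking_limit)  # zero when constraint is met
    return w_obj * objective - w_pen * penalty

print(shaped_reward(1.10, 1.4))  # feasible, on-target pattern: maximum reward
print(shaped_reward(1.10, 1.7))  # infeasible pattern: penalized (negative)
```

The large penalty weight relative to the objective weight mirrors the common tactic of steering the search toward feasible patterns first, before fine-tuning the objective.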

In this work, we provide important definitions about the design of the nuclear assemblies of interest in Section 2. The methodology is described in Section 3, which starts with a literature review of related work, followed by the optimization strategy and the process of building physics-based environments to facilitate RL and SO. Next, the mathematical foundation of the RL and SO algorithms and their connection to the physics-based environment are described, followed by the code deployment. The results of this paper are presented in two case studies in Section 4 and Section 5, respectively. The first case study highlights a small, low-dimensional nuclear assembly (BWR 6 × 6) whose global optima are known beforehand from brute-force search, where the RL/SO algorithms are assessed and compared to each other. Next, RL is compared to SO on a larger, high-dimensional nuclear assembly (BWR 10 × 10) that is also limited by expensive simulation costs. Finally, the conclusions of this work are presented in Section 6.

Section snippets

Nuclear fuel assembly design

The system optimized in this work is the nuclear fuel assembly; top views of the two BWR assembly designs of interest to this work are sketched in Fig. 1. Assembly optimization is seen as a permutation-with-repetition problem with cost proportional to O(mⁿ), where m is the number of fuel types (i.e., choices) to pick from, while n is the number of fuel rod locations to optimize (i.e., number of times to choose). To reduce the search space, researchers tend to take advantage of problem symmetry to
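The O(mⁿ) scaling above can be checked with a two-line calculation. Note the specific m and n values below are illustrative assumptions: the paper's figure of roughly 2 × 10⁶ combinations for the low-dimensional case is consistent with, e.g., 2 fuel types over 21 symmetry-unique locations of a 6 × 6 lattice, but the actual counts are defined in the case studies:

```python
# Search-space size for assembly optimization as permutation with
# repetition: m fuel types chosen independently for each of n rod
# locations gives m**n candidate patterns.
def search_space_size(m: int, n: int) -> int:
    """Number of candidate patterns with m fuel types and n locations."""
    return m ** n

# Illustrative only: a 6x6 assembly with diagonal symmetry has
# 6*7/2 = 21 unique locations; 2 fuel types give 2**21 patterns.
print(search_space_size(2, 21))  # 2097152, about 2e6
```

The exponential growth in n is what makes the 10 × 10 case (on the order of 10³¹ combinations) intractable for brute force and motivates the guided-search methods compared in this work.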

Methodology

Before moving into the methodology details, it is worth defining the basic concept of RL, which is illustrated in Fig. 2, and can be summarized in three main steps:

  • 1. The agent: the optimizer in our study. The agent is controlled by the RL algorithm, which trains it to take proper actions. The algorithm takes the current state and the current reward as inputs and decides the next action to take as output. This sequence is the core of deep reinforcement learning, described in
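The agent–environment loop described above can be sketched with a minimal tabular Q-learning example. This is a toy stand-in, not the paper's environment or its deep Q network: the "assembly" here is a hypothetical one-dimensional row of three rod locations, and the reward simply encourages matching a target placement:

```python
import random

class ToyAssemblyEnv:
    """Hypothetical stand-in environment: place a fuel type (0 or 1) in
    each of n rod locations in turn; reward +1 for matching a target
    placement, -1 otherwise. The state is the current location index."""
    def __init__(self, target):
        self.target = target
        self.pos = 0
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):
        reward = 1.0 if action == self.target[self.pos] else -1.0
        self.pos += 1
        done = self.pos == len(self.target)
        return self.pos, reward, done

def train(env, n_states, n_actions, episodes=500, eps=0.1, alpha=0.5, gamma=0.9):
    # Tabular Q; the extra row is the terminal state (all-zero values).
    Q = [[0.0] * n_actions for _ in range(n_states + 1)]
    rng = random.Random(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection: explore vs. exploit.
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = env.step(a)
            # Q-learning update toward reward plus discounted next value.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

env = ToyAssemblyEnv(target=[1, 0, 1])
Q = train(env, n_states=3, n_actions=2)
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(3)]
print(greedy)  # the greedy policy recovers the target placement [1, 0, 1]
```

The paper's methodology replaces the toy table with deep Q learning and proximal policy optimization, and the hand-coded reward with physics-informed reward shaping evaluated by a lattice physics code.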

Case study 1: BWR 6 × 6 fuel assembly

The first case study forms a 6 × 6 BWR assembly with a total of 36 rods. The environment setup and reward shaping are described in the first subsection, while the results are presented and discussed in the second subsection.
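Because the 6 × 6 search space is small, the global optima can be established by exhaustive enumeration. A minimal sketch of such a brute-force search is shown below; `evaluate` is a stand-in for the physics code, and the toy objective used here (preferring alternating fuel types) is purely illustrative:

```python
from itertools import product

def brute_force(n_locations, fuel_types, evaluate):
    """Enumerate every pattern of fuel_types over n_locations and
    return the first pattern achieving the best score."""
    best_pattern, best_score = None, float("-inf")
    for pattern in product(fuel_types, repeat=n_locations):
        score = evaluate(pattern)
        if score > best_score:
            best_pattern, best_score = pattern, score
    return best_pattern, best_score

# Toy stand-in objective (NOT a physics model): count adjacent
# locations holding different fuel types.
toy = lambda p: sum(p[i] != p[i + 1] for i in range(len(p) - 1))
print(brute_force(6, (0, 1), toy))  # ((0, 1, 0, 1, 0, 1), 5)
```

For the 10 × 10 case in Section 5, `product` would have to visit on the order of 10³¹ patterns, which is exactly why brute force is only usable as ground truth for the small assembly.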

Case study 2: BWR 10 × 10 fuel assembly

In this section, the optimization strategy is described first, which includes detailed reward/fitness shaping. Next, the RL and SO performances in optimizing the BWR 10 × 10 assembly are evaluated and discussed.

Closing remarks

The potential efficiency gains in nuclear fuel cost encourage fuel designers to solve high-dimensional and expensive combinatorial optimization problems. Fuel optimization is still mainly tackled by expert judgement and classical stochastic optimization (SO) algorithms. In this work, we propose a physics-informed optimization methodology based on deep RL to improve upon SO performance, using a robust and licensed nuclear code. The methodology utilizes deep Q learning

CRediT authorship contribution statement

Majdi I. Radaideh: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing - original draft, Validation, Visualization. Isaac Wolverton: Software, Writing - review & editing. Joshua Joseph: Software, Writing - review & editing. James J. Tusar: Data curation, Validation, Writing - review & editing. Uuganbayar Otgonbaatar: Project administration, Funding acquisition, Writing - review & editing. Nicholas Roy: Resources, Supervision, Writing - review & editing. Benoit

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work is sponsored by Exelon Corporation, a nuclear electric power generation company, under the award (40008739). Joshua Joseph was supported by the MIT Quest for Intelligence and their support is gratefully acknowledged.

References

  • S. Tayefi et al., Using Hopfield neural network to optimize fuel rod loading patterns in VVER/1000 reactor by applying axial variation of enrichment distribution, Applied Soft Computing (2014)
  • A. Zameer et al., Fractional-order particle swarm based multi-objective PWR core loading pattern optimization, Annals of Nuclear Energy (2020)
  • P. Barbero et al., Post-irradiation analysis of the Gundremmingen BWR spent fuel, Tech. rep. EUR-6301, Commission of the European Communities (1979)
  • Bello, I., Pham, H., Le, Q.V., Norouzi, M., Bengio, S., 2016. Neural combinatorial optimization with reinforcement...
  • Bengio, Y., Lodi, A., Prouvost, A., 2020. Machine learning for combinatorial optimization: a methodological tour...
  • Berny, A., 2000. Selection and reinforcement learning for combinatorial optimization. In: International Conference on...
  • D.P. Bertsekas et al. (1995)
  • Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., Zaremba, W., 2016. OpenAI Gym, arXiv...
  • J.-F. Cordeau et al., A unified tabu search heuristic for vehicle routing problems with time windows, Journal of the Operational Research Society (2001)
  • C.M. del Campo et al., AXIAL: a system for boiling water reactor fuel assembly axial optimization using genetic algorithms, Annals of Nuclear Energy (2001)
  • Edenius, M., Ekberg, K., Forssén, B.H., Knott, D., 1995. CASMO-4, a fuel assembly burnup program, user’s manual,...
  • M.L. Fensin, Optimum boiling water reactor fuel design strategies to enhance reactor shutdown by the standby liquid control system (2004)
  • T.A. Feo et al., Greedy randomized adaptive search procedures, Journal of Global Optimization (1995)
  • F.-A. Fortin et al., DEAP: Evolutionary algorithms made easy, Journal of Machine Learning Research (2012)
  • Verhagen, F., Van der Schaar, M., De Kruijf, W., Van de Wetering, T., Jones, R., 1997. ROSA, a utility tool for loading...
  • Gambardella, L.M., Dorigo, M., 1995. Ant-Q: A reinforcement learning approach to the traveling salesman problem. In:...
  • M. Gendreau et al., Metaheuristics in combinatorial optimization, Annals of Operations Research (2005)