SelectNet: Self-paced learning for high-dimensional partial differential equations

https://doi.org/10.1016/j.jcp.2021.110444

Highlights


  • Self-paced learning for neural network-based PDE solvers.

  • Automatic importance sampling for solutions with irregularity.

  • Efficient solutions of high dimensional and nonlinear PDEs.

Abstract

The least squares method with deep neural networks as function parametrization has been applied to solve certain high-dimensional partial differential equations (PDEs) successfully; however, its convergence is slow and might not be guaranteed even within a simple class of PDEs. To improve the convergence of the network-based least squares model, we introduce a novel self-paced learning framework, SelectNet, which quantifies the difficulty of training samples, treats samples equally in the early stage of training, and slowly explores more challenging samples, e.g., samples with larger residual errors, mimicking the human cognitive process for more efficient learning. In particular, a selection network and the PDE solution network are trained simultaneously; the selection network adaptively weights the training samples of the solution network, achieving the goal of self-paced learning. Numerical examples indicate that the proposed SelectNet model outperforms existing models in convergence speed and robustness, especially for low-regularity solutions.

Introduction

High-dimensional partial differential equations (PDEs) are important tools in physical, financial, and biological models [39], [20], [64], [22], [61]. However, developing numerical methods for high-dimensional PDEs has been challenging due to the curse of dimensionality in the discretization of the problem. For example, in traditional methods such as finite difference methods and finite element methods, $O(N^d)$ degrees of freedom are required for a $d$-dimensional problem if we set $N$ grid points or basis functions in each direction to achieve $O(1/N)$ accuracy. Even if $d$ becomes only moderately large, the exponential growth of $N^d$ in the dimension $d$ makes traditional methods computationally intractable.
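To make this scaling concrete, a back-of-the-envelope computation (illustrative only, not taken from the paper) shows how quickly a tensor-product grid grows with the dimension:

    # Illustration of the curse of dimensionality: a tensor-product grid with
    # N points per direction contains N**d points in d dimensions.
    N = 10  # grid points per direction, roughly O(1/N) accuracy for low-order methods
    for d in (2, 3, 10, 20, 100):
        print(f"d = {d:3d}: grid size N**d = {float(N) ** d:.3e}")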

Recent research on the approximation theory of deep neural networks (DNNs) shows that deep network approximation is a powerful tool for mesh-free function parametrization. The research on the approximation theory of neural networks traces back to the pioneering work [9], [26], [1] on the universal approximation of shallow networks with sigmoid activation functions. The recent focus has been on the approximation rate of DNNs for various function spaces in terms of the number of network parameters, showing that deep networks are more powerful than shallow networks in approximation efficiency; examples include smooth functions [44], [42], [62], [18], [47], [60], [16], [15], [17], piecewise smooth functions [51], band-limited functions [49], and continuous functions [63], [55], [54]. The reader is referred to [54] for an explicit characterization of the approximation error for networks with arbitrary width and depth.

In particular, deep network approximation can lessen or overcome the curse of dimensionality under certain circumstances, making it an attractive tool for solving high-dimensional problems. For functions admitting an integral representation with a one-dimensional integral kernel, the absence of the curse of dimensionality in the approximation rate can be shown by relating network approximation to Monte Carlo sampling, or equivalently the law of large numbers [1], [16], [15], [17], [49]. Based on the Kolmogorov-Arnold superposition theorem, [45], [24] showed that, for general continuous functions, three-layer neural networks with advanced activation functions can avoid the curse of dimensionality with only $O(d)$ parameters in total; [48] proves that deep ReLU network approximation can lessen the curse of dimensionality if target functions are restricted to a space related to the constructive proof of the Kolmogorov-Arnold superposition theorem in [4]. If the approximation error is only measured on a low-dimensional manifold, there is no curse of dimensionality for deep network approximation in terms of the approximation error [7], [5], [54]. Finally, there is also extensive research showing that deep network approximation can overcome the curse of dimensionality when applied to approximate certain PDE solutions, e.g., [27], [29].

As an efficient function parametrization tool, neural networks have been applied to solve PDEs via various approaches. Early work in [38] applies neural networks to approximate PDE solutions defined on grid points. Later in [11], [36], DNNs are employed to approximate solutions in the whole domain, and PDEs are solved by minimizing the discrete residual error in the $L^2$-norm at prescribed collocation points. DNNs coupled with boundary governing terms by design can satisfy boundary conditions [46]; nevertheless, designing boundary governing terms is usually difficult for complex geometry. Another approach to enforcing boundary conditions is to add boundary errors to the loss function as a penalized term and minimize it together with the PDE residual error [23], [37]. The second technique is in the same spirit as least squares methods in finite element methods and is more convenient to implement; therefore, it has been widely utilized for PDEs on complex domains. However, network computation used to be expensive, limiting the applications of network-based PDE solvers. Thanks to the development of GPU-based parallel computing over the last two decades, which has greatly accelerated network computation, network-based PDE solvers have been revisited recently and have become a popular tool, especially for high-dimensional problems [13], [19], [25], [33], [58], [3], [65], [40], [2], [29], [28], [6], [53], [41]. Nevertheless, most network-based PDE solvers suffer from robustness issues: their convergence is slow and might not be guaranteed even within a simple class of PDEs.
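To make the penalized formulation concrete, the sketch below assembles the interior residual and the boundary penalty for a Poisson-type model problem $-\Delta u = f$ in $\Omega$ with Dirichlet data $g_0$ on $\partial\Omega$. It is a minimal PyTorch illustration under assumed names (`phi` for the solution network, `f`, `g0`, and the penalty weight `lam`), not the authors' implementation.

    import torch

    def laplacian(phi, x):
        # x: (n, d) collocation points with requires_grad=True
        u = phi(x)                                                    # (n, 1)
        grad = torch.autograd.grad(u.sum(), x, create_graph=True)[0]  # (n, d)
        lap = 0.0
        for i in range(x.shape[1]):                                   # sum of second derivatives
            lap = lap + torch.autograd.grad(grad[:, i].sum(), x, create_graph=True)[0][:, i]
        return lap.unsqueeze(1)                                       # (n, 1)

    def penalized_ls_loss(phi, x_in, x_bd, f, g0, lam=100.0):
        # mean squared PDE residual of -Δu = f at interior points, plus a
        # penalized mean squared mismatch of u and g0 at boundary points
        res_in = -laplacian(phi, x_in) - f(x_in)
        res_bd = phi(x_bd) - g0(x_bd)
        return (res_in ** 2).mean() + lam * (res_bd ** 2).mean()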

To ease the issue above, we introduce a novel self-paced learning framework, SelectNet, to adaptively choose training samples in the least squares model. Self-paced learning [35] is a recently proposed learning technique that chooses a subset of the training samples for actual training over time. Specifically, for a training data set with $n$ samples, self-paced learning uses a vector $v \in \{0,1\}^n$ to indicate whether each training sample should be included in the current training stage. The philosophy of self-paced learning is to simulate the human learning style, which tends to learn easier aspects of a task first and deal with more complicated samples later. Based on self-paced learning, a novel technique for selected sampling is put forward, which uses a selection neural network instead of the 0-1 selection vector $v$; hence it learns to avoid redundant training information and speeds up the convergence of learning outcomes. This idea is further improved in [30] by introducing a DNN to select training data for image classification. Among similar works, a state-of-the-art algorithm named SelectNet is proposed in [43] for image classification, especially for imbalanced data problems. Based on the observation that samples near the singularity of the PDE solution are rare compared to samples from the regular part, we extend SelectNet [43] to network-based least squares models, especially for PDE solutions with certain irregularity. As we shall see later, numerical results show that the proposed model is competitive with the traditional (basic) least squares model for analytic solutions and outperforms it for low-regularity solutions in terms of convergence speed. It is worth noting that our proposed SelectNet model essentially tunes the weights of training points to realize adaptive sampling; another approach is to change the distribution of training points, such as the residual-based adaptive refinement method [32].
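For intuition, the hard 0-1 selection of classical self-paced learning can be sketched as follows; this is an illustrative paraphrase of the idea in [35] with a hypothetical threshold schedule, not the SelectNet scheme, which replaces the indicator with a trained selection network.

    import torch

    def self_paced_mask(per_sample_loss, threshold):
        # v_i = 1 if sample i is currently "easy" (loss below the threshold), else 0;
        # the threshold is raised over training so that harder samples are admitted later
        return (per_sample_loss < threshold).float()

    def self_paced_loss(per_sample_loss, threshold):
        v = self_paced_mask(per_sample_loss, threshold)
        # average only over the selected samples (guarding against selecting none)
        return (v * per_sample_loss).sum() / v.sum().clamp(min=1.0)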

The organization of this paper is as follows. In Section 2, we introduce the least squares methods and formulate the corresponding optimization model. In Section 3, we present the SelectNet model in detail. In Section 4, we put forward the error estimates of the basic and SelectNet models. In Section 5, we discuss the network implementation in the proposed model. In Section 6, we present ample numerical experiments for various equations to validate our model. We conclude with some remarks in the final section.

Section snippets

Least squares methods for PDEs

In this work, we aim at solving the following (initial) boundary value problems, given a bounded domain $\Omega \subset \mathbb{R}^d$:

  • elliptic equations: $\mathcal{D}_x u(x) = f(x)$ in $\Omega$, $\quad \mathcal{B}_x u(x) = g_0(x)$ on $\partial\Omega$;

  • parabolic equations: $\frac{\partial u(x,t)}{\partial t} - \mathcal{D}_x u(x,t) = f(x,t)$ in $\Omega \times (0,T)$, $\quad \mathcal{B}_x u(x,t) = g_0(x,t)$ on $\partial\Omega \times (0,T)$, $\quad u(x,0) = h_0(x)$ in $\Omega$;

  • hyperbolic equations: $\frac{\partial^2 u(x,t)}{\partial t^2} - \mathcal{D}_x u(x,t) = f(x,t)$ in $\Omega \times (0,T)$, $\quad \mathcal{B}_x u(x,t) = g_0(x,t)$ on $\partial\Omega \times (0,T)$, $\quad u(x,0) = h_0(x)$, $\ \frac{\partial u(x,0)}{\partial t} = h_1(x)$ in $\Omega$;

where $u$ is the solution function; $f$, $g_0$, $h_0$, $h_1$ are given data functions; $\mathcal{D}_x$ is a spatial differential operator, and $\mathcal{B}_x$ is a boundary operator.
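For instance, in the elliptic case the least squares approach minimizes the mean squared residuals of the equation and of the boundary condition. Schematically, with a penalty weight $\lambda$ (a sketch consistent with the formulation above, not the paper's exact display),

    \min_{u}\; \big\| \mathcal{D}_x u - f \big\|_{L^2(\Omega)}^{2}
      \;+\; \lambda\, \big\| \mathcal{B}_x u - g_0 \big\|_{L^2(\partial\Omega)}^{2},

where $u$ is parametrized by a neural network and, in practice, the norms are estimated by Monte Carlo averages over collocation points sampled in $\Omega$ and on $\partial\Omega$.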

SelectNet model

The network-based least squares model has been applied to solve certain high-dimensional PDEs successfully. However, its convergence is slow and might not be guaranteed. To ease this issue, we introduce a novel self-paced learning framework, SelectNet, to adaptively choose training samples in the least squares model. The basic philosophy is to mimic the human cognitive process for more efficient learning: learning first from easier examples and slowly exploring more complicated ones.
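A minimal sketch of the resulting weighted objective is given below, assuming PyTorch and hypothetical names (`residuals` for the point-wise PDE residuals, `select_net` for the selection network); it conveys the re-weighting mechanism rather than the paper's exact min-max formulation. The solution network takes descent steps on this loss while the selection network takes ascent steps, so points with larger residuals gradually receive larger weights.

    import torch

    def selectnet_weighted_loss(residuals, select_net, x):
        # residuals: (n, 1) point-wise PDE residuals at collocation points x of shape (n, d)
        # select_net: network producing a nonnegative weight for each sample
        w = select_net(x)              # (n, 1)
        w = w / (w.mean() + 1e-8)      # normalization keeps the average weight near 1
        return (w * residuals ** 2).mean()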

Error estimates

In this section, theoretical analysis is presented to show that the solution errors of the basic and SelectNet models are bounded by the loss function (the mean square of the residual). Specifically, we take the elliptic PDE with a Neumann boundary condition as an example; the conclusion can be generalized to other well-posed PDEs by similar arguments. Consider

    -\Delta u + c\,u = f \ \text{ in } \Omega, \qquad \frac{\partial u}{\partial n} = g \ \text{ on } \partial\Omega,

where $\Omega$ is an open subset of $\mathbb{R}^d$ whose boundary $\partial\Omega$ is $C^1$ smooth; $f \in L^2(\Omega)$, $g \in L^2(\partial\Omega)$, and $c(x) \ge \sigma > 0$ is a given function.
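Only as an illustration of the type of bound meant here (the precise norms and constants appear in the theorems and may differ), such estimates take the schematic form

    \| u_{\phi} - u \| \;\le\; C \Big( \| -\Delta u_{\phi} + c\, u_{\phi} - f \|_{L^2(\Omega)}
      + \| \partial u_{\phi} / \partial n - g \|_{L^2(\partial\Omega)} \Big),

so the distance between a network approximation $u_{\phi}$ and the true solution $u$ is controlled by the square root of the mean squared residual loss.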

Network architecture

The proposed framework is independent of the choice of DNNs. Advanced network design may improve the accuracy and convergence of the proposed framework, which would be interesting for future work.

In this paper, feedforward neural networks will be repeatedly applied. Let $\phi(x;\theta)$ denote such a network with input $x$ and parameters $\theta$; it is defined recursively as follows:

    x^{0} = x, \qquad x^{l+1} = \sigma(W^{l} x^{l} + b^{l}), \quad l = 0, 1, \ldots, L-1, \qquad \phi(x;\theta) = W^{L} x^{L} + b^{L},

where $\sigma$ is an application-dependent nonlinear activation function, and $\theta$ collects all the weights and biases $\{W^{l}, b^{l}\}_{l=0}^{L}$.
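This recursion translates directly into code; the following PyTorch sketch is a minimal illustration, with layer widths and activation as placeholders rather than the architectures used in the experiments.

    import torch

    class FeedForward(torch.nn.Module):
        # phi(x; theta):  x^0 = x,  x^{l+1} = sigma(W^l x^l + b^l),  output W^L x^L + b^L
        def __init__(self, widths, activation=torch.relu):
            super().__init__()
            self.layers = torch.nn.ModuleList(
                [torch.nn.Linear(widths[i], widths[i + 1]) for i in range(len(widths) - 1)]
            )
            self.activation = activation

        def forward(self, x):
            for layer in self.layers[:-1]:
                x = self.activation(layer(x))
            return self.layers[-1](x)  # no activation on the output layer

    # e.g. a map from R^d to R with two hidden layers of width 100:
    # phi = FeedForward([d, 100, 100, 1])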

Numerical experiments

In this section, the proposed SelectNet model is tested on several PDE examples, including elliptic/parabolic and linear/nonlinear high-dimensional problems. Other network-based methods are also implemented for comparison. For all methods, we choose the feedforward architecture with activation $\sigma(x) = \max(x^3, 0)$ for the solution network. Additionally, for SelectNet, we choose a feedforward architecture with the ReLU activation for the selection network. AdaGrad [12] is employed to solve the resulting optimization problems.
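For concreteness, the cubic activation $\sigma(x) = \max(x^3, 0)$ can be written as below; the optimizer lines are a sketch assuming a PyTorch implementation, with hypothetical variable names, since the paper's code is not reproduced here.

    import torch

    def relu_cubed(x):
        # sigma(x) = max(x^3, 0); for real x this equals relu(x)**3
        return torch.clamp(x ** 3, min=0.0)

    # optimizer sketch (assuming the networks are torch modules):
    # opt_u   = torch.optim.Adagrad(solution_net.parameters(), lr=1e-3)
    # opt_sel = torch.optim.Adagrad(selection_net.parameters(), lr=1e-3)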

Conclusion

In this work, we improve network-based least squares models on generic PDEs by introducing a selection network for selected sampling in the optimization process. The objective is to place higher weights on the sampling points with larger point-wise residual errors, and accordingly we propose the SelectNet model, which is a min-max optimization problem. In the implementation, both the solution and selection functions are approximated by feedforward neural networks, which are trained simultaneously.

CRediT authorship contribution statement

Yiqi Gu: Investigation, Software, Visualization, Writing – original draft. Haizhao Yang: Conceptualization, Investigation, Methodology, Writing – review & editing. Chao Zhou: Investigation, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Y. G. was partially supported by the Ministry of Education in Singapore under the grant MOE2018-T2-2-147 and MOE AcRF R-146-000-271-112. H. Y. was partially supported by the US National Science Foundation under award DMS-1945029. C. Z. was partially supported by the Ministry of Education in Singapore under the grant MOE AcRF R-146-000-271-112 and by NSFC under the grant award 11871364.

References (65)

  • Zuowei Shen et al., Nonlinear approximation via compositions, Neural Netw. (2019)
  • Justin Sirignano et al., DGM: a deep learning algorithm for solving partial differential equations, J. Comput. Phys. (2018)
  • Dmitry Yarotsky, Error bounds for approximations with deep ReLU networks, Neural Netw. (2017)
  • Yaohua Zang et al., Weak adversarial networks for high-dimensional partial differential equations, J. Comput. Phys. (2020)
  • A.R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans. Inf. Theory (May 1993)
  • Christian Beck et al., Deep splitting method for parabolic PDEs
  • J. Braun, M. Griebel, On a constructive proof of Kolmogorov's superposition theorem, preprint, SFB 611,...
  • Jian-Feng Cai et al., Enhanced expressive power and fast training of neural networks by random projections
  • Wei Cai et al., Multi-scale deep neural networks for solving high dimensional PDEs
  • Charles K. Chui et al., Construction of neural networks for realization of localized deep learning, Front. Appl. Math. Stat. (2018)
  • Dominik Csiba et al., Importance sampling for minibatches, J. Mach. Learn. Res. (January 2018)
  • G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signals Syst. (Feb 1989)
  • Constantinos Daskalakis et al., The limit points of (optimistic) gradient descent in min-max optimization
  • M.W.M.G. Dissanayake et al., Neural-network-based approximations for solving partial differential equations, Commun. Numer. Methods Eng. (1994)
  • John Duchi et al., Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res. (July 2011)
  • E. Weinan et al., Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations, Commun. Math. Stat. (Dec 2017)
  • E. Weinan et al., Integrating machine learning with physics-based modeling
  • E. Weinan et al., A priori estimates of the population risk for residual networks
  • E. Weinan et al., A priori estimates of the population risk for two-layer neural networks, Commun. Math. Sci. (2019)
  • E. Weinan et al., Barron spaces and the compositional function spaces for neural network models, Constr. Approx. (2020)
  • E. Weinan et al., Exponential convergence of the deep neural network approximation for analytic functions, Sci. China Math. (2018)
  • E. Weinan et al., The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems, Commun. Math. Stat. (2018)
1. On leave from Department of Mathematics, National University of Singapore.