SelectNet: Self-paced learning for high-dimensional partial differential equations
Introduction
High-dimensional partial differential equations (PDEs) are important tools in physical, financial, and biological models [39], [20], [64], [22], [61]. However, developing numerical methods for high-dimensional PDEs has been challenging due to the curse of dimensionality in the discretization of the problem. For example, in traditional methods such as finite difference and finite element methods, O(N^d) degrees of freedom are required for a d-dimensional problem if we set N grid points or basis functions in each direction to achieve the desired accuracy. Even when d is only moderately large, this exponential growth in the dimension d makes traditional methods computationally intractable.
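To make the exponential growth concrete, the following count is purely illustrative (the values of N and d are chosen arbitrarily):
\[
N = 100,\ d = 10 \;\Longrightarrow\; N^{d} = 100^{10} = 10^{20}\ \text{degrees of freedom},
\qquad\text{versus}\qquad N^{2} = 10^{4}\ \text{for } d = 2.
\]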
Recent research on the approximation theory of deep neural networks (DNNs) shows that deep network approximation is a powerful tool for mesh-free function parametrization. The research on the approximation theory of neural networks traces back to the pioneering work [9], [26], [1] on the universal approximation of shallow networks with sigmoid activation functions. Recent research has focused on the approximation rate of DNNs for various function spaces in terms of the number of network parameters, showing that deep networks are more powerful than shallow networks in approximation efficiency; for example, for smooth functions [44], [42], [62], [18], [47], [60], [16], [15], [17], piecewise smooth functions [51], band-limited functions [49], and continuous functions [63], [55], [54]. The reader is referred to [54] for an explicit characterization of the approximation error for networks with arbitrary width and depth.
In particular, deep network approximation can lessen or overcome the curse of dimensionality under certain circumstances, making it an attractive tool for solving high-dimensional problems. For functions admitting an integral representation with a one-dimensional integral kernel, an approximation rate free of the curse of dimensionality can be shown by connecting network approximation with Monte Carlo sampling, or equivalently the law of large numbers [1], [16], [15], [17], [49]. Based on the Kolmogorov-Arnold superposition theorem, [45], [24] showed that, for general continuous functions, three-layer neural networks with advanced activation functions can avoid the curse of dimensionality with a total number of parameters that does not grow exponentially in the dimension; [48] proves that deep ReLU network approximation can lessen the curse of dimensionality if target functions are restricted to a space related to the constructive proof of the Kolmogorov-Arnold superposition theorem in [4]. If the approximation error is only measured on a low-dimensional manifold, deep network approximation is free of the curse of dimensionality in terms of the approximation error [7], [5], [54]. Finally, there is also extensive research showing that deep network approximation can overcome the curse of dimensionality when applied to approximate certain PDE solutions, e.g., [27], [29].
As an efficient function parametrization tool, neural networks have been applied to solve PDEs via various approaches. Early work in [38] applies neural networks to approximate PDE solutions defined on grid points. Later, in [11], [36], DNNs are employed to approximate solutions in the whole domain, and PDEs are solved by minimizing the discrete residual error in the ℓ²-norm at prescribed collocation points. DNNs coupled with boundary governing terms designed to satisfy the boundary conditions automatically are used in [46]. Nevertheless, designing boundary governing terms is usually difficult for complex geometries. Another approach to enforcing boundary conditions is to add the boundary error to the loss function as a penalty term and minimize it together with the PDE residual error [23], [37]. The second technique is in the same spirit as least squares methods in finite elements and is more convenient to implement; therefore, it has been widely used for PDEs on complex domains. However, network computation used to be expensive, which limited the applications of network-based PDE solvers. Thanks to the development of GPU-based parallel computing over the last two decades, which greatly boosts network computation, network-based PDE solvers have been revisited recently and have become a popular tool, especially for high-dimensional problems [13], [19], [25], [33], [58], [3], [65], [40], [2], [29], [28], [6], [53], [41]. Nevertheless, most network-based PDE solvers suffer from robustness issues: their convergence is slow and might not be guaranteed even within a simple class of PDEs.
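To illustrate the penalized least squares approach just described, the following is a minimal PyTorch sketch for a model Poisson problem -Δu = f in Ω with u = g on ∂Ω; the network u_net, the data functions f and g, and the penalty weight beta are placeholders introduced here for illustration, not the exact implementation of this paper.

    import torch

    def pde_residual_sq(u_net, x, f):
        # Point-wise squared residual of -Laplace(u) = f at collocation points x,
        # with derivatives computed by automatic differentiation.
        x = x.requires_grad_(True)
        u = u_net(x)
        grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
        lap_u = sum(torch.autograd.grad(grad_u[:, i].sum(), x, create_graph=True)[0][:, i]
                    for i in range(x.shape[1])).unsqueeze(1)
        return (-lap_u - f(x)) ** 2

    def basic_least_squares_loss(u_net, x_int, x_bdy, f, g, beta=100.0):
        # Mean squared PDE residual at interior collocation points plus a
        # penalized mean squared boundary error, in the spirit of [23], [37].
        boundary_sq = (u_net(x_bdy) - g(x_bdy)) ** 2
        return pde_residual_sq(u_net, x_int, f).mean() + beta * boundary_sq.mean()

Here the penalty weight beta balances the interior residual against the boundary error and typically needs tuning.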
To ease the issue above, we introduce a novel self-paced learning framework, SelectNet, to adaptively choose training samples in the least squares model. Self-paced learning [35] is a recently proposed learning technique that chooses a portion of the training samples for actual training over time. Specifically, for a training data set with n samples, self-paced learning uses a 0-1 vector v to indicate whether each training sample should be included in the current training stage. The philosophy of self-paced learning is to mimic the learning style of human beings, who tend to learn the easier aspects of a task first and deal with more complicated samples later. Based on self-paced learning, a novel technique for selected sampling was put forward, which uses a selection neural network instead of the 0-1 selection vector v; hence, it learns to avoid redundant training information and speeds up convergence. This idea is further improved in [30] by introducing a DNN to select training data for image classification. Among similar works, a state-of-the-art algorithm named SelectNet is proposed in [43] for image classification, especially for imbalanced data problems. Based on the observation that samples near a singularity of the PDE solution are rare compared with samples from the regular part, we extend SelectNet [43] to network-based least squares models, especially for PDE solutions with certain irregularity. As we shall see later, numerical results show that the proposed model is competitive with the traditional (basic) least squares model for analytic solutions, and it outperforms the basic model for low-regularity solutions in terms of convergence speed. It is worth noting that our proposed SelectNet model essentially tunes the weights of training points to realize adaptive sampling. Another approach is to change the distribution of training points, such as the residual-based adaptive refinement method [32].
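For reference, the selection mechanism of classical self-paced learning [35] can be written as the following objective (the notation here is ours and is included only to illustrate the role of the 0-1 selection vector v):
\[
\min_{w,\;v\in\{0,1\}^{n}}\ \sum_{i=1}^{n} v_i\,L_i(w)\;-\;\lambda\sum_{i=1}^{n} v_i,
\]
so that, for fixed model parameters w, the optimal selection is v_i = 1 exactly when the sample loss L_i(w) falls below the threshold λ; gradually increasing λ admits harder samples into training. SelectNet replaces this hard 0-1 selection with a trainable selection network.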
The organization of this paper is as follows. In Section 2, we introduce the least squares methods and formulate the corresponding optimization model. In Section 3, we present the SelectNet model in detail. In Section 4, we establish error estimates for the basic and SelectNet models. In Section 5, we discuss the network implementation of the proposed model. In Section 6, we present ample numerical experiments on various equations to validate our model. We conclude with some remarks in the final section.
Section snippets
Least squares methods for PDEs
In this work, we aim at solving the following (initial) boundary value problems, given a bounded domain Ω (representative prototypes of each class are sketched after the list):
- elliptic equations
- parabolic equations
- hyperbolic equations
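Representative prototypes of the three classes (given here only for orientation; the exact operators and boundary/initial conditions treated in this work follow the formulations in this section) are
\[
-\nabla\cdot\big(a(x)\nabla u(x)\big)=f(x)\ \ \text{in}\ \Omega \qquad \text{(elliptic)},
\]
\[
\partial_t u(x,t)-\nabla\cdot\big(a(x)\nabla u(x,t)\big)=f(x,t)\ \ \text{in}\ \Omega\times(0,T] \qquad \text{(parabolic)},
\]
\[
\partial_{tt} u(x,t)-\nabla\cdot\big(a(x)\nabla u(x,t)\big)=f(x,t)\ \ \text{in}\ \Omega\times(0,T] \qquad \text{(hyperbolic)},
\]
each supplemented with suitable boundary conditions on ∂Ω and, in the time-dependent cases, initial conditions at t = 0.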
SelectNet model
The network-based least squares model has been applied to solve certain high-dimensional PDEs successfully. However, its convergence is slow and might not be guaranteed. To ease this issue, we introduce a novel self-paced learning framework, SelectNet, to adaptively choose training samples in the least squares model. The basic philosophy is to mimic the human cognitive process for more efficient learning: learning first from easier examples and slowly exploring more complicated ones.
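Schematically, and with notation introduced here only for illustration (the precise objective in this paper also contains boundary/initial terms and a normalization constraint on the selection network), SelectNet replaces the plain mean squared residual by a selection-weighted residual and trains the two networks adversarially:
\[
\min_{\theta}\ \max_{\phi}\ \frac{1}{n}\sum_{i=1}^{n}\omega_{\phi}(x_i)\,\big|\mathcal{D}u_{\theta}(x_i)-f(x_i)\big|^{2},
\qquad \text{with}\ \omega_{\phi}\ge 0\ \text{and}\ \frac{1}{n}\sum_{i=1}^{n}\omega_{\phi}(x_i)\approx 1,
\]
where u_θ is the solution network, ω_φ is the selection network, and 𝒟 is the differential operator; maximizing over φ places larger weights on collocation points with larger point-wise residuals.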
Error estimates
In this section, theoretical analysis is presented to show that the solution errors of the basic and SelectNet models are bounded by the loss function (the mean square of the residual). Specifically, we take the elliptic PDE with a Neumann boundary condition as an example; the conclusion can be generalized to other well-posed PDEs by a similar argument. Consider an elliptic problem posed on an open subset Ω of R^d whose boundary ∂Ω is smooth, with given data functions.
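For orientation, a representative Neumann problem of this type, and the kind of estimate established in this section, can be sketched as follows; the specific operator, norms, and constant below are illustrative assumptions rather than the exact statements proved in the paper:
\[
-\Delta u + c\,u = f \ \ \text{in}\ \Omega,\qquad \frac{\partial u}{\partial n} = g \ \ \text{on}\ \partial\Omega,
\]
\[
\|u_{\theta}-u^{*}\|\ \le\ C\Big(\big\|-\Delta u_{\theta}+c\,u_{\theta}-f\big\|_{L^{2}(\Omega)}+\big\|\partial u_{\theta}/\partial n-g\big\|_{L^{2}(\partial\Omega)}\Big),
\]
i.e., the error of the network solution u_θ relative to the true solution u* is controlled by the square root of the least squares loss, provided the problem is well posed.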
Network architecture
The proposed framework is independent of the choice of DNNs. Advanced network design may improve the accuracy and convergence of the proposed framework, which would be an interesting direction for future work.
In this paper, feedforward neural networks will be repeatedly applied. Let φ(x; θ) denote such a network with input x and parameters θ; it is defined recursively as follows:
\[
h_0 = x,\qquad h_{\ell} = \sigma\big(W_{\ell} h_{\ell-1} + b_{\ell}\big),\ \ \ell = 1,\dots,L,\qquad \varphi(x;\theta) = W_{L+1} h_L + b_{L+1},
\]
where σ is an application-dependent nonlinear activation function, and θ collects all the weight matrices W_ℓ and bias vectors b_ℓ.
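A minimal PyTorch sketch of such a feedforward network follows; the width, depth, and activation below are placeholders, and the actual architectures used in this paper are specified in the experiments section.

    import torch.nn as nn

    class FeedForwardNet(nn.Module):
        # Plain fully connected network: h_0 = x, h_l = sigma(W_l h_{l-1} + b_l),
        # output = W_{L+1} h_L + b_{L+1}; width, depth, and activation are illustrative.
        def __init__(self, dim_in, width=50, depth=4, activation=nn.Tanh()):
            super().__init__()
            layers = [nn.Linear(dim_in, width), activation]
            for _ in range(depth - 1):
                layers += [nn.Linear(width, width), activation]
            layers.append(nn.Linear(width, 1))   # scalar output (value of the solution)
            self.net = nn.Sequential(*layers)

        def forward(self, x):
            return self.net(x)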
Numerical experiments
In this section, the proposed SelectNet model is tested on several PDE examples, including elliptic/parabolic and linear/nonlinear high-dimensional problems. Other network-based methods are also implemented for comparison. For all methods, we choose the feedforward architecture for the solution network. Additionally, for SelectNet, we choose a feedforward architecture with ReLU activation for the selection network. AdaGrad [12] is employed to solve the optimization problems.
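A compact sketch of the resulting alternating min-max training loop is given below, reusing the pde_residual_sq and FeedForwardNet sketches above; the samplers sample_interior and sample_boundary, the data functions f and g, the learning rates, the penalty weight, and the iteration counts are placeholders rather than the settings used in the experiments.

    import torch
    import torch.nn.functional as F

    d = 10                                      # spatial dimension (illustrative)
    u_net = FeedForwardNet(d)                   # solution network
    s_net = FeedForwardNet(d)                   # selection network
    opt_u = torch.optim.Adagrad(u_net.parameters(), lr=1e-2)
    opt_s = torch.optim.Adagrad(s_net.parameters(), lr=1e-2)

    def weights(x):
        # Nonnegative selection weights, normalized so that their mean is one.
        w = F.softplus(s_net(x))
        return w / w.mean()

    def selectnet_loss(x_int, x_bdy, f, g, beta=100.0):
        # Selection-weighted interior residual plus a weighted boundary penalty.
        res_sq = pde_residual_sq(u_net, x_int, f)
        bdy_sq = (u_net(x_bdy) - g(x_bdy)) ** 2
        return (weights(x_int) * res_sq).mean() + beta * (weights(x_bdy) * bdy_sq).mean()

    for step in range(10000):
        x_int, x_bdy = sample_interior(5000), sample_boundary(1000)   # hypothetical samplers
        opt_s.zero_grad(); (-selectnet_loss(x_int, x_bdy, f, g)).backward(); opt_s.step()  # ascent step
        opt_u.zero_grad(); selectnet_loss(x_int, x_bdy, f, g).backward(); opt_u.step()     # descent step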
Conclusion
In this work, we improve network-based least squares models for generic PDEs by introducing a selection network for selected sampling in the optimization process. The objective is to place higher weights on the sampling points with larger point-wise residual errors, and correspondingly we propose the SelectNet model, which is a min-max optimization. In the implementation, both the solution and selection functions are approximated by feedforward neural networks, which are trained jointly through the min-max optimization.
CRediT authorship contribution statement
Yiqi Gu: Investigation, Software, Visualization, Writing – original draft. Haizhao Yang: Conceptualization, Investigation, Methodology, Writing – review & editing. Chao Zhou: Investigation, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
Y. G. was partially supported by the Ministry of Education in Singapore under the grant MOE2018-T2-2-147 and MOE AcRF R-146-000-271-112. H. Y. was partially supported by the US National Science Foundation under award DMS-1945029. C. Z. was partially supported by the Ministry of Education in Singapore under the grant MOE AcRF R-146-000-271-112 and by NSFC under the grant award 11871364.
References (65)
- et al., A unified deep artificial neural network approach to partial differential equations in complex geometries, Neurocomputing (2018)
- et al., Approximation capability of two hidden layer feedforward neural networks with fixed weights, Neurocomputing (2018)
- et al., Multilayer feedforward networks are universal approximators, Neural Netw. (1989)
- et al., Neural algorithm for solving differential equations, J. Comput. Phys. (1990)
- et al., Robust model-order reduction of complex biological processes, J. Process Control (2002)
- et al., Lower bounds for approximation by MLP neural networks, Neurocomputing (1999)
- et al., Numerical solution for high order differential equations using a hybrid neural network-optimization method, Appl. Math. Comput. (2006)
- et al., Error bounds for deep ReLU networks using the Kolmogorov–Arnold superposition theorem, Neural Netw. (2020)
- et al., Optimal approximation of piecewise smooth functions using deep ReLU neural networks, Neural Netw. (2018)
- et al., Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, J. Comput. Phys. (2019)
- Nonlinear approximation via compositions, Neural Netw.
- DGM: a deep learning algorithm for solving partial differential equations, J. Comput. Phys.
- Error bounds for approximations with deep ReLU networks, Neural Netw.
- Weak adversarial networks for high-dimensional partial differential equations, J. Comput. Phys.
- Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans. Inf. Theory
- Deep splitting method for parabolic PDEs
- Enhanced expressive power and fast training of neural networks by random projections
- Multi-scale deep neural networks for solving high dimensional PDEs
- Construction of neural networks for realization of localized deep learning, Front. Appl. Math. Stat.
- Importance sampling for minibatches, J. Mach. Learn. Res.
- Approximation by superpositions of a sigmoidal function, Math. Control Signals Syst.
- The limit points of (optimistic) gradient descent in min-max optimization
- Neural-network-based approximations for solving partial differential equations, Commun. Numer. Methods Eng.
- Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res.
- Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations, Commun. Math. Stat.
- Integrating machine learning with physics-based modeling
- A priori estimates of the population risk for residual networks
- A priori estimates of the population risk for two-layer neural networks, Commun. Math. Sci.
- Barron spaces and the compositional function spaces for neural network models, Constr. Approx.
- Exponential convergence of the deep neural network approximation for analytic functions, Sci. China Math.
- The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems, Commun. Math. Stat.
1. On leave from Department of Mathematics, National University of Singapore.