SelectNet: Self-paced learning for high-dimensional partial differential equations

https://doi.org/10.1016/j.jcp.2021.110444

Highlights


  • Self-paced learning for neural network-based PDE solvers.

  • Automatic importance sampling for solutions with irregularity.

  • Efficient solutions of high dimensional and nonlinear PDEs.

Abstract

The least squares method with deep neural networks as function parametrization has been applied to solve certain high-dimensional partial differential equations (PDEs) successfully; however, its convergence is slow and might not be guaranteed even within a simple class of PDEs. To improve the convergence of the network-based least squares model, we introduce a novel self-paced learning framework, SelectNet, which quantifies the difficulty of training samples, treats samples equally in the early stage of training, and slowly explores more challenging samples, e.g., samples with larger residual errors, mimicking the human cognitive process for more efficient learning. In particular, a selection network and the PDE solution network are trained simultaneously; the selection network adaptively weights the training samples of the solution network, achieving the goal of self-paced learning. Numerical examples indicate that the proposed SelectNet model outperforms existing models in convergence speed and robustness, especially for low-regularity solutions.

Introduction

High-dimensional partial differential equations (PDEs) are important tools in physical, financial, and biological models [39], [20], [64], [22], [61]. However, developing numerical methods for high-dimensional PDEs has been challenging due to the curse of dimensionality in the discretization of the problem. For example, in traditional methods such as finite difference methods and finite element methods, $O(N^d)$ degrees of freedom are required for a $d$-dimensional problem if we set $N$ grid points or basis functions in each direction to achieve $O(1/N)$ accuracy. Even if $d$ becomes only moderately large, the exponential growth of $N^d$ in the dimension $d$ makes traditional methods computationally intractable.
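To make this scaling concrete, a back-of-the-envelope computation (illustrative only, not taken from the paper) shows how quickly a tensor-product grid grows with the dimension:

    # Illustration of the curse of dimensionality: a tensor-product grid with
    # N points per direction contains N**d points in d dimensions.
    N = 10  # grid points per direction, roughly O(1/N) accuracy for low-order methods
    for d in (2, 3, 10, 20, 100):
        print(f"d = {d:3d}: grid size N**d = {float(N) ** d:.3e}")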

Recent research on the approximation theory of deep neural networks (DNNs) shows that deep network approximation is a powerful tool for mesh-free function parametrization. The research on the approximation theory of neural networks traces back to the pioneering work [9], [26], [1] on the universal approximation of shallow networks with sigmoid activation functions. The recent focus has been on the approximation rate of DNNs for various function spaces in terms of the number of network parameters, showing that deep networks are more powerful than shallow networks in approximation efficiency; examples include smooth functions [44], [42], [62], [18], [47], [60], [16], [15], [17], piecewise smooth functions [51], band-limited functions [49], and continuous functions [63], [55], [54]. The reader is referred to [54] for an explicit characterization of the approximation error for networks with arbitrary width and depth.

In particular, deep network approximation can lessen or overcome the curse of dimensionality under certain circumstances, making it an attractive tool for solving high-dimensional problems. For functions admitting an integral representation with a one-dimensional integral kernel, the absence of the curse of dimensionality in the approximation rate can be shown by relating network approximation to Monte Carlo sampling, or equivalently the law of large numbers [1], [16], [15], [17], [49]. Based on the Kolmogorov-Arnold superposition theorem, [45], [24] showed that, for general continuous functions, three-layer neural networks with advanced activation functions can avoid the curse of dimensionality with only $O(d)$ parameters in total; [48] proves that deep ReLU network approximation can lessen the curse of dimensionality if target functions are restricted to a space related to the constructive proof of the Kolmogorov-Arnold superposition theorem in [4]. If the approximation error is only measured on a low-dimensional manifold, there is no curse of dimensionality for deep network approximation in terms of the approximation error [7], [5], [54]. Finally, there is also extensive research showing that deep network approximation can overcome the curse of dimensionality when applied to approximate certain PDE solutions, e.g., [27], [29].

As an efficient function parametrization tool, neural networks have been applied to solve PDEs via various approaches. Early work in [38] applies neural networks to approximate PDE solutions defined on grid points. Later in [11], [36], DNNs are employed to approximate solutions in the whole domain, and PDEs are solved by minimizing the discrete residual error in the $L^2$-norm at prescribed collocation points. DNNs coupled with boundary governing terms by design can satisfy boundary conditions [46]; nevertheless, designing boundary governing terms is usually difficult for complex geometry. Another approach to enforcing boundary conditions is to add boundary errors to the loss function as a penalized term and minimize it together with the PDE residual error [23], [37]. The second technique is in the same spirit as least squares methods in finite element methods and is more convenient to implement; therefore, it has been widely utilized for PDEs on complex domains. However, network computation used to be expensive, limiting the applications of network-based PDE solvers. Thanks to the development of GPU-based parallel computing over the last two decades, which has greatly accelerated network computation, network-based PDE solvers have been revisited recently and have become a popular tool, especially for high-dimensional problems [13], [19], [25], [33], [58], [3], [65], [40], [2], [29], [28], [6], [53], [41]. Nevertheless, most network-based PDE solvers suffer from robustness issues: their convergence is slow and might not be guaranteed even within a simple class of PDEs.
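To make the penalized formulation concrete, the sketch below assembles the interior residual and the boundary penalty for a Poisson-type model problem $-\Delta u = f$ in $\Omega$ with Dirichlet data $g_0$ on $\partial\Omega$. It is a minimal PyTorch illustration under assumed names (`phi` for the solution network, `f`, `g0`, and the penalty weight `lam`), not the authors' implementation.

    import torch

    def laplacian(phi, x):
        # x: (n, d) collocation points with requires_grad=True
        u = phi(x)                                                    # (n, 1)
        grad = torch.autograd.grad(u.sum(), x, create_graph=True)[0]  # (n, d)
        lap = 0.0
        for i in range(x.shape[1]):                                   # sum of second derivatives
            lap = lap + torch.autograd.grad(grad[:, i].sum(), x, create_graph=True)[0][:, i]
        return lap.unsqueeze(1)                                       # (n, 1)

    def penalized_ls_loss(phi, x_in, x_bd, f, g0, lam=100.0):
        # mean squared PDE residual of -Δu = f at interior points, plus a
        # penalized mean squared mismatch of u and g0 at boundary points
        res_in = -laplacian(phi, x_in) - f(x_in)
        res_bd = phi(x_bd) - g0(x_bd)
        return (res_in ** 2).mean() + lam * (res_bd ** 2).mean()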

To ease the issue above, we introduce a novel self-paced learning framework, SelectNet, to adaptively choose training samples in the least squares model. Self-paced learning [35] is a recently proposed learning technique that chooses a subset of the training samples for actual training over time. Specifically, for a training data set with $n$ samples, self-paced learning uses a vector $v \in \{0,1\}^n$ to indicate whether each training sample should be included in the current training stage. The philosophy of self-paced learning is to simulate the human learning style, which tends to learn easier aspects of a task first and deal with more complicated samples later. Based on self-paced learning, a novel technique for selected sampling is put forward, which uses a selection neural network instead of the 0-1 selection vector $v$; hence it learns to avoid redundant training information and speeds up the convergence of learning outcomes. This idea is further improved in [30] by introducing a DNN to select training data for image classification. Among similar works, a state-of-the-art algorithm named SelectNet is proposed in [43] for image classification, especially for imbalanced data problems. Based on the observation that samples near the singularity of the PDE solution are rare compared to samples from the regular part, we extend SelectNet [43] to network-based least squares models, especially for PDE solutions with certain irregularity. As we shall see later, numerical results show that the proposed model is competitive with the traditional (basic) least squares model for analytic solutions and outperforms it for low-regularity solutions in terms of convergence speed. It is worth noting that our proposed SelectNet model essentially tunes the weights of training points to realize adaptive sampling; another approach is to change the distribution of training points, such as the residual-based adaptive refinement method [32].
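For intuition, the hard 0-1 selection of classical self-paced learning can be sketched as follows; this is an illustrative paraphrase of the idea in [35] with a hypothetical threshold schedule, not the SelectNet scheme, which replaces the indicator with a trained selection network.

    import torch

    def self_paced_mask(per_sample_loss, threshold):
        # v_i = 1 if sample i is currently "easy" (loss below the threshold), else 0;
        # the threshold is raised over training so that harder samples are admitted later
        return (per_sample_loss < threshold).float()

    def self_paced_loss(per_sample_loss, threshold):
        v = self_paced_mask(per_sample_loss, threshold)
        # average only over the selected samples (guarding against selecting none)
        return (v * per_sample_loss).sum() / v.sum().clamp(min=1.0)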

The organization of this paper is as follows. In Section 2, we introduce the least squares methods and formulate the corresponding optimization model. In Section 3, we present the SelectNet model in detail. In Section 4, we put forward the error estimates of the basic and SelectNet models. In Section 5, we discuss the network implementation in the proposed model. In Section 6, we present ample numerical experiments for various equations to validate our model. We conclude with some remarks in the final section.

Section snippets

Least squares methods for PDEs

In this work, we aim at solving the following (initial) boundary value problems, given a bounded domain $\Omega \subset \mathbb{R}^d$:

  • elliptic equations: $\mathcal{D}_x u(x) = f(x)$ in $\Omega$, $\quad \mathcal{B}_x u(x) = g_0(x)$ on $\partial\Omega$;

  • parabolic equations: $\frac{\partial u(x,t)}{\partial t} - \mathcal{D}_x u(x,t) = f(x,t)$ in $\Omega \times (0,T)$, $\quad \mathcal{B}_x u(x,t) = g_0(x,t)$ on $\partial\Omega \times (0,T)$, $\quad u(x,0) = h_0(x)$ in $\Omega$;

  • hyperbolic equations: $\frac{\partial^2 u(x,t)}{\partial t^2} - \mathcal{D}_x u(x,t) = f(x,t)$ in $\Omega \times (0,T)$, $\quad \mathcal{B}_x u(x,t) = g_0(x,t)$ on $\partial\Omega \times (0,T)$, $\quad u(x,0) = h_0(x)$, $\ \frac{\partial u(x,0)}{\partial t} = h_1(x)$ in $\Omega$;

where $u$ is the solution function; $f$, $g_0$, $h_0$, $h_1$ are given data functions; $\mathcal{D}_x$ is a spatial differential operator, and $\mathcal{B}_x$ is a boundary operator.
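For instance, in the elliptic case the least squares approach minimizes the mean squared residuals of the equation and of the boundary condition. Schematically, with a penalty weight $\lambda$ (a sketch consistent with the formulation above, not the paper's exact display),

    \min_{u}\; \big\| \mathcal{D}_x u - f \big\|_{L^2(\Omega)}^{2}
      \;+\; \lambda\, \big\| \mathcal{B}_x u - g_0 \big\|_{L^2(\partial\Omega)}^{2},

where $u$ is parametrized by a neural network and, in practice, the norms are estimated by Monte Carlo averages over collocation points sampled in $\Omega$ and on $\partial\Omega$.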

SelectNet model

The network-based least squares model has been applied to solve certain high-dimensional PDEs successfully. However, its convergence is slow and might not be guaranteed. To ease this issue, we introduce a novel self-paced learning framework, SelectNet, to adaptively choose training samples in the least squares model. The basic philosophy is to mimic the human cognitive process for more efficient learning: learning first from easier examples and slowly exploring more complicated ones.
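A minimal sketch of the resulting weighted objective is given below, assuming PyTorch and hypothetical names (`residuals` for the point-wise PDE residuals, `select_net` for the selection network); it conveys the re-weighting mechanism rather than the paper's exact min-max formulation. The solution network takes descent steps on this loss while the selection network takes ascent steps, so points with larger residuals gradually receive larger weights.

    import torch

    def selectnet_weighted_loss(residuals, select_net, x):
        # residuals: (n, 1) point-wise PDE residuals at collocation points x of shape (n, d)
        # select_net: network producing a nonnegative weight for each sample
        w = select_net(x)              # (n, 1)
        w = w / (w.mean() + 1e-8)      # normalization keeps the average weight near 1
        return (w * residuals ** 2).mean()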

Error estimates

In this section, theoretical analysis is presented to show that the solution errors of the basic and SelectNet models are bounded by the loss function (the mean square of the residual). Specifically, we take the elliptic PDE with a Neumann boundary condition as an example; the conclusion can be generalized to other well-posed PDEs by similar arguments. Consider

    -\Delta u + c\,u = f \ \text{ in } \Omega, \qquad \frac{\partial u}{\partial n} = g \ \text{ on } \partial\Omega,

where $\Omega$ is an open subset of $\mathbb{R}^d$ whose boundary $\partial\Omega$ is $C^1$ smooth; $f \in L^2(\Omega)$, $g \in L^2(\partial\Omega)$, and $c(x) \ge \sigma > 0$ is a given function.
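Only as an illustration of the type of bound meant here (the precise norms and constants appear in the theorems and may differ), such estimates take the schematic form

    \| u_{\phi} - u \| \;\le\; C \Big( \| -\Delta u_{\phi} + c\, u_{\phi} - f \|_{L^2(\Omega)}
      + \| \partial u_{\phi} / \partial n - g \|_{L^2(\partial\Omega)} \Big),

so the distance between a network approximation $u_{\phi}$ and the true solution $u$ is controlled by the square root of the mean squared residual loss.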

Network architecture

The proposed framework is independent of the choice of DNNs. Advanced network design may improve the accuracy and convergence of the proposed framework, which would be interesting for future work.

In this paper, feedforward neural networks will be repeatedly applied. Let $\phi(x;\theta)$ denote such a network with input $x$ and parameters $\theta$; it is defined recursively as follows:

    x^{0} = x, \qquad x^{l+1} = \sigma(W^{l} x^{l} + b^{l}), \quad l = 0, 1, \ldots, L-1, \qquad \phi(x;\theta) = W^{L} x^{L} + b^{L},

where $\sigma$ is an application-dependent nonlinear activation function, and $\theta$ collects all the weights and biases $\{W^{l}, b^{l}\}_{l=0}^{L}$.
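This recursion translates directly into code; the following PyTorch sketch is a minimal illustration, with layer widths and activation as placeholders rather than the architectures used in the experiments.

    import torch

    class FeedForward(torch.nn.Module):
        # phi(x; theta):  x^0 = x,  x^{l+1} = sigma(W^l x^l + b^l),  output W^L x^L + b^L
        def __init__(self, widths, activation=torch.relu):
            super().__init__()
            self.layers = torch.nn.ModuleList(
                [torch.nn.Linear(widths[i], widths[i + 1]) for i in range(len(widths) - 1)]
            )
            self.activation = activation

        def forward(self, x):
            for layer in self.layers[:-1]:
                x = self.activation(layer(x))
            return self.layers[-1](x)  # no activation on the output layer

    # e.g. a map from R^d to R with two hidden layers of width 100:
    # phi = FeedForward([d, 100, 100, 1])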

Numerical experiments

In this section, the proposed SelectNet model is tested on several PDE examples, including elliptic/parabolic and linear/nonlinear high-dimensional problems. Other network-based methods are also implemented for comparison. For all methods, we choose the feedforward architecture with activation $\sigma(x) = \max(x^3, 0)$ for the solution network. Additionally, for SelectNet, we choose a feedforward architecture with the ReLU activation for the selection network. AdaGrad [12] is employed to solve the resulting optimization problems.
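For concreteness, the cubic activation $\sigma(x) = \max(x^3, 0)$ can be written as below; the optimizer lines are a sketch assuming a PyTorch implementation, with hypothetical variable names, since the paper's code is not reproduced here.

    import torch

    def relu_cubed(x):
        # sigma(x) = max(x^3, 0); for real x this equals relu(x)**3
        return torch.clamp(x ** 3, min=0.0)

    # optimizer sketch (assuming the networks are torch modules):
    # opt_u   = torch.optim.Adagrad(solution_net.parameters(), lr=1e-3)
    # opt_sel = torch.optim.Adagrad(selection_net.parameters(), lr=1e-3)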

Conclusion

In this work, we improve network-based least squares models on generic PDEs by introducing a selection network for selected sampling in the optimization process. The objective is to place higher weights on the sampling points with larger point-wise residual errors, and accordingly we propose the SelectNet model, which is a min-max optimization problem. In the implementation, both the solution and selection functions are approximated by feedforward neural networks, which are trained simultaneously.

CRediT authorship contribution statement

Yiqi Gu: Investigation, Software, Visualization, Writing – original draft. Haizhao Yang: Conceptualization, Investigation, Methodology, Writing – review & editing. Chao Zhou: Investigation, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Y. G. was partially supported by the Ministry of Education in Singapore under the grant MOE2018-T2-2-147 and MOE AcRF R-146-000-271-112. H. Y. was partially supported by the US National Science Foundation under award DMS-1945029. C. Z. was partially supported by the Ministry of Education in Singapore under the grant MOE AcRF R-146-000-271-112 and by NSFC under the grant award 11871364.

References (65)

  • Zuowei Shen et al., Nonlinear approximation via compositions, Neural Netw. (2019)
  • Justin Sirignano et al., DGM: a deep learning algorithm for solving partial differential equations, J. Comput. Phys. (2018)
  • Dmitry Yarotsky, Error bounds for approximations with deep ReLU networks, Neural Netw. (2017)
  • Yaohua Zang et al., Weak adversarial networks for high-dimensional partial differential equations, J. Comput. Phys. (2020)
  • A.R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans. Inf. Theory (May 1993)
  • Christian Beck et al., Deep splitting method for parabolic PDEs
  • J. Braun, M. Griebel, On a constructive proof of Kolmogorov's superposition theorem, preprint, SFB 611,...
  • Jian-Feng Cai et al., Enhanced expressive power and fast training of neural networks by random projections
  • Wei Cai et al., Multi-scale deep neural networks for solving high dimensional PDEs
  • Charles K. Chui et al., Construction of neural networks for realization of localized deep learning, Front. Appl. Math. Stat. (2018)
  • Dominik Csiba et al., Importance sampling for minibatches, J. Mach. Learn. Res. (January 2018)
  • G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signals Syst. (Feb 1989)
  • Constantinos Daskalakis et al., The limit points of (optimistic) gradient descent in min-max optimization
  • M.W.M.G. Dissanayake et al., Neural-network-based approximations for solving partial differential equations, Commun. Numer. Methods Eng. (1994)
  • John Duchi et al., Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res. (July 2011)
  • E. Weinan et al., Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations, Commun. Math. Stat. (Dec 2017)
  • E. Weinan et al., Integrating machine learning with physics-based modeling
  • E. Weinan et al., A priori estimates of the population risk for residual networks
  • E. Weinan et al., A priori estimates of the population risk for two-layer neural networks, Commun. Math. Sci. (2019)
  • E. Weinan et al., Barron spaces and the compositional function spaces for neural network models, Constr. Approx. (2020)
  • E. Weinan et al., Exponential convergence of the deep neural network approximation for analytic functions, Sci. China Math. (2018)
  • E. Weinan et al., The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems, Commun. Math. Stat. (2018)
1. On leave from Department of Mathematics, National University of Singapore.