Solving inverse-PDE problems with physics-aware neural networks

https://doi.org/10.1016/j.jcp.2021.110414

Highlights

  • We present a novel hybrid framework that enables discovery of unknown fields in inverse partial differential equation (PDE) problems.

  • We implement trainable finite discretization solver layers that are composable with pre-existing neural layers.

  • The network can be pre-trained in a self-supervised fashion and used on unseen data without further training.

  • This framework enables consideration of domain-specific knowledge about the unknown fields.

  • In contrast to constrained optimization methods, the loss function is simply the difference between the data and the prediction.

Abstract

We propose a novel composite framework to find unknown fields in the context of inverse problems for partial differential equations (PDEs). We blend the high expressibility of deep neural networks as universal function estimators with the accuracy and reliability of existing numerical algorithms for partial differential equations as custom layers in semantic autoencoders. Our design brings together techniques of computational mathematics, machine learning and pattern recognition under one umbrella to incorporate domain-specific knowledge and physical constraints to discover the underlying hidden fields. The network is explicitly aware of the governing physics through a hard-coded PDE solver layer, in contrast to most existing methods that incorporate the governing equations in the loss function or rely on trainable convolutional layers to discover proper discretizations from data. This focuses the computational load on the discovery of the hidden fields alone, and is therefore more data efficient. We call this architecture Blended inverse-PDE networks (BiPDE networks) and demonstrate its applicability for recovering the variable diffusion coefficient in Poisson problems in one and two spatial dimensions, as well as the diffusion coefficient in the time-dependent and nonlinear Burgers' equation in one dimension. We also show that the learned hidden parameters are robust to added noise on the input data.

Introduction

Inverse differential problems, where given a set of measurements one seeks a set of optimal parameters in a governing differential equation, arise in numerous scientific and technological domains. Some well-known applications include X-ray tomography [20], [55], ultrasound [83], MRI imaging [36], and transport in porous media [35]. Moreover, modeling and control of dynamic complex systems is a common problem in a broad range of scientific and engineering domains, with examples ranging from understanding the motion of bacteria colonies in low Reynolds number flows [70], to the control of spinning rotorcrafts in high speed flights [30], [31]. Other applications in medicine, navigation, manufacturing, etc. require estimation of the unknown parameters in real time; e.g., in electroporation [91], [53], the pulse optimizer has to be informed about tissue parameters within microseconds. On the other hand, high resolution datasets describing the spatiotemporal evolution of complex systems are becoming increasingly available through advanced multi-scale numerical simulations (see e.g. [53], [52]). These advances have become possible partly due to recent developments in discretization techniques for nonlinear partial differential equations with sharp boundaries (see e.g. the reviews [25], [24]). However, solving these inverse problems poses substantial computational and mathematical challenges that make it difficult to infer reliable parameters from limited data and in real time.

The problem can be mathematically formulated as follows. Let the values of $u = u(t, x_1, \ldots, x_n)$ be given by a set of measurements, which may include noise. Knowing that $u$ satisfies the partial differential equation
$$\frac{\partial u}{\partial t} = f\!\left(t, x_1, \ldots, x_n;\; u, \frac{\partial u}{\partial x_1}, \ldots, \frac{\partial u}{\partial x_n};\; \frac{\partial^2 u}{\partial x_1 \partial x_1}, \ldots, \frac{\partial^2 u}{\partial x_1 \partial x_n};\; \ldots;\; \mathbf{c}\right),$$
find the hidden fields stored in $\mathbf{c}$, where the hidden fields can be constant or variable coefficients (scalars, vectors or tensors).
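As a concrete instance (the stationary case studied later in this paper), consider the variable-coefficient Poisson problem
$$\nabla \cdot \big( D(\mathbf{x}) \, \nabla u(\mathbf{x}) \big) = f(\mathbf{x}),$$
where samples of the solution $u$ are measured, the source $f$ is known, and the hidden field is the diffusion coefficient, $\mathbf{c} = D(\mathbf{x})$; the sign and boundary-condition conventions written here are generic rather than quoted from the paper.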

Deep neural networks have, rather recently, attracted considerable attention for data modeling in a vast range of scientific domains, in part due to freely available modern deep learning libraries (in particular TensorFlow [1]). For example, deep neural networks have shown astonishing success in emulating sophisticated simulations [29], [90], [89], [10], [75], discovering governing differential equations from data [67], [7], [47], [72], as well as potential applications to study and improve simulations of multiphase flows [25]. We refer the reader to [60], [61] for a comprehensive survey of the interplay between numerical approximation, statistical inference and learning. However, these architectures require massive datasets and extensive computations to train numerous hidden weights and biases. Reducing the complexity of deep neural network architectures for inverse problems therefore poses a significant practical challenge for many applications in the physical sciences, especially when the collection of large datasets is prohibitive [66]. One remedy to reduce the network size is to embed knowledge from existing mathematical models [79] or known physical laws within the neural network architecture [45], [23]. Along these lines, semantic autoencoders were recently proposed by Aragon-Calvo [2], who replaced the decoder stage of an autoencoder with a given physical law that can reproduce the input data from a physically meaningful set of parameters. The encoder is then constrained to discover optimal values for these parameters, which can be extracted from the bottleneck of the network after training. We emphasize that this approach reduces the number of unknown model parameters, that the encoder can be used independently to infer hidden parameters in real time, and that it adds interpretability to deep learning frameworks. Inspired by their work, we propose to blend traditional numerical solver algorithms with custom deep neural network architectures to solve inverse PDE problems more efficiently and with higher accuracy.

Recently, the most widely used approach for solving forward and inverse partial differential equations with neural networks has been the constrained optimization technique. These algorithms augment the cost function with terms that describe the PDE and its boundary and initial conditions, while the neural network acts as a surrogate for the solution field. Depending on how the derivatives in the PDE are computed, these methods fall into two general classes, which we review in the next paragraph.

In the first class, spatial differentiation in the PDE is performed exclusively using automatic differentiation, while temporal differentiation may be handled using traditional Runge-Kutta schemes (called discrete time models) or using automatic differentiation (called continuous time models) [68]. In these methods, automatic differentiation computes gradients of the output of a neural network with respect to its input variables; hence the input must always be the independent variables, i.e. the spatial coordinates x, time and the free parameters. Network optimization then aims to calibrate the weights and biases such that the neural network outputs the closest approximation of the solution of the PDE, enforced through a regularized loss function, an idea first proposed by Lagaris et al. (1998) [43]. In 2015, the general framework of solving differential equations as a learning problem was proposed by Owhadi [57], [58], [59], which revived interest in using neural networks for solving differential equations in recent years. Raissi et al. (2017) [68], [69] presented the regularized loss function framework under the name physics informed neural networks (PINNs) and applied it to time-dependent PDEs; other authors have since mostly adopted PINNs, see e.g. [76], [4]. The second class of constrained optimization methods was proposed by Xu and Darve [88], who examined the possibility of directly using pre-existing finite discretization schemes within the loss function.
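To make the first class concrete, here is a minimal sketch of a PINN-style regularized loss in TensorFlow 2 for the one-dimensional heat equation $u_t = \nu u_{xx}$; the architecture, the trainable scalar `nu` and the omission of boundary terms are our illustrative assumptions, not code from the cited works.

```python
import tensorflow as tf

# Surrogate for the solution u(x, t): the inputs are the independent variables.
u_net = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="tanh", input_shape=(2,)),
    tf.keras.layers.Dense(32, activation="tanh"),
    tf.keras.layers.Dense(1),
])
nu = tf.Variable(0.1)  # unknown PDE parameter, trained jointly with u_net

def pde_residual(x, t):
    # Automatic differentiation supplies u_t and u_xx from the surrogate.
    with tf.GradientTape(persistent=True) as t1:
        t1.watch([x, t])
        with tf.GradientTape() as t2:
            t2.watch(x)
            u = u_net(tf.stack([x, t], axis=1))[:, 0]
        u_x = t2.gradient(u, x)
    u_t = t1.gradient(u, t)
    u_xx = t1.gradient(u_x, x)
    return u_t - nu * u_xx  # vanishes where the PDE is satisfied

def loss(x, t, u_data):
    u_pred = u_net(tf.stack([x, t], axis=1))[:, 0]
    # Data misfit regularized by the PDE residual; boundary and initial
    # condition terms would be appended in the same fashion.
    return (tf.reduce_mean(tf.square(u_pred - u_data))
            + tf.reduce_mean(tf.square(pde_residual(x, t))))
```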

An alternative approach for solving PDE systems is through explicit embedding of the governing equations inside the architecture of deep neural networks via convolutional layers, activation functions or augmented neural networks. Below we review some of these methods:

  • A famous approach is PDE-Net [47], [46], which relies on the idea of numerically approximating differential operators by convolutions. PDE-Nets therefore use convolution layers with trainable and constrained kernels that mimic differential operators (such as $U_x$, $U_y$, $U_{xx}$, $\ldots$) whose outputs are fed to a (symbolic) multilayer neural network that models the nonlinear response function of the PDE system, i.e. the right-hand side in $U_t = F(U, U_x, U_y, U_{xx}, \ldots)$. Importantly, PDE-Nets can only support explicit time integration methods, such as the forward Euler method [47]. Moreover, because the differential operators are learned from data samples, these methods have hundreds of thousands of trainable parameters that demand hundreds of data samples; e.g. see section 3.1 in [47], which uses 20 δt-blocks with 17,000 parameters in each block and 560 data samples for training. (A minimal illustration of the convolution-as-derivative idea is sketched after this list.)

  • Berg and Nyström [6] (hereafter BN17) proposed an augmented design in which a neural network estimates the PDE parameters and its output is fed into a forward finite element PDE solver, while the adjoint PDE problem is employed to compute gradients of the loss function with respect to the weights and biases of the network using automatic differentiation. Even though their loss function is a simple L2-norm functional, the physics is not localized in the structure of the neural network, as the adjoint PDE problem is also required for the optimization process. It is important to recognize that in their approach the numerical solver is a computational object separate from the neural network; computing gradients of the error functional with respect to the network parameters therefore has to be done explicitly through the adjoint PDE problem. Moreover, their design cannot naturally handle trainable parameters in the numerical discretization itself, a feature that is useful for some meshless numerical schemes. In contrast, in BiPDEs the numerical solver is a computational layer added inside the neural network architecture, which naturally supports trainable parameters in the numerical scheme. For example, in the meshless method developed in section 4, we leverage this unique feature of BiPDEs to also train the shape parameters and interpolation seed locations of the numerical scheme besides the unknown diffusion coefficient.

  • Dal Santo et al. [16] proposed embedding a reduced basis solver as the activation function in the last layer of a neural network. Their architecture resembles an autoencoder in which the decoder is the reduced basis solver and the parameters at the bottleneck “are the values of the physical parameters themselves or the affine decomposition coefficients of the differential operators” [16].

  • Lu et al. [48] proposed an unsupervised learning technique using variational autoencoders to extract physical parameters (not inhomogeneous spatial fields) from noisy spatiotemporal data. Again, the encoder extracts physical parameters and the decoder propagates an initial condition forward in time given the extracted parameters. The authors use convolutional layers both in the encoder, to extract features, and in the decoder with recurrent loops, to propagate solutions in time; i.e. the decoder leverages the idea of estimating differential operators with convolutions. Similar to PDE-Nets, this architecture is a “PDE-integrator with explicit time stepping”, and it needs as few as 10 samples in the case of the Kuramoto-Sivashinsky problem.
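As a minimal illustration of the convolution-as-derivative idea underlying the first bullet above: a fixed three-point kernel applied as a convolution reproduces the standard second-order finite-difference approximation of $U_{xx}$, which PDE-Net instead learns from data under moment constraints. The example below is ours, not code from [47].

```python
import numpy as np

h = 0.01
x = np.arange(0.0, 1.0, h)
u = np.sin(2 * np.pi * x)

# The discrete 1D Laplacian stencil [1, -2, 1] / h^2 as a convolution kernel.
kernel = np.array([1.0, -2.0, 1.0]) / h**2
u_xx = np.convolve(u, kernel, mode="valid")  # values at interior points x[1:-1]

# Second-order accuracy against the exact derivative -(2*pi)^2 sin(2*pi*x):
err = np.max(np.abs(u_xx + (2 * np.pi) ** 2 * np.sin(2 * np.pi * x[1:-1])))
print(f"max error: {err:.3e}")  # O(h^2)
```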

In these methods, a recurring idea is treating latent space variables of autoencoders as physical parameters passed to a physical model decoder. This basic idea pre-dates the literature on solving PDE problems and has been used in many different domains. Examples include Aragon-Calvo [2] who developed a galaxy model fitting algorithm using semantic autoencoders, or Google Tensorflow Graphics [82] which is a well-known application of this idea for scene reconstruction.

Basic criteria for developing numerical schemes for solving partial differential equations are consistency and convergence of the method, i.e. increasing the resolution of the data should yield better results. Not only is there no guarantee that approximating differential operators through learned convolution kernels or automatic differentiation provides a consistent or even stable numerical method, but learning convolution kernels to approximate differential operators also requires more data and therefore yields less data-efficient methods. It therefore seems reasonable to explore the idea of blending classic numerical discretization methods into neural network architectures, hence informing the neural network about proper discretization methods. This is the focus of the present manuscript.

In the present work, we discard the framework of constrained optimization altogether and instead explicitly blend fully traditional finite discretization schemes into semantic autoencoder architectures as the decoder layer. In our approach, the loss function is composed only of the difference between the actual data and the predictions of the solver layer; contrary to BN17 [6], we do not need the adjoint PDE problem to compute gradients of the error functional with respect to network parameters, because in our design the numerical solver is a custom layer inside the neural network through which backpropagation occurs naturally. This is also in contrast to PINNs, where the PDE and its boundary and initial conditions are enforced by adding them to the loss function. Importantly, the encoder learns an approximation of the inverse transform in a self-supervised fashion that can be used to evaluate the hidden fields underlying unseen data without any further optimization. Moreover, the proposed framework is versatile, as it allows straightforward consideration of other domain-specific knowledge such as symmetries or constraints on the hidden field. In this work, we develop this idea for stationary and time-dependent PDEs, on structured and unstructured grids, and on noisy data, using mesh-based and mesh-less numerical discretization methods.
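As a minimal sketch of this design, consider the one-dimensional variable-coefficient Poisson problem $-(D(x)\,u_x)_x = f$ on a uniform grid with homogeneous Dirichlet conditions, in TensorFlow 2. The layer and variable names are ours, and the discretization details (arithmetic face averages, a dense linear solve) are simplifications for illustration rather than the exact implementation used in the paper.

```python
import tensorflow as tf

N = 64
h = 1.0 / (N - 1)  # uniform grid on [0, 1]

class PoissonSolverLayer(tf.keras.layers.Layer):
    """Decoder: a hard-coded second-order finite-difference solve.

    Backpropagation flows through tf.linalg.solve, so no adjoint
    PDE problem has to be derived by hand.
    """
    def call(self, D, f):
        Df = 0.5 * (D[:, 1:] + D[:, :-1])        # D at cell faces, (batch, N-1)
        main = (Df[:, :-1] + Df[:, 1:]) / h**2   # interior diagonal, (batch, N-2)
        off = -Df[:, 1:-1] / h**2                # sub/super diagonals, (batch, N-3)
        A = (tf.linalg.diag(main)
             + tf.linalg.diag(off, k=-1)
             + tf.linalg.diag(off, k=1))
        u = tf.linalg.solve(A, f[:, 1:-1, tf.newaxis])[..., 0]
        return tf.pad(u, [[0, 0], [1, 1]])       # homogeneous Dirichlet ends

# Encoder: maps solution samples to the hidden field D(x) at the bottleneck.
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(N,)),
    tf.keras.layers.Dense(N, activation="softplus"),  # enforce D > 0
])
solver = PoissonSolverLayer()
opt = tf.keras.optimizers.Adam(1e-3)

@tf.function
def train_step(u_data, f):
    with tf.GradientTape() as tape:
        D = encoder(u_data)      # hidden field, read off after training
        u_pred = solver(D, f)    # physics-aware decoder
        loss = tf.reduce_mean(tf.square(u_pred - u_data))  # data misfit only
    grads = tape.gradient(loss, encoder.trainable_variables)
    opt.apply_gradients(zip(grads, encoder.trainable_variables))
    return loss
```

Note that the loss is a plain mean squared error between the data and the solver-layer output; the governing physics enters only through the solver layer itself.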

A full PDE solver is implemented as a custom layer inside the architecture of a semantic autoencoder to solve inverse-PDE problems in a self-supervised fashion. Technically, this differs from works that implement a propagator decoder by manipulating activation functions or the kernels/biases of convolutional layers, and from those that feed the output of a neural network to a separate numerical solver, as in BN17, which carries the burden of deriving the adjoint problem in order to compute the required partial derivatives. The novelties and features of this framework are summarized below:

  • 1. General discretizations. We do not limit the numerical discretization of the differential equations to finite differences emulated by convolution operations; the framework is more general and admits any existing discretization method, including sophisticated meshless schemes, directly in the decoder stage.

  • 2. Introducing solver layers. All the information about the PDE system is localized in a solver layer; i.e. we do not inform the optimizer or the loss function with the adjoint PDE problem, engineer regularizers, impose extra constraints on convolution kernels, or define exotic activation functions as reviewed above. In other words, PDE solvers are treated as custom layers, similar to the convolution operations implemented in convolutional layers. An important consequence is the ability to employ any of the usual loss functions used in deep learning; for example, we arbitrarily used mean absolute error or mean squared error in our examples.

  • 3. Blending meshless methods with trainable parameters. Another unique proposal made in this work is the use of Radial Basis Function (RBF) based PDE solver layers as a natural choice to blend with deep neural networks. Contrary to other works, the neural network is not only used as an estimator for the unknown field but is also tasked with optimizing the shape parameters and interpolation points of the RBF scheme. In other words, our meshless decoder is not free of trainable parameters: the shape parameters and seed locations that define the RBF discretization are themselves trainable, analogous to the trainable weights/biases of convolutional layers in machine learning (a minimal sketch follows this list). This presents an example of neural networks complementing numerical discretization schemes: choosing optimal shape parameters or seed locations is an open question in the field of RBF-based PDE solvers, and here we show that neural networks can be used to optimally define these discretization parameters.

  • 4. Explicit/implicit schemes. Most existing frameworks only accept explicit numerical discretizations in time; our design naturally admits implicit methods as well. Implicit methods allow bigger timesteps for stiff problems such as diffusion, hence providing not only faster but also more robust and stable inverse-PDE solvers.

  • 5. Data efficient. Our design lowers the computational cost by reusing classical numerical algorithms for PDEs during the learning process, which focuses the provided data on inferring the actual unknowns of the problem, i.e. it removes the burden of learning a discretization scheme from scratch.

  • 6. Physics informed. Domain-specific knowledge about the unknown fields, such as symmetries or specialized basis functions, can be directly employed within our design.

  • 7. Inverse transform. After training, the encoder can be used independently as a real-time estimator for unknown fields, i.e. without further optimization. In other words, the network can be pre-trained and then used to infer unknown fields in real-time applications.

Section snippets

Blended inverse-PDE network (BiPDE-Net)

The basic idea is to embed a numerical solver into a deep learning architecture to recover unknown functions in inverse-PDE problems; all the information about the governing PDE system is encoded inside the DNN architecture solely as a solver layer. In this section we describe our proposed architectures for inverse problems in one and two spatial dimensions.

Mesh-based BiPDE: finite differences

We consider a variable-coefficient Poisson problem in one and two spatial dimensions, as well as the one-dimensional nonlinear Burgers' equation as an example of a nonlinear dynamic PDE problem with a scalar unknown parameter.
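To illustrate how an implicit discretization (feature 4 above) fits in a solver layer, here is a hedged sketch of one semi-implicit step for Burgers' equation $u_t + u u_x = \nu u_{xx}$ with periodic boundaries: advection is treated explicitly and diffusion implicitly, so larger timesteps remain stable. This is a simplified illustration, not necessarily the exact scheme used in the paper.

```python
import tensorflow as tf

def burgers_semi_implicit_step(u, nu, dt, h):
    """One step of u_t + u*u_x = nu*u_xx on a periodic grid.

    Advection: explicit centered differences.
    Diffusion: backward Euler, solving (I - nu*dt/h^2 * L) u_new = rhs.
    """
    n = u.shape[-1]
    u_x = (tf.roll(u, -1, axis=-1) - tf.roll(u, 1, axis=-1)) / (2.0 * h)
    rhs = u - dt * u * u_x                        # explicit advection part
    eye = tf.eye(n, dtype=u.dtype)
    lap = tf.roll(eye, 1, axis=0) - 2.0 * eye + tf.roll(eye, -1, axis=0)
    A = eye - nu * dt / h**2 * lap                # implicit diffusion operator
    # Gradients flow through the linear solve to the trainable scalar nu.
    return tf.linalg.solve(A, rhs[:, tf.newaxis])[:, 0]
```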

Mesh-less BiPDE: multiquadric radial basis functions

Not only are direct computations of partial derivatives from noisy data extremely challenging, but in many real-world applications measurements can only be made on scattered point clouds. Tikhonov-regularization type approaches have been devised to avoid the difficulties arising from the high sensitivity of differencing operations to noisy data [14], [11], [78]; for neural network based approaches, see [49], [73]. Recently, Trask et al. [81] have proposed an efficient framework for learning from …

Conclusion

We introduced BiPDE networks, a natural architecture to infer hidden parameters in partial differential equations given a limited number of observations. This approach is versatile, as it can easily be applied to arbitrary static or time-dependent nonlinear inverse-PDE problems. We demonstrated the performance of this design on multiple inverse Poisson problems in one and two spatial dimensions, as well as on the nonlinear time-dependent Burgers' equation in one spatial dimension.

CRediT authorship contribution statement

Samira Pakravan: Conceptualization, Investigation, Methodology, Software, Visualization, Writing – original draft. Pouria A. Mistani: Conceptualization, Investigation, Methodology, Software, Writing – review & editing. Miguel A. Aragon-Calvo: Conceptualization, Methodology, Software, Writing – review & editing. Frederic Gibou: Conceptualization, Funding acquisition, Methodology, Software, Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This research was supported by ARO W911NF-16-1-0136 and ONR N00014-17-1-2676.

References (91)

  • Z. Long et al., PDE-Net 2.0: learning PDEs from data with a numeric-symbolic hybrid deep network, J. Comput. Phys. (2019).
  • P. Markelj et al., A review of 3D/2D registration methods for image-guided interventions, Med. Image Anal. (2012).
  • P. Mistani et al., The island dynamics model on parallel quadtree grids, J. Comput. Phys. (2018).
  • P. Mistani et al., A parallel Voronoi-based approach for mesoscale simulations of cell aggregate electropermeabilization, J. Comput. Phys. (2019).
  • R.J. Prokop et al., A survey of moment-based techniques for unoccluded object representation and recognition, CVGIP, Graph. Models Image Process. (1992).
  • M. Raissi et al., Hidden physics models: machine learning of nonlinear partial differential equations, J. Comput. Phys. (2018).
  • M. Raissi et al., Machine learning of linear differential equations using Gaussian processes, J. Comput. Phys. (2017).
  • M. Sari et al., A sixth-order compact finite difference scheme to the numerical solutions of Burgers' equation, Appl. Math. Comput. (2009).
  • C.W. Shu et al., Efficient implementation of essentially non-oscillatory shock capturing schemes, J. Comput. Phys. (1988).
  • J. Sirignano et al., DGM: a deep learning algorithm for solving partial differential equations, J. Comput. Phys. (2018).
  • J.J. Stickel, Data smoothing and numerical differentiation by a regularization method, Comput. Chem. Eng. (2010).
  • P. Stinis et al., Enforcing constraints for interpolation and extrapolation in generative adversarial networks, J. Comput. Phys. (2019).
  • H. Xie et al., A meshless method for Burgers' equation using MQ-RBF and high-order temporal approximation, Appl. Math. Model. (2013).
  • M. Abadi et al., TensorFlow: large-scale machine learning on heterogeneous distributed systems.
  • M.A. Aragon-Calvo, Self-supervised learning with physics-aware neural networks, I: galaxy model fitting.
  • R.R. Bailey et al., Orthogonal moment features for use with parametric and non-parametric classifiers, IEEE Trans. Pattern Anal. Mach. Intell. (1996).
  • L. Bar et al., Unsupervised deep learning algorithm for PDE-based forward and inverse problems.
  • J. Berg et al., Neural network augmented inverse problems for PDEs.
  • D.S. Broomhead et al., Radial basis functions, multi-variable functional interpolation and adaptive networks (1988).
  • A. Chandrasekaran et al., Solving the electronic structure problem with machine learning, npj Comput. Mater. (2019).
  • R. Chartrand, Numerical differentiation of noisy, nonsmooth data, ISRN Appl. Math. (2011).
  • F. Chollet et al., Keras, ...
  • B.C. Csáji, Approximation with artificial neural networks, Faculty of Sciences, Eötvös Loránd University, Hungary (2001).
  • J. Cullum, Numerical differentiation and regularization, SIAM J. Numer. Anal. (1971).
  • G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signals Syst. (1989).
  • J. Darbon et al., Overcoming the curse of dimensionality for some Hamilton–Jacobi partial differential equations via neural network architectures, Res. Math. Sci. (2020).
  • L. Debnath, Nonlinear Partial Differential Equations for Scientists and Engineers (2011).
  • S. Dong et al., The Zernike expansion—an example of a merit function for 2D/3D registration based on orthogonal functions.
  • C.L. Epstein, Introduction to the Mathematics of Medical Imaging (2007).
  • R. Franke, Scattered data interpolation: tests of some methods, Math. Comput. (1982).
  • D.V. Gaitonde et al., High-order schemes for Navier-Stokes equations: algorithm and implementation into FDL3DI (1998).
  • Z. Geng et al., Coercing machine learning to output physically accurate results, J. Comput. Phys. (2019).
  • R.H. Hahnloser et al., Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit, Nature (2000).
  • J. Han et al., Solving high-dimensional partial differential equations using deep learning, Proc. Natl. Acad. Sci. USA (2018).
  • R.L. Hardy, Multiquadric equations of topography and other irregular surfaces, J. Geophys. Res. (1971).
1 Equal contribution.