Solving inverse-PDE problems with physics-aware neural networks
Introduction
Inverse differential problems, where, given a set of measurements, one seeks the optimal parameters in a governing differential equation, arise in numerous scientific and technological domains. Well-known applications include X-ray tomography [20], [55], ultrasound [83], MRI [36], and transport in porous media [35]. Moreover, modeling and control of dynamic complex systems is a common problem in a broad range of scientific and engineering domains, with examples ranging from understanding the motion of bacteria colonies in low Reynolds number flows [70] to the control of spinning rotorcraft in high-speed flight [30], [31]. Other applications in medicine, navigation, manufacturing, etc. require estimation of the unknown parameters in real time; e.g., in electroporation [91], [53], the pulse optimizer has to be informed about tissue parameters within microseconds. On the other hand, high-resolution datasets describing the spatiotemporal evolution of complex systems are becoming increasingly available through advanced multi-scale numerical simulations (see e.g. [53], [52]). These advances have become possible partly due to recent developments in discretization techniques for nonlinear partial differential equations with sharp boundaries (see e.g. the reviews [25], [24]). However, solving these inverse problems poses substantial computational and mathematical challenges that make it difficult to infer reliable parameters from limited data and in real time.
The problem can be mathematically formulated as follows. Let the values of u be given by a set of measurements, which may include noise. Knowing that u satisfies a governing partial differential equation, find the hidden fields stored in its parameters c, where the hidden fields can be constant or variable coefficients (scalars, vectors or tensors).
Deep neural networks have, rather recently, attracted considerable attention for data modeling in a vast range of scientific domains, in part due to freely available modern deep learning libraries (in particular TensorFlow [1]). For example, deep neural networks have shown astonishing success in emulating sophisticated simulations [29], [90], [89], [10], [75], discovering governing differential equations from data [67], [7], [47], [72], as well as potential applications to study and improve simulations of multiphase flows [25]. We refer the reader to [60], [61] for a comprehensive survey of interplays between numerical approximation, statistical inference and learning. However, these architectures require massive datasets and extensive computations to train numerous hidden weights and biases. Therefore, reducing the complexity of deep neural network architectures for inverse problems poses a significant practical challenge for many applications in physical sciences, especially when the collection of large datasets is a prohibitive task [66]. One remedy to reduce the network size is to embed the knowledge from existing mathematical models [79] or known physical laws within a neural network architecture [45], [23]. Along these lines, semantic autoencoders were recently proposed by Aragon-Calvo [2], who replaced the decoder stage of an autoencoder architecture with a given physical law that can reproduce the provided input data given a physically meaningful set of parameters. The encoder is then constrained to discover optimal values for these parameters, which can be extracted from the bottleneck of the network after training. We emphasize that this approach reduces the number of unknown model parameters, and that the encoder can be used independently to infer hidden parameters in real time, while adding interpretability to deep learning frameworks.
Inspired by their work, we propose to blend traditional numerical solver algorithms with custom deep neural network architectures to solve inverse PDE problems more efficiently, and with higher accuracy.
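As a minimal illustration of this blending idea (a toy example, not the architecture used in this paper), consider recovering a constant diffusion coefficient D from measurements of u solving -D u'' = f on [0, 1] with homogeneous Dirichlet conditions. The "decoder" is a classical finite-difference solve, and plain gradient descent on D stands in for the encoder network; all names below are illustrative:

```python
import numpy as np

def fd_solve(D, f, h):
    """Forward solve of -D u'' = f with u(0) = u(1) = 0 by central differences."""
    n = len(f)
    A = (D / h**2) * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
    return np.linalg.solve(A, f)

n = 49
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
f = np.pi**2 * np.sin(np.pi * x)
D_true = 2.0
u_data = fd_solve(D_true, f, h)     # synthetic noise-free "measurements"

D = 0.5                              # initial guess for the hidden coefficient
lr = 0.01
for _ in range(200):
    u = fd_solve(D, f, h)
    # for a scalar coefficient u(D) = u(1)/D, so du/dD = -u/D exactly;
    # gradient of the misfit 0.5*||u - u_data||^2 with respect to D:
    grad = np.sum((u - u_data) * (-u / D))
    D -= lr * grad
# D is now close to D_true
```

In the full framework the scalar D is replaced by the output of an encoder network, and backpropagation through the solver plays the role of the hand-written sensitivity above.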
Recently, the most widely used approach for solving forward and inverse partial differential equations with neural networks has been the constrained optimization technique. These algorithms augment the cost function with terms that describe the PDE, its boundary conditions and its initial conditions, while the neural network acts as a surrogate for the solution field. Depending on how the derivatives in the PDE are computed, these methods fall into two general classes, which we review next.
In the first class, spatial differentiations in the PDE are performed exclusively using automatic differentiation, while temporal differentiation may be handled using traditional Runge-Kutta schemes (called discrete time models) or using automatic differentiation (called continuous time models) [68]. In these methods, automatic differentiation computes gradients of the output of a neural network with respect to its input variables. Hence, the input must always be the independent variables, i.e. the input coordinates x, time and the free parameters. In this regard, network optimization aims to calibrate the weights and biases such that the neural network outputs the closest approximation of the solution of a PDE; this is enforced through a regularized loss function, an idea first proposed by Lagaris et al. (1998) [43]. In 2015, the general framework of solving differential equations as a learning problem was proposed by Owhadi [57], [58], [59], which revived interest in using neural networks for solving differential equations. Raissi et al. (2017) [68], [69] presented the regularized loss function framework under the name physics informed neural networks, or PINNs, and applied it to time-dependent PDEs. Ever since, other authors have mostly adopted PINNs, see e.g. [76], [4]. The second class of constrained optimization methods was proposed by Xu and Darve [88], who examined the possibility of directly using pre-existing finite discretization schemes within the loss function.
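The regularized-loss idea of the first class can be sketched in miniature: below, a degree-5 polynomial stands in for the neural surrogate of u(t) solving u' = -u, u(0) = 1, so its "autodiff" derivative is available in closed form. Because the residual is then linear in the coefficients, minimizing the PDE-plus-boundary loss reduces to least squares; a real PINN minimizes the same kind of loss by gradient descent over network weights. All choices (degree, weight, collocation points) are illustrative:

```python
import numpy as np

deg = 5
t = np.linspace(0.0, 1.0, 50)                # collocation points
k = np.arange(deg + 1)
B = t[:, None] ** k                          # B[i, j] = t_i^j, surrogate basis
dB = k * t[:, None] ** np.maximum(k - 1, 0)  # exact derivative of each basis term
w_bc = 10.0                                  # weight on the initial-condition term
A = np.vstack([dB + B,                       # rows enforcing the residual u' + u = 0
               w_bc * (0.0 ** k)[None, :]])  # row enforcing u(0) = 1
b = np.concatenate([np.zeros(len(t)), [w_bc]])
coef, *_ = np.linalg.lstsq(A, b, rcond=None) # minimize PDE loss + BC loss
err = np.max(np.abs(B @ coef - np.exp(-t)))  # compare with the exact solution
```

The key structural point survives the simplification: the governing equation enters only through the loss, not through the architecture.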
An alternative approach for solving PDE systems is through explicit embedding of the governing equations inside the architecture of deep neural networks via convolutional layers, activation functions or augmented neural networks. Below we review some of these methods:
- A famous approach is PDE-Net [47], [46], which relies on the numerical approximation of differential operators by convolutions. PDE-Nets therefore use convolution layers with trainable and constrained kernels that mimic differential operators, whose outputs are fed to a (symbolic) multilayer neural network that models the nonlinear response function of the PDE system, i.e. its right-hand side. Importantly, PDE-Nets can only support explicit time integration methods, such as the forward Euler method [47]. Moreover, because the differential operators are learned from data samples, these methods have hundreds of thousands of trainable parameters that demand hundreds of data samples; e.g., section 3.1 in [47] uses 20 δt-blocks with trainable parameters in each block, and 560 data samples for training.
- Berg and Nyström [6] (hereafter BN17) proposed an augmented design that uses a neural network to estimate PDE parameters, whose output is fed into a forward finite element PDE solver, while the adjoint PDE problem is employed to compute gradients of the loss function with respect to the weights and biases of the network using automatic differentiation. Even though their loss function is a simple norm-based functional, the physics is not localized in the structure of the neural network, as the adjoint PDE problem is also employed in the optimization process. It is important to recognize that in their approach the numerical solver is a computational object separate from the neural network; computing gradients of the error functional with respect to the network parameters therefore has to be done explicitly through the adjoint PDE problem. Moreover, their design cannot naturally handle trainable parameters in the numerical discretization itself, a feature that is useful for some meshless numerical schemes. In contrast, in BiPDEs the numerical solver is a computational layer added to the neural network architecture and naturally supports trainable parameters in the numerical scheme. For example, in the meshless method developed in section 4 we leverage this unique feature of BiPDEs to also train the shape parameters and interpolation seed locations of the numerical scheme besides the unknown diffusion coefficient.
- Dal Santo et al. [16] proposed embedding a reduced basis solver as the activation function in the last layer of a neural network. Their architecture resembles an autoencoder in which the decoder is the reduced basis solver and the parameters at the bottleneck “are the values of the physical parameters themselves or the affine decomposition coefficients of the differential operators” [16].
- Lu et al. [48] proposed an unsupervised learning technique using variational autoencoders to extract physical parameters (not inhomogeneous spatial fields) from noisy spatiotemporal data. Again, the encoder extracts physical parameters and the decoder propagates an initial condition forward in time given the extracted parameters. These authors use convolutional layers both in the encoder, to extract features, and in the decoder, with recurrent loops to propagate solutions in time; i.e., the decoder leverages the idea of estimating differential operators with convolutions. Similar to PDE-Nets, this architecture is a “PDE-integrator with explicit time stepping”, although it needs as few as 10 samples in the case of the Kuramoto-Sivashinsky problem.
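The convolution-as-differentiation idea that underlies PDE-Net and the decoder of Lu et al. can be stated in one line of NumPy: the fixed kernel [1, -2, 1]/h² applied to samples of sin(x) recovers the second derivative -sin(x) to O(h²). PDE-Net instead learns (moment-constrained) kernels of this kind from data, which is what drives up its parameter count and data requirements:

```python
import numpy as np

h = 0.01
x = np.arange(0.0, 2 * np.pi, h)
u = np.sin(x)
kernel = np.array([1.0, -2.0, 1.0]) / h**2    # fixed second-difference stencil
d2u = np.convolve(u, kernel, mode="valid")    # values at interior points x[1:-1]
err = np.max(np.abs(d2u - (-np.sin(x[1:-1]))))  # O(h^2) agreement with u''
```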
Basic criteria in developing numerical schemes for solving partial differential equations are consistency and convergence, i.e. increasing the resolution of the data should yield better results. Not only is there no guarantee that approximating differential operators through learned convolution kernels or automatic differentiation provides a consistent or even stable numerical method, but learning convolution kernels to approximate differential operators also requires more data and therefore yields less data-efficient methods. It therefore seems reasonable to blend classic numerical discretization methods into neural network architectures, hence informing the neural network about proper discretization methods. This is the focus of the present manuscript.
In the present work, we discard the framework of constrained optimization altogether and instead choose to explicitly blend fully traditional finite discretization schemes as the decoder layer in semantic autoencoder architectures. In our approach, the loss function is only composed of the difference between the actual data and the predictions of the solver layer, but contrary to BN17 [6] we do not consider the adjoint PDE problem to compute gradients of the error functional with respect to network parameters. This is due to the fact that in our design the numerical solver is a custom layer inside the neural network through which backpropagation occurs naturally. This is also in contrast to PINNs where the entire PDE, its boundary and its initial conditions are reproduced by the output of a neural network by adding them to the loss function. Importantly, the encoder learns an approximation of the inverse transform in a self-supervised fashion that can be used to evaluate the hidden fields underlying unseen data without any further optimization. Moreover, the proposed framework is versatile as it allows for straightforward consideration of other domain-specific knowledge such as symmetries or constraints on the hidden field. In this work, we develop this idea for stationary and time-dependent PDEs on structured and unstructured grids and on noisy data using mesh-based and mesh-less numerical discretization methods.
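The claim that no hand-derived adjoint is needed rests on a standard fact: for a linear solve A(c) u = f inside the network, the exact sensitivity is du/dc = -A⁻¹ (dA/dc) u, which is precisely the rule an automatic-differentiation framework applies when the solve is a layer. A small sanity check against central finite differences (all matrices below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A0 = 3.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
dA = 0.1 * rng.standard_normal((n, n))   # dA/dc for a scalar parameter c
f = rng.standard_normal(n)

def solve(c):
    """The 'solver layer': u(c) = A(c)^{-1} f with A(c) = A0 + c*dA."""
    return np.linalg.solve(A0 + c * dA, f)

c = 0.7
u = solve(c)
# sensitivity of the solve, du/dc = -A^{-1} (dA/dc) u, no adjoint PDE required
du_exact = -np.linalg.solve(A0 + c * dA, dA @ u)
eps = 1e-6
du_fd = (solve(c + eps) - solve(c - eps)) / (2 * eps)  # finite-difference check
err = np.max(np.abs(du_exact - du_fd))
```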
A full PDE solver is implemented as a custom layer inside the architecture of semantic autoencoders to solve inverse-PDE problems in a self-supervised fashion. Technically, this is different from other works that implement a propagator decoder by manipulating activation functions or the kernels/biases of convolutional layers, and from those that feed the output of a neural network to a separate numerical solver, as in BN17, which incurs the burden of solving the adjoint problem in order to compute partial derivatives. The novelties and features of this framework are summarized below:
- 1. General discretizations. We do not limit the numerical discretization of differential equations to finite differences emulated by convolution operations; our approach is more general and permits more sophisticated numerical schemes, such as meshless discretizations. It admits any existing discretization method directly in the decoder stage.
- 2. Introducing solver layers. All the information about the PDE system is localized in a solver layer; i.e., we do not inform the optimizer or the loss function with the adjoint PDE problem, engineer regularizers, impose extra constraints on convolution kernels, or define exotic activation functions as reviewed above. In other words, PDE solvers are treated as custom layers, similar to the convolution operations implemented in convolutional layers. An important aspect is the ability to employ any of the usual loss functions used in deep learning; for example, we arbitrarily used the mean absolute error or the mean squared error in our examples.
- 3. Blending meshless methods with trainable parameters. Another unique proposal made in this work is the use of Radial Basis Function (RBF) based PDE solver layers as a natural choice to blend with deep neural networks. Contrary to other works, the neural network is not only used as an estimator for the unknown field but is also tasked with optimizing the shape parameters and interpolation points of the RBF scheme. Thus, our meshless decoder is not free of trainable parameters: the shape parameters and seed locations that define the RBF discretization are themselves trainable, analogous to the trainable weights/biases of convolutional layers in the machine learning domain. This presents an example of neural networks complementing numerical discretization schemes: choosing optimal shape parameters or seed locations is an open question in the field of RBF-based PDE solvers, and here we show that neural networks can be used to optimally define these discretization parameters.
- 4. Explicit/implicit schemes. Most existing frameworks only accept explicit numerical discretizations in time; our design, however, naturally admits implicit methods as well. Implicit methods allow bigger timesteps for stiff problems such as diffusion, hence providing not only faster but also more robust and stable inverse-PDE solvers.
- 5. Data efficient. Our design lowers the computational cost by reusing classical numerical algorithms for PDEs during the learning process, which focuses the provided data on inferring the actual unknowns of the problem, i.e. it removes the burden of learning a discretization scheme from scratch.
- 6. Physics informed. Domain-specific knowledge about the unknown fields, such as symmetries or specialized basis functions, can be directly employed within our design.
- 7. Inverse transform. After training, the encoder can be used independently as a real-time estimator for unknown fields, i.e. without further optimization. In other words, the network can be pre-trained and then used to infer unknown fields in real-time applications.
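Item 4 above can be made concrete with the 1D heat equation and the standard second-difference stencil: forward Euler is unstable once the timestep exceeds h²/(2D), while the backward-Euler step, solved implicitly, stays bounded at a far larger timestep. The implicit solve is exactly the kind of operation a solver layer performs; the stencil, grid and timestep below are illustrative, not the paper's setup:

```python
import numpy as np

n, D = 50, 1.0
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
L = (D / h**2) * (-2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1))
dt = 10 * h**2                        # 20x beyond the explicit limit h^2/(2D)
rng = np.random.default_rng(0)
u0 = np.sin(np.pi * x) + 1e-6 * rng.standard_normal(n)  # smooth data + tiny noise

I = np.eye(n)
u_exp, u_imp = u0.copy(), u0.copy()
for _ in range(50):
    u_exp = u_exp + dt * (L @ u_exp)                  # forward (explicit) Euler
    u_imp = np.linalg.solve(I - dt * L, u_imp)        # backward (implicit) Euler
# u_exp blows up; u_imp decays as the heat equation should
```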
Blended inverse-PDE network (BiPDE-Net)
The basic idea is to embed a numerical solver into a deep learning architecture to recover unknown functions in inverse-PDE problems; all the information about the governing PDE system is encoded only inside the DNN architecture, as a solver layer. In this section we describe our proposed architectures for inverse problems in one and two spatial dimensions.
Mesh-based BiPDE: finite differences
We consider a variable coefficient Poisson problem in one and two spatial dimensions, as well as the one-dimensional nonlinear Burgers' equation as an example of a nonlinear dynamic PDE problem with a scalar unknown parameter.
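The mesh-based decoder for the variable-coefficient Poisson problem amounts to a standard flux-form finite-difference solve. A sketch in 1D with a manufactured solution (the paper's exact stencil and boundary treatment may differ; this is the textbook second-order scheme for -(a(x) u')' = f with homogeneous Dirichlet conditions):

```python
import numpy as np

n = 200
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
a = lambda s: 1.0 + 0.5 * np.sin(2 * np.pi * s)   # a known coefficient field

ah = a(np.linspace(h / 2, 1 - h / 2, n + 1))      # a at the half-points x_{i±1/2}
# row i: [ (a_{i-1/2}+a_{i+1/2}) u_i - a_{i+1/2} u_{i+1} - a_{i-1/2} u_{i-1} ] / h^2
A = (np.diag(ah[:-1] + ah[1:])
     - np.diag(ah[1:-1], 1)
     - np.diag(ah[1:-1], -1)) / h**2
# manufactured right-hand side for the exact solution u = sin(pi x)
f = (-np.pi**2 * np.cos(2 * np.pi * x) * np.cos(np.pi * x)
     + np.pi**2 * (1.0 + 0.5 * np.sin(2 * np.pi * x)) * np.sin(np.pi * x))
u = np.linalg.solve(A, f)
err = np.max(np.abs(u - np.sin(np.pi * x)))       # second-order accurate
```

In the inverse setting the nodal values of a(x) become the encoder's outputs, and this solve sits inside the network as the decoder.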
Mesh-less BiPDE: multiquadric radial basis functions
Not only are direct computations of partial derivatives from noisy data extremely challenging, but in many real-world applications measurements can only be made on scattered point clouds. Tikhonov-regularization-type approaches have been devised to avoid the difficulties arising from the high sensitivity of differencing operations to noisy data [14], [11], [78]; for neural network based approaches, see [49], [73]. Recently, Trask et al. [81] have proposed an efficient framework for learning from
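The mesh-free building block of this section can be sketched with multiquadric RBF interpolation on scattered 1D points, in Hardy's original form φ(r) = sqrt(r² + c²). The shape parameter c, and in BiPDE the seed locations xs themselves, are precisely the discretization parameters the network can be left to train; the values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
xs = np.sort(np.concatenate([[0.0, 1.0], rng.uniform(0.0, 1.0, 23)]))  # scattered seeds
c = 0.1                                          # shape parameter (trainable in BiPDE)
phi = lambda r: np.sqrt(r**2 + c**2)             # multiquadric basis function

A = phi(xs[:, None] - xs[None, :])               # interpolation matrix
w = np.linalg.solve(A, np.sin(2 * np.pi * xs))   # RBF weights fitting sampled data
xe = np.linspace(0.0, 1.0, 200)                  # dense evaluation grid
u = phi(xe[:, None] - xs[None, :]) @ w           # evaluate the interpolant
err = np.max(np.abs(u - np.sin(2 * np.pi * xe)))
```

Derivatives of the interpolant, needed to discretize the PDE itself, follow by differentiating φ analytically, which is what makes the scheme attractive on point clouds.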
Conclusion
We introduced BiPDE networks, a natural architecture for inferring hidden parameters in partial differential equations given a limited number of observations. This approach is versatile, as it can easily be applied to arbitrary static or nonlinear time-dependent inverse-PDE problems. We demonstrated the performance of this design on multiple inverse Poisson problems in one and two spatial dimensions, as well as on the nonlinear time-dependent Burgers' equation in one spatial dimension.
CRediT authorship contribution statement
Samira Pakravan: Conceptualization, Investigation, Methodology, Software, Visualization, Writing – original draft. Pouria A. Mistani: Conceptualization, Investigation, Methodology, Software, Writing – review & editing. Miguel A. Aragon-Calvo: Conceptualization, Methodology, Software, Writing – review & editing. Frederic Gibou: Conceptualization, Funding acquisition, Methodology, Software, Supervision, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This research was supported by ARO W911NF-16-1-0136 and ONR N00014-17-1-2676.
References (91)
- Pattern recognition with moment invariants: a comparative study and new results, Pattern Recognit. (1991)
- Data-driven discovery of PDEs in complex datasets, J. Comput. Phys. (2019)
- A Mathematical Model Illustrating the Theory of Turbulence (1948)
- Data driven approximation of parametrized PDEs by reduced basis and neural networks, J. Comput. Phys. (2020)
- A review of level-set methods and some recent applications, J. Comput. Phys. (2018)
- Sharp interface approaches and deep learning techniques for multiphase flows, J. Comput. Phys. (2019)
- An efficient numerical scheme for Burgers' equation, Appl. Math. Comput. (1998)
- Multiquadrics—a scattered data approximation scheme with applications to computational fluid-dynamics—I: surface approximations and partial derivative estimates, Comput. Math. Appl. (1990)
- Multiquadrics—a scattered data approximation scheme with applications to computational fluid-dynamics—II: solutions to parabolic, hyperbolic and elliptic partial differential equations, Comput. Math. Appl. (1990)
- Machine learning strategies for systems with invariance properties, J. Comput. Phys. (2016)
- PDE-Net 2.0: learning PDEs from data with a numeric-symbolic hybrid deep network, J. Comput. Phys.
- A review of 3D/2D registration methods for image-guided interventions, Med. Image Anal.
- The island dynamics model on parallel quadtree grids, J. Comput. Phys.
- A parallel Voronoi-based approach for mesoscale simulations of cell aggregate electropermeabilization, J. Comput. Phys.
- A survey of moment-based techniques for unoccluded object representation and recognition, CVGIP, Graph. Models Image Process.
- Hidden physics models: machine learning of nonlinear partial differential equations, J. Comput. Phys.
- Machine learning of linear differential equations using Gaussian processes, J. Comput. Phys.
- A sixth-order compact finite difference scheme to the numerical solutions of Burgers' equation, Appl. Math. Comput.
- Efficient implementation of essentially non-oscillatory shock capturing schemes, J. Comput. Phys.
- DGM: a deep learning algorithm for solving partial differential equations, J. Comput. Phys.
- Data smoothing and numerical differentiation by a regularization method, Comput. Chem. Eng.
- Enforcing constraints for interpolation and extrapolation in generative adversarial networks, J. Comput. Phys.
- A meshless method for Burgers' equation using MQ-RBF and high-order temporal approximation, Appl. Math. Model.
- TensorFlow: large-scale machine learning on heterogeneous distributed systems
- Self-supervised learning with physics-aware neural networks, I: galaxy model fitting
- Orthogonal moment features for use with parametric and non-parametric classifiers, IEEE Trans. Pattern Anal. Mach. Intell.
- Unsupervised deep learning algorithm for PDE-based forward and inverse problems
- Neural network augmented inverse problems for PDEs
- Radial basis functions, multi-variable functional interpolation and adaptive networks
- Solving the electronic structure problem with machine learning, NPJ Comput. Mater.
- Numerical differentiation of noisy, nonsmooth data, ISRN Appl. Math.
- Approximation with artificial neural networks, Faculty of Sciences, Eötvös Loránd University, Hungary
- Numerical differentiation and regularization, SIAM J. Numer. Anal.
- Approximation by superpositions of a sigmoidal function, Math. Control Signals Syst.
- Overcoming the curse of dimensionality for some Hamilton–Jacobi partial differential equations via neural network architectures, Res. Math. Sci.
- Nonlinear Partial Differential Equations for Scientists and Engineers
- The Zernike expansion—an example of a merit function for 2D/3D registration based on orthogonal functions
- Introduction to the Mathematics of Medical Imaging
- Scattered data interpolation: tests of some methods, Math. Comput.
- High-order schemes for Navier-Stokes equations: algorithm and implementation into FDL3DI
- Coercing machine learning to output physically accurate results, J. Comput. Phys.
- Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit, Nature
- Solving high-dimensional partial differential equations using deep learning, Proc. Natl. Acad. Sci. USA
- Multiquadric equations of topography and other irregular surfaces, J. Geophys. Res.
1 Equal contribution.