
Bayesian inversion for electromyography using low-rank tensor formats


Published 31 March 2021 © 2021 IOP Publishing Ltd
Citation: Anna Rörich et al 2021 Inverse Problems 37 055003. DOI 10.1088/1361-6420/abd85a


Abstract

The reconstruction of the structure of biological tissue using electromyographic (EMG) data is a non-invasive imaging method with diverse medical applications. Mathematically, this process is an inverse problem. Furthermore, EMG data are highly sensitive to changes in the electrical conductivity that describes the structure of the tissue. Modeling the inevitable measurement error as a stochastic quantity leads to a Bayesian approach. Solving the discretized Bayesian inverse problem means drawing samples from the posterior distribution of parameters, e.g., the conductivity, given measurement data. Using, e.g., a Metropolis–Hastings algorithm for this purpose involves solving the forward problem for different parameter combinations which requires a high computational effort. Low-rank tensor formats can reduce this effort by providing a data-sparse representation of all occurring linear systems of equations simultaneously and allow for their efficient solution. The application of Bayes' theorem proves the well-posedness of the Bayesian inverse problem. The derivation and proof of a low-rank representation of the forward problem allow for the precomputation of all solutions of this problem under certain assumptions, resulting in an efficient and theory-based sampling algorithm. Numerical experiments support the theoretical results, but also indicate that a high number of samples is needed to obtain reliable estimates for the parameters. The Metropolis–Hastings sampling algorithm, using the precomputed forward solution in a tensor format, draws this high number of samples and therefore enables solving problems which are infeasible using classical methods.


1. Introduction

In clinical applications, surface electromyographic (EMG) data are a widely used source of information about the muscular and nervous systems. For example, EMG data are valuable in neurology, movement analysis, rehabilitation medicine, and the development of biofeedback techniques. To this end, different models have been developed to simulate and understand EMG data, see, e.g., [26].

Using EMG measurements, we focus on reconstructing the intracellular conductivity of biological tissue. As the conductivity provides information about the structure of this tissue, this is an important step towards a non-invasive and radiation-free imaging method. Furthermore, reliable estimates of the conductivity from patient-specific EMG measurements can advance personalized treatment.

Computed EMG data are, however, highly sensitive to changes in the conductivity, see, e.g., [19]. In addition, reconstructing the conductivity from (surface) measurements is an inverse problem [16]. Since the measurement error is unknown, we model it as a stochastic quantity and incorporate it into the EMG model. This results in a probabilization of the whole EMG model. Consequently, the solution of the inverse EMG problem also becomes probabilistic.

For solving this probabilistic inverse problem, in section 2, we use a Bayesian ansatz, cf. [4, 31], that searches for the probability distribution of the parameters for given measurements, the so-called posterior distribution. This ansatz has the advantage that the posterior distribution quantifies the uncertainty within instances of the reconstructed parameters.

Discretizing the posterior distribution means drawing a finite number of samples from the posterior, which requires solving the (discrete) forward EMG problem for different parameter samples to check the fidelity of each sample.

As solving the forward EMG problem is expensive using classical methods, we aim at precomputing the solution of the forward problem for all parameters at the same time. This results in a parameter-dependent linear system of equations, i.e., A(p)ϕ(p) = b(p) for an operator A, a solution ϕ, and a right-hand side b depending on parameters p = (p^(1), p^(2), ..., p^(d)). After discretizing the parameters in the sense that we allow each parameter p^(j), j = 1, ..., d, to take n different values from its domain, solving the linear system for every combination of parameters implies solving n^d linear systems. This exponential scaling in the dimension d of the parameter space is commonly known as the curse of dimensionality, which renders classical methods infeasible for d ≫ 2.

To represent these parameter-dependent linear systems, we use low-rank tensor formats, cf. [3, 11], which we recapitulate in section 3. Solving these linear systems within these formats allows us to evaluate the parameter-dependent forward problem fast.

In particular, our main contributions to solve this Bayesian inverse EMG problem and to represent the forward problem in a data-sparse way using low-rank tensor formats are:

  • We prove the well-posedness of our particular Bayesian inverse EMG problem in section 4 and show that modeling the measurement error leads to a natural regularization of the inverse problem.
  • We derive a discretization of the parameter-dependent operator and the right-hand side in section 5 and prove a data-sparse representation of this discretization using low-rank tensor formats. This method allows us to solve the parameter-dependent linear system fast.
  • Combining this data-sparse representation with a standard Metropolis–Hastings algorithm in section 6 allows us to solve the Bayesian inverse EMG problem efficiently.

In section 7, we present our numerical experiments that support our theoretical analysis and indicate that the Markov chain constructed by the Metropolis–Hastings algorithm using low-rank tensor formats behaves like the Markov chain constructed by a standard algorithm (SA). Further, we observe a speedup of more than 600 using low-rank tensor formats compared to the SA.

In section 8, we discuss some related work, and in section 9, we conclude that mathematical theory, combined with an efficient representation of the parameter-dependent solution that allows us to generate samples fast, leads to an efficient algorithm for solving the Bayesian inverse problem.

2. The Bayesian inverse electromyographic problem

In order to define our Bayesian inverse EMG problem, we briefly discuss the structure of skeletal muscles and summarize a forward model of surface EMG signals in the following.

A skeletal muscle is composed of bundles of cells, the so-called muscle fibers. These muscle fibers are the active contractile tissue of a body that react to electrical stimuli.

To model EMG signals, we follow the physical structure of a skeletal muscle beginning with the electrical behavior of a single muscle fiber and then describing the electrical behavior of a skeletal muscle by assembling the muscle fibers.

An electrical stimulus from the spinal cord influences the chemo-electrical behavior of the innervated muscle fibers ${D}_{\text{F},j}\subseteq \mathbb{R}$, j = 1, ..., NMF, for ${N}_{\text{MF}}\in \mathbb{N}$ muscle fibers. These electrical fluctuations travel along the muscle fibers as action potentials (APs), propagate through the muscle, and are measured at $M\in \mathbb{N}$ measuring points summarized in $\mathbf{x}\in {\mathbb{R}}^{M{\times}3}$.

We apply the widely used model by Rosenfalck [28] to model the muscle fiber AP:

Equation (1)

Here, ${r}_{1,j},{r}_{2,j},{r}_{3,j}\in \mathbb{R}$ are known, fixed constants, and the spatial coordinate s can be rewritten as s = uj t using the AP velocities uj and time t.

To assemble a three-dimensional skeletal muscle ${D}_{\text{M}}\subseteq {\mathbb{R}}^{3}$ from the one-dimensional muscle fibers ${D}_{\text{F},j}\subseteq \mathbb{R}$, a transfer operator is needed. Thus, we introduce the smoothing operator $S:{D}_{\text{F},j}\to {\mathbb{R}}^{3}$ with

Equation (2)

where $\beta \in \mathbb{R}$ is a smoothing parameter and πj : DMDF,j is the orthogonal projection of a muscle tissue point xDM onto the muscle fiber DF,j with starting point ${y}_{j}\in {\mathbb{R}}^{3}$ and direction ${ \overrightarrow {d}}_{j}\in {\mathbb{R}}^{3}$. Note that the muscle fiber directions ${ \overrightarrow {d}}_{j}$ in general depend on $x\in {\mathbb{R}}^{3}$ and are known for the forward problem, e.g., through a medical imaging technique. The projection reads

Equation (3)

Applying the smoothing operator to the muscle fibers yields ${\cup }_{j=1}^{{N}_{\text{MF}}}S\left({D}_{\text{F},j}\right)={D}_{\text{M}}$, and we obtain the membrane potential ${V}_{\text{m}}\left(x\right)={\sum }_{j=1}^{{N}_{\text{MF}}}S\left({v}_{\text{m},j}\right)\left(x\right)$.

The bidomain equation, as stated in [26], models the propagation of the membrane potential Vm through a skeletal muscle by

Equation (4): $\nabla \cdot \left(\left({\sigma }_{\text{i}}+{\sigma }_{\text{e}}\right)\nabla {\phi }_{\text{e}}\right)=-\nabla \cdot \left({\sigma }_{\text{i}}\nabla {V}_{\text{m}}\right)\quad \text{in}\enspace {D}_{\text{M}},$

where ϕe denotes the extracellular electrical potential, and σi, σe are the intra- and extracellular electrical conductivities. Additionally, no-flow boundary conditions are introduced at the domain boundary. A zero-mean integral condition is used to ensure uniqueness of the solution.

The above model can easily be extended by the electrophysiology of surrounding connective tissue and bones, and a model of force generation and the corresponding continuum mechanics, see [26] and the references therein. Within our setting, the muscle geometry and the structure of the tissue remain unchanged in time.

Note that we model the conductivities as matrices, e.g., ${\sigma }_{\text{i}}\in {\mathbb{R}}^{3{\times}3}$, where each matrix entry ${\left({\sigma }_{\text{i}}\right)}_{j,k}$ quantifies the conductivity of the tissue in the xj xk -direction for j, k = 1, 2, 3. In particular, the eigenvector of σi that belongs to the largest eigenvalue represents the orientation of the underlying muscle fiber, and the largest eigenvalue corresponds to the longitudinal conductivity of the underlying muscle fiber. Note that the conductivity of a muscle fiber in transversal direction is much smaller. This relation enables us to draw conclusions about the structure of muscular tissue from its intracellular conductivity. To verify our ansatz described in the following sections, we restrict ourselves to diagonal conductivity matrices, i.e., the corresponding eigenvectors are the unit vectors ${ \overrightarrow {e}}_{j}\in {\mathbb{R}}^{3}$ for j = 1, 2, 3. Consequently, the muscle fiber direction is one of these unit vectors.

A reasonable assumption on σi is that it is bounded, i.e., there exist constants $s_{-} > 0$ and $s_{+} < \infty$ such that $s_{-} \leqslant \sigma_{\text{i}} \leqslant s_{+}$ holds componentwise. Physically this corresponds to the tissue neither being fully insulating nor superconducting. Formalizing these considerations leads to the assumption $p{:=}\left({\left({\sigma }_{\text{i}}\right)}_{1,1},{\left({\sigma }_{\text{i}}\right)}_{2,2},{\left({\sigma }_{\text{i}}\right)}_{3,3}\right)\in {\left[{s}_{-},{s}_{+}\right]}^{3}{=:}\mathcal{J}$.

For simplicity, we encapsulate the above models in the definition of the observation operator

Equation (5): ${\mathcal{G}}_{\mathbf{x}}:\mathcal{J}\to {\mathbb{R}}^{M},\enspace p{\mapsto}\phi \left(\mathbf{x}\right),$

which maps the diagonal entries p of a given intracellular conductivity σi to the calculated electrical potential ϕ(x) at measuring points $\mathbf{x}\in {\mathbb{R}}^{M{\times}3}$.

To complete the forward EMG model, we include the inevitable measurement error which is unknown but is usually assumed to be additive and to follow a normal distribution. Hence, the measurement error is modeled as a random variable $\eta :{\Omega}\to {\mathbb{R}}^{M}$ on a complete probability space $\left({\Omega},\mathcal{F},P\right)$ with $\eta \sim \mathcal{N}\left(0,{\Xi}\right)$ and covariance matrix ${\Xi}=\mathrm{diag}\left(\xi ,\dots ,\xi \right)\in {\mathbb{R}}^{M{\times}M}$. Adding the measurement error to (5) yields the model for EMG data

Equation (6)

Solving (6) for p, as in the inverse problem setting, shows that p must be a random variable as well. To emphasize the randomness of p, we write p = p(ω).

A naive inversion of the probabilistic forward problem would be to search for a $p\left(\omega \right)\in \mathcal{J}$ such that ${\phi }_{\text{EMG}}^{\text{comp}}\left(p\left(\omega \right)\right)={\phi }_{\text{EMG}}^{\text{meas}}$ for given measurements ${\phi }_{\text{EMG}}^{\text{meas}}\in {\mathbb{R}}^{M}$. This problem formulation searches for particular realizations of the random variable p, which, however, misrepresents the behavior of the probabilistic inverse EMG problem. Hence, we need a more appropriate problem formulation.

We consider a function space Bayesian formulation which aims at calculating the probability distribution of p for given data ${\phi }_{\text{EMG}}^{\text{meas}}$.

To follow this approach, we assume that the entries of p are uncorrelated and equip $\mathcal{J}$ with the product σ-algebra ${\Theta}{:=}{\bigotimes}_{j=1}^{3}\mathcal{B}\left(\left[{s}_{-},{s}_{+}\right]\right)$, where $\mathcal{B}\left(\left[{s}_{-},{s}_{+}\right]\right)$ is the Borel-σ-algebra on [s, s+]. Subsequently, the product probability measure $\rho {:=}{\bigotimes}_{j=1}^{3}\mathrm{d}{\lambda }_{j}$ is defined on the measurable space $\left(\mathcal{J},{\Theta}\right)$ with dλj denoting the normalized Lebesgue measure on [s, s+], similar to [18, 29]. Note that ρ is the probability law of the random variable p, since the diagonal entries p of the intracellular conductivity σi are uncorrelated. The Lebesgue measure indicates that the entries of p are uniformly distributed on [s, s+]. In the Bayesian context, ρ is called the prior measure or short prior, because it describes the behavior of p prior to having any knowledge about the conductivity, e.g., from measurements.

The Bayesian inverse EMG problem searches for the conditional probability distribution ρEMG of p given EMG measurements ${\phi }_{\text{EMG}}^{\text{meas}}$. We prove the existence of the posterior distribution ρEMG in section 4.

For solving our Bayesian inverse EMG problem, we use a Metropolis–Hastings algorithm, see, e.g., [27]. A Metropolis–Hastings algorithm is an acceptance-rejection algorithm that draws samples from the posterior distribution by solving the EMG forward problem for different realizations of p and comparing the results. If the proposal is accepted by an acceptance strategy a, it becomes part of a Markov chain. Otherwise, the old sample will be kept and a new proposal will be drawn.

In [4], the acceptance strategy $a\left(p,\tilde {p}\right){:=}\mathrm{min}\left\{1,\mathrm{exp}\left({\Phi}\left(p\right)-{\Phi}\left(\tilde {p}\right)\right)\right\}$ with the potential ${\Phi}:\mathcal{J}{\times}{\mathbb{R}}^{M}\to \mathbb{R}$ defined by

Equation (7): ${\Phi}\left(p;\phi \right){:=}\frac{1}{2}{\Vert}\phi -{\mathcal{G}}_{\mathbf{x}}\left(p\right){{\Vert}}_{{\Xi}}^{2}$

and Ξ-norm ${\Vert}\mathbf{v}{{\Vert}}_{{\Xi}}{:=}{\Vert}{{\Xi}}^{-\frac{1}{2}}\mathbf{v}{{\Vert}}_{{\mathbb{R}}^{M}}$ for all $\mathbf{v}\in {\mathbb{R}}^{M}$ was derived such that, for proposals that are reversible with respect to the prior ρ, the resulting Markov chain is reversible with respect to the posterior. This yields the convergence of the Metropolis–Hastings algorithm.

We rewrite the acceptance strategy as

$a\left(p,\tilde{p}\right)=\begin{cases}1, & {\Phi}\left(\tilde{p}\right)\leqslant {\Phi}\left(p\right),\\ \mathrm{exp}\left({\Phi}\left(p\right)-{\Phi}\left(\tilde{p}\right)\right), & \text{otherwise.}\end{cases}$

Consequently, a new proposal will always be accepted if it produces a smaller error than the last accepted sample, and will otherwise be rejected with probability 1 − a, i.e., the old sample will be kept with probability 1 − a.
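For illustration, the following minimal NumPy sketch evaluates this acceptance probability; it assumes the least-squares potential of (7) with Ξ = ξ Id as in the noise model above, and the linear forward map B is a hypothetical placeholder, not part of the Matlab implementation used in section 7.

import numpy as np

def potential(p, phi_meas, G, xi):
    # Phi(p; phi_meas) = 1/2 * || Xi^{-1/2} (phi_meas - G(p)) ||^2 with Xi = xi * Id
    r = phi_meas - G(p)
    return 0.5 * np.dot(r, r) / xi

def acceptance(p_old, p_prop, phi_meas, G, xi):
    # a(p, p~) = min{1, exp(Phi(p) - Phi(p~))}
    return min(1.0, np.exp(potential(p_old, phi_meas, G, xi)
                           - potential(p_prop, phi_meas, G, xi)))

# usage with a hypothetical linear forward map G(p) = B p
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 3))
G = lambda p: B @ p
p_ref = np.array([0.9, 8.9, 0.9])
phi_meas = G(p_ref) + np.sqrt(2.0) * rng.standard_normal(8)
print(acceptance(p_ref, p_ref + 0.5, phi_meas, G, xi=2.0))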

3. Low-rank tensor formats

Evaluating the acceptance strategy in every step of the Metropolis–Hastings algorithm requires the evaluation of the observation operator ${\mathcal{G}}_{\mathbf{x}}$, i.e., the solution of the forward EMG problem, for a new set of diagonal entries p of the intracellular conductivity. Consequently, we need a way to compute these solutions fast. We use low-rank tensor formats to accelerate these computations and motivate these formats using an example, analogously to [10].

We consider the scaling of a discrete operator $A_h$ by a parameter $p_h(j)$, $j = 1, \dots, n$, with $n\in \mathbb{N}$, i.e., $p_h(j) A_h$. We assume that the right-hand side $b_h$ is constant for all $p_h(j)$. Using classical methods, we would need to solve the following linear systems:

$p_h(j)\, A_h\, \phi_h(j) = b_h, \quad j = 1, \dots, n.$

Using the Kronecker product to reformulate this system as

$\left(\operatorname{diag}\left(p_h(1), \dots, p_h(n)\right) \otimes A_h\right) \boldsymbol{\phi} = \mathbb{1}_n \otimes b_h, \quad \mathbb{1}_n := (1, \dots, 1)^{\top} \in \mathbb{R}^{n},$

we achieve a data-sparse representation. We use a generalization of this representation to derive a data-sparse representation of the parameter-dependent forward EMG problem which can be interpreted as the CANDECOMP/PARAFAC, or CP for short, representation introduced in [2, 17].
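The following small NumPy sketch checks this reformulation for a hypothetical operator A_h and right-hand side b_h: the single Kronecker-structured solve reproduces the n separate solves.

import numpy as np

n, m = 4, 6                                           # parameter values and spatial unknowns
rng = np.random.default_rng(0)
A = np.eye(m) + 0.1 * rng.standard_normal((m, m))     # hypothetical discrete operator A_h
b = rng.standard_normal(m)                            # constant right-hand side b_h
p = rng.uniform(0.5, 2.0, size=n)                     # parameter values p_h(1), ..., p_h(n)

# classical approach: one linear solve per parameter value
phi_classical = np.stack([np.linalg.solve(p[j] * A, b) for j in range(n)])

# Kronecker reformulation: (diag(p) (x) A) phi = (1, ..., 1)^T (x) b
phi_kron = np.linalg.solve(np.kron(np.diag(p), A),
                           np.kron(np.ones(n), b)).reshape(n, m)

print(np.allclose(phi_classical, phi_kron))           # True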

Definition 3.1 (CP vector and CP operator). A CP representation of a tensor $\mathbf{b}\in {\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}$, with representation rank $r\in {\mathbb{N}}_{0}$, is defined as

Equation (8): $\mathbf{b}={\sum }_{k=1}^{r}{b}_{k}^{\left(1\right)}\otimes \cdots \otimes {b}_{k}^{\left(d\right)}$ with ${b}_{k}^{\left(\ell \right)}\in {\mathbb{R}}^{{n}_{\ell }}$.

We call each $\ell \in \mathcal{D}{:=}\left\{1,\dots ,d\right\}$ mode and d the dimension. The minimal r, such that (8) holds, is called the CP rank of b and in this case (8) is called the CP decomposition of b. We call a tensor of the form (8) a CP vector.

A CP representation of a tensor operator A from ${\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}$ to ${\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}$, with representation rank r and dimension d, is defined as

Equation (9): $\mathbf{A}={\sum }_{k=1}^{r}{A}_{k}^{\left(1\right)}\otimes \cdots \otimes {A}_{k}^{\left(d\right)}$ with ${A}_{k}^{\left(\ell \right)}\in {\mathbb{R}}^{{n}_{\ell }{\times}{n}_{\ell }}$.

We call a tensor of the form (9) a CP operator.

Note that the ${b}_{k}^{\left(\ell \right)}$ in (8) are vectors and that the ${A}_{k}^{\left(\ell \right)}$ in (9) are matrices. Therefore, a CP vector b is a sum of rank r Kronecker products of d vectors, and a CP operator A is a summation over Kronecker products of matrices. Thus, using definition 3.1, there exist CP vectors and CP operators of any dimension and rank.

A big advantage of the CP format is the data-sparsity in case of a small representation rank r, since a tensor $\mathbf{b}\in {\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}$ of the form (8) has storage cost in $\mathcal{O}\left(r{\sum }_{\ell =1}^{d}{n}_{\ell }\right)\approx \mathcal{O}\left(rdn\right)$ compared to $\mathcal{O}\left({\prod }_{\ell =1}^{d}{n}_{\ell }\right)\approx \mathcal{O}\left({n}^{d}\right)$ with $n={\mathrm{max}}_{\ell \in \mathcal{D}}\enspace {n}_{\ell }$.
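As a back-of-the-envelope illustration of this data-sparsity, the following sketch (with random, purely illustrative factor vectors) counts the stored entries of a CP representation and evaluates a single entry of the represented tensor.

import numpy as np

d, n, r = 5, 100, 10
rng = np.random.default_rng(1)
factors = [rng.standard_normal((r, n)) for _ in range(d)]    # the vectors b_k^(l)

def cp_entry(factors, idx):
    # one entry of the represented tensor: sum_k prod_l b_k^(l)[i_l]
    return np.sum(np.prod([f[:, i] for f, i in zip(factors, idx)], axis=0))

print("CP storage:", r * d * n, "entries, full tensor:", n ** d, "entries")
print(cp_entry(factors, (3, 1, 4, 1, 5)))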

Therefore it is desirable to represent the operator and the right-hand side of the forward EMG problem in a data-sparse way using low-rank tensor formats. To compute the solution of the discrete forward EMG problem, we need to solve linear systems within low-rank tensor formats. No algorithm is known that computes the inverse of an operator with representation rank r > 1 in a direct way.

Consider, e.g., a CP operator A of dimension 1 and rank 2, i.e., A = A1 + A2, with ${A}_{1},{A}_{2}\in {\mathbb{R}}^{n{\times}n}$. Then, finding a direct inverse of A in the CP format means finding matrices Cj and Dj such that ${\mathbf{A}}^{-1}={\left({A}_{1}+{A}_{2}\right)}^{-1}={\sum }_{j=1}^{J}{C}_{j}^{-1}+{D}_{j}^{-1}$ should hold for some rank $J\in \mathbb{N}$. Since such a property is unknown even for matrix summations [25], it is also unknown in the more general tensor case.

Figure 1. Dimension tree for dimension d = 4.

We therefore need iterative solvers and thus arithmetic operations within low-rank tensor formats. These arithmetic operations often lead to an increase of the representation rank.

Consider, e.g., a CP operator of dimension 2 and rank 3, i.e., $\mathbf{A}={\sum }_{i=1}^{3}{A}_{i}^{\left(1\right)}\bigotimes{A}_{i}^{\left(2\right)}$ and a CP vector of dimension 2 and rank 2, i.e., $\mathbf{x}={\sum }_{j=1}^{2}{x}_{j}^{\left(1\right)}\bigotimes{x}_{j}^{\left(2\right)}$. Then, the application of A to x yields $\mathbf{A}\mathbf{x}=\left({\sum }_{i=1}^{3}{A}_{i}^{\left(1\right)}\bigotimes{A}_{i}^{\left(2\right)}\right)\left({\sum }_{j=1}^{2}{x}_{j}^{\left(1\right)}\bigotimes{x}_{j}^{\left(2\right)}\right)={\sum }_{i=1}^{3}{\sum }_{j=1}^{2}{A}_{i}^{\left(1\right)}{x}_{j}^{\left(1\right)}\bigotimes{A}_{i}^{\left(2\right)}{x}_{j}^{\left(2\right)}={\sum }_{k=1}^{6}{y}_{k}^{\left(1\right)}\bigotimes{y}_{k}^{\left(2\right)}$ with ${y}_{k}^{\left(\nu \right)}{:=}{A}_{i}^{\left(\nu \right)}{x}_{j}^{\left(\nu \right)}$ for k = i + 3(j − 1) and ν = 1, 2. Therefore Ax is a CP vector of representation rank 6(=2 × 3).
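The rank growth in this example can be checked directly with a small NumPy sketch (random matrices and vectors as placeholders), using the mixed-product property of the Kronecker product:

import numpy as np

rng = np.random.default_rng(2)
n1, n2 = 4, 5

# CP operator of dimension 2 and rank 3, CP vector of dimension 2 and rank 2
A_factors = [(rng.standard_normal((n1, n1)), rng.standard_normal((n2, n2)))
             for _ in range(3)]
x_factors = [(rng.standard_normal(n1), rng.standard_normal(n2)) for _ in range(2)]

A = sum(np.kron(A1, A2) for A1, A2 in A_factors)
x = sum(np.kron(x1, x2) for x1, x2 in x_factors)

# the product is again a CP vector, now with 3 * 2 = 6 Kronecker terms
y_factors = [(A1 @ x1, A2 @ x2) for A1, A2 in A_factors for x1, x2 in x_factors]
y = sum(np.kron(y1, y2) for y1, y2 in y_factors)

print(len(y_factors), np.allclose(A @ x, y))          # 6 True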

The above example shows that we need a truncation of a tensor to lower rank, i.e., an approximation with a tensor of lower rank. To guarantee the convergence of iterative methods, we have to guarantee that the truncation error is small enough, cf. [14].

The set of CP tensors of rank at most r is, however, not closed, which makes the best approximation by a CP tensor of rank r an ill-posed problem, cf. [30]. Therefore, we cannot guarantee that the truncation error will be small enough to yield convergence of the iterative method. To overcome this drawback, we use the hierarchical Tucker format to represent and compute the solution of a linear system.

The general idea of the hierarchical Tucker format, which was first introduced in [15] and further analyzed in [9], is to define a hierarchy among the modes $\mathcal{D}=\left\{1,\dots ,d\right\}$. To do so, we define the so-called dimension tree analogously to [9, definition 3.1].

Definition 3.2 (Dimension tree). A dimension tree $\mathcal{T}$ for dimension $d\in \mathbb{N}$ is a binary tree with nodes labeled by non-empty subsets of $\mathcal{D}$. Its root is labeled with $\mathcal{D}$, each leaf node is labeled with a single-element subset $z=\left\{\ell \right\}\subseteq \mathcal{D}$, and each inner node is labeled with the disjoint union of its two children. We will identify a node with its label z and therefore write $z\in \mathcal{T}$.

Figure 1 shows an example of a dimension tree for d = 4. The labels of dimension trees lead to the corresponding matricization for each node which we define as in [9, definition 3.3]:

Definition 3.3 (Matricization and vectorization). Let $\boldsymbol{\phi }\in {\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}$, $z\subseteq \mathcal{D}$ with z ≠ ∅, and $g{:=}\mathcal{D}{\backslash}z$. The matricization of ϕ corresponding to z is defined as ${\boldsymbol{\phi }}^{\left(z\right)}\in {\mathbb{R}}^{{n}_{z}{\times}{n}_{g}}$, where ${n}_{z}{:=}{\prod }_{\ell \in z}{n}_{\ell }$ and ${n}_{g}{:=}{\prod }_{\ell \in g}{n}_{\ell }$, with ${\boldsymbol{\phi }}^{\left(z\right)}\left[{\left({i}_{j}\right)}_{j\in z},{\left({i}_{j}\right)}_{j\in g}\right]{:=}\boldsymbol{\phi }\left[{i}_{1},\dots ,{i}_{d}\right]$ for all $i={\left({i}_{j}\right)}_{j\in \mathcal{D}}$. In particular, ${\boldsymbol{\phi }}^{\left(\mathcal{D}\right)}\in {\mathbb{R}}^{{n}_{1}\cdots {n}_{d}}$ holds, which can also be interpreted as the vectorization of ϕ.
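In NumPy, a matricization is a transpose followed by a reshape; the sketch below uses 0-based mode indices (the definition above counts modes from 1), and the ordering of rows and columns is a convention that does not affect the matrix rank.

import numpy as np

def matricization(phi, z):
    # rows are indexed by the modes in z, columns by the remaining modes g
    g = [m for m in range(phi.ndim) if m not in z]
    n_z = int(np.prod([phi.shape[m] for m in z]))
    return np.transpose(phi, list(z) + g).reshape(n_z, -1)

phi = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)
print(matricization(phi, [0, 1]).shape)          # (6, 20)
print(matricization(phi, [2]).shape)             # (4, 30)
print(matricization(phi, [0, 1, 2, 3]).shape)    # (120, 1), essentially the vectorization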

A matricization can be interpreted as an unfolding of the tensor as illustrated in figure 2. Based on the concept of matricizations, the hierarchical Tucker rank is defined according to [9, definition 3.4]:

Definition 3.4 (Hierarchical Tucker rank). Let $\boldsymbol{\phi }\in {\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}$ and $\mathcal{T}$ be a dimension tree. The hierarchical Tucker rank of ϕ is defined as ${\text{rank}}_{\mathcal{T}}\left(\boldsymbol{\phi }\right){:=}{\left({r}_{z}\right)}_{z\in \mathcal{T}}$, where rz := rank( ϕ (z)) denotes the matrix rank of the matricization ϕ (z) for all $z\in \mathcal{T}$.

The set of tensors with hierarchical Tucker rank node-wise bounded by ${\left({r}_{z}\right)}_{z\in \mathcal{T}}$ is defined as $\mathcal{H}-Tucker\left(\mathcal{T},{\left({r}_{z}\right)}_{z\in \mathcal{T}}\right){:=}\left\{\boldsymbol{\gamma }\in {\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}\vert \text{rank}\left({\boldsymbol{\gamma }}^{\left(z\right)}\right){\leqslant}{r}_{z}\quad \text{for}\enspace \text{all}\enspace z\in \mathcal{T}\right\}$.

Using the dimension tree, the concept of matricization, and the hierarchical Tucker rank, one can define the representation of a tensor within the hierarchical Tucker format, cf. [9, definition 3.6]. The memory required for a hierarchical Tucker representation, with dimension tree $\mathcal{T}$ and representation rank ${\left({r}_{z}\right)}_{z\in \mathcal{T}}$, of a tensor $\boldsymbol{\phi }\in {\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}$ for $n={\mathrm{max}}_{\ell \in \mathcal{D}}\enspace {n}_{\ell }$ and $r={\mathrm{max}}_{z\in \mathcal{T}}\enspace {r}_{z}$ is given by $\mathcal{O}\left(rdn+{r}^{3}d\right)$, cf. [9, lemma 3.7]. The existence of a truncation method of a low-rank tensor $\boldsymbol{\phi }\in \mathcal{H}-Tucker\left(\mathcal{T},\enspace {\left({r}_{z}\right)}_{z\in \mathcal{T}}\right)$ down to lower rank ${\left({\tilde {r}}_{z}\right)}_{z\in \mathcal{T}}$ with an arithmetic cost in $\mathcal{O}\left({r}^{2}dn+{r}^{4}d\right)$ was proven in [9]. The resulting approximation $\tilde {\boldsymbol{\phi }}{:=}\text{truncate}\left(\boldsymbol{\phi }\right)\in \mathcal{H}-Tucker\left(\mathcal{T},{\left({\tilde {r}}_{z}\right)}_{z\in \mathcal{T}}\right)$ fulfills the quasi-optimal error estimate ${\Vert}\boldsymbol{\phi }-\tilde {\boldsymbol{\phi }}{\Vert}{\leqslant}\sqrt{2d-3}\enspace \mathrm{min}\left\{{\Vert}\boldsymbol{\phi }-\boldsymbol{\gamma }{\Vert}:\boldsymbol{\gamma }\in \mathcal{H}-Tucker\left(\mathcal{T},{\left({\tilde {r}}_{z}\right)}_{z\in \mathcal{T}}\right)\right\}$, cf. [9].
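To make the hierarchical Tucker rank concrete, the following sketch builds a tensor as a sum of two elementary tensors (so every matricization has rank at most 2) and computes the node-wise ranks for the dimension tree of figure 1; the tensor and its mode sizes are arbitrary placeholders.

import numpy as np

def matricization(phi, z):
    g = [m for m in range(phi.ndim) if m not in z]
    n_z = int(np.prod([phi.shape[m] for m in z]))
    return np.transpose(phi, list(z) + g).reshape(n_z, -1)

rng = np.random.default_rng(3)
shape = (6, 7, 8, 9)
phi = sum(np.einsum('i,j,k,l->ijkl', *[rng.standard_normal(n) for n in shape])
          for _ in range(2))

# dimension tree of figure 1 with 0-based modes: leaves {0}, ..., {3}, inner nodes {0,1}, {2,3}
tree = [[0], [1], [2], [3], [0, 1], [2, 3]]
print([np.linalg.matrix_rank(matricization(phi, z)) for z in tree])   # node-wise ranks <= 2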

Further, we can transfer a CP representation of a tensor vector or tensor operator with CP rank r into a hierarchical Tucker representation with rank node-wise bounded by r, cf. [13, theorem 11.17]. Following this approach, we represent the operator and the right-hand side in the hierarchical Tucker format. For solving parameter-dependent linear problems in the hierarchical Tucker format, we use the preconditioned conjugate gradients (PCG) method. In algorithm 1 the PCG method is briefly introduced similar to [21, algorithm 2].

Algorithm 1. Preconditioned conjugate gradients method with truncation.

Input: CP operator A, CP vector b, CP rank 1 preconditioner M, initial guess ϕ (0) in the hierarchical Tucker format
Output: Approximate solution ϕ in the hierarchical Tucker format of Aϕ = b
1: ${\boldsymbol{\rho }}_{\left(0\right)}=\text{truncate}\left(\mathbf{b}-\mathbf{A}{\boldsymbol{\phi }}_{\left(0\right)}\right)$
2: ${\boldsymbol{\zeta }}_{\left(0\right)}={\mathbf{M}}^{-1}{\boldsymbol{\rho }}_{\left(0\right)}$
3: ${\boldsymbol{\pi }}_{\left(0\right)}={\boldsymbol{\zeta }}_{\left(0\right)}$
4: ${\boldsymbol{\theta }}_{\left(0\right)}=\text{truncate}\left(\mathbf{A}{\boldsymbol{\pi }}_{\left(0\right)}\right)$
5: k = 0
6: while $\frac{{\Vert}{\boldsymbol{\rho }}_{\left(k\right)}{\Vert}}{{\Vert}\mathbf{b}{\Vert}}{ >}\varepsilon $ and k < kmax do
7:  ${\boldsymbol{\phi }}_{\left(k+1\right)}=\text{truncate}\left({\boldsymbol{\phi }}_{\left(k\right)}+\frac{\left\langle {\boldsymbol{\rho }}_{\left(k\right)},{\boldsymbol{\pi }}_{\left(k\right)}\right\rangle }{\left\langle {\boldsymbol{\theta }}_{\left(k\right)},{\boldsymbol{\pi }}_{\left(k\right)}\right\rangle }{\boldsymbol{\pi }}_{\left(k\right)}\right)$
8:  ${\boldsymbol{\rho }}_{\left(k+1\right)}=\text{truncate}\left(\mathbf{b}-\mathbf{A}{\boldsymbol{\phi }}_{\left(k+1\right)}\right)$
9:   ${\boldsymbol{\zeta }}_{\left(k+1\right)}={\mathbf{M}}^{-1}{\boldsymbol{\rho }}_{\left(k+1\right)}$
10:  ${\boldsymbol{\pi }}_{\left(k+1\right)}=\text{truncate}\left({\boldsymbol{\zeta }}_{\left(k+1\right)}-\frac{\left\langle {\boldsymbol{\theta }}_{\left(k\right)},{\boldsymbol{\zeta }}_{\left(k+1\right)}\right\rangle }{\left\langle {\boldsymbol{\theta }}_{\left(k\right)},{\boldsymbol{\pi }}_{\left(k\right)}\right\rangle }{\boldsymbol{\pi }}_{\left(k\right)}\right)$
11:  ${\boldsymbol{\theta }}_{\left(k+1\right)}=\text{truncate}\left(\mathbf{A}{\boldsymbol{\pi }}_{\left(k+1\right)}\right)$
12:  k = k + 1
13: end while

The PCG method in algorithm 1 approximates the solution of a parameter-dependent linear system numerically within the hierarchical Tucker format if the tensor operator A is positive definite and symmetric. In [12, lemma 5] the authors proved that this algorithm converges if the truncation error ɛ is small enough. Algorithm 1 comprises additions and inner products of two tensors in hierarchical Tucker format which have an arithmetic cost in $\mathcal{O}\left(dn{r}^{2}+d{r}^{4}\right)$, application of an operator which has an arithmetic cost in $\mathcal{O}\left(d{n}^{2}r\right)$, and evaluation of an entry of the represented tensor which has an arithmetic cost in $\mathcal{O}\left(d{r}^{3}\right)$. Hence, for small rank r most of the operations needed for the PCG method scale linearly in the dimension d and the mode size n, thus yielding an efficient method to solve parameter-dependent linear systems using low-rank tensor formats.
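The following sketch mirrors algorithm 1 in plain NumPy for the two-dimensional case, with matrices playing the role of low-rank tensors and an SVD-based rank truncation standing in for the hierarchical Tucker truncation of [9]; the operator and right-hand side are hypothetical placeholders.

import numpy as np

def truncate(X, tol=1e-10):
    # SVD-based rank truncation of a matrix (a 2-mode tensor)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = max(1, int(np.sum(s > tol * s[0])))
    return (U[:, :r] * s[:r]) @ Vt[:r]

def pcg_truncated(apply_A, apply_Minv, B, eps=1e-8, kmax=50):
    # algorithm 1 with the Frobenius inner product np.sum(X * Y) as <., .>
    phi = np.zeros_like(B)
    rho = truncate(B - apply_A(phi))
    zeta = apply_Minv(rho)
    pi = zeta
    theta = truncate(apply_A(pi))
    k = 0
    while np.linalg.norm(rho) / np.linalg.norm(B) > eps and k < kmax:
        phi = truncate(phi + np.sum(rho * pi) / np.sum(theta * pi) * pi)
        rho = truncate(B - apply_A(phi))
        zeta = apply_Minv(rho)
        pi = truncate(zeta - np.sum(theta * zeta) / np.sum(theta * pi) * pi)
        theta = truncate(apply_A(pi))
        k += 1
    return phi, k

# example: K phi + phi K^T = B, i.e. (Id (x) K + K (x) Id) vec(phi) = vec(B),
# with a symmetric positive definite K and a rank-1 right-hand side
n = 50
K = np.eye(n) + 0.25 * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
B = np.outer(np.linspace(0.0, 1.0, n), np.ones(n))
phi, k = pcg_truncated(lambda X: K @ X + X @ K.T,
                       lambda X: X,        # identity preconditioner (Kronecker rank 1)
                       B)
print(k, np.linalg.norm(K @ phi + phi @ K.T - B) / np.linalg.norm(B))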

This means that, if we are able to prove the existence of a low-rank representation of the operator and right-hand side of the forward EMG problem, we can compute the solution of the linear system data-sparse and fast within the hierarchical Tucker format.

Finding conditions that guarantee the existence of a low-rank approximation for a given tensor is a research topic of its own [1, 3, 22]. This goes beyond the scope of this article, and we thus assume that the solution of the parameter-dependent EMG forward problem has a low-rank approximation. This is backed up by the numerical experiments in section 7.

Figure 2. Visual representation of a matricization.

4. The Bayesian inverse EMG problem

We present our first main contribution: the proof of the well-posedness of the Bayesian inverse EMG problem discussed in section 2. Note that the proof of the well-posedness is valid for any bounded conductivity σi that can be represented through parameters $p\in \mathcal{J}$ for any parameter space $\mathcal{J}$. For diagonal conductivities these parameters are the diagonal entries of σi and $\mathcal{J}={\left[{s}_{-},{s}_{+}\right]}^{3}$. In the more general case of space-dependent intracellular conductivities, the parameters can be chosen as the coefficients of a Karhunen–Loève expansion of σi(x, ω), see, e.g., [4, 18]. The following proof thus holds for both space-independent and space-dependent conductivities.

First we prove the existence of the posterior distribution ρEMG of parameters p given measurements ${\phi }_{\text{EMG}}^{\text{meas}}$ for a prior ρ using the infinite-dimensional version of Bayes' theorem for inverse problems [4, theorem 3.4].

Theorem 4.1 (Bayes' theorem for our inverse EMG problem). Let ${\mathbb{Q}}_{0}$ and ${\mathbb{Q}}_{p}$ denote the measures with distribution $\mathcal{N}\left(0,{\Xi}\right)$ and $\mathcal{N}\left({\mathcal{G}}_{\mathbf{x}}\left(p\right),{\Xi}\right)$. Then,

  • B.1  
    The scaling factor $Z{:=}{\int }_{\mathcal{J}}\mathrm{exp}\left(-{\Phi}\left(p;{\phi }_{\text{EMG}}^{\text{meas}}\right)\right)\rho \left(\mathrm{d}p\right)$ is positive ${\mathbb{Q}}_{0}$-almost surely,
  • B.2  
    The potential ${\Phi}:\mathcal{J}{\times}{\mathbb{R}}^{M}\to \mathbb{R}$, as defined in (7), is ν0-measurable with product measure ${\nu }_{0}\left(\mathrm{d}p,\mathrm{d}\phi \right){:=}\rho \left(\mathrm{d}p\right){\mathbb{Q}}_{0}\left(\mathrm{d}\phi \right)$,
  • B.3  
    For ${\phi }_{\text{EMG}}^{\text{meas}}$ the conditional distribution ρEMG exists, ρEMG is absolutely continuous with respect to ρ, and
    $\frac{\mathrm{d}{\rho }^{\text{EMG}}}{\mathrm{d}\rho }\left(p\right)=\frac{1}{Z}\enspace \mathrm{exp}\left(-{\Phi}\left(p;{\phi }_{\text{EMG}}^{\text{meas}}\right)\right)$
    ν-almost surely with the product measure $\nu \left(\mathrm{d}p,\mathrm{d}\phi \right){:=}\rho \left(\mathrm{d}p\right){\mathbb{Q}}_{p}\left(\mathrm{d}\phi \right)$.

To prove the above theorem, we need the boundedness and Lipschitz continuity of the observation operator as stated in the following lemma:

Lemma 4.2. The observation operator is bounded and Lipschitz continuous with respect to p, i.e., there exist constants $0 < C, L_p < \infty$ such that

Equation (10): ${\Vert}{\mathcal{G}}_{\mathbf{x}}\left(p\right){{\Vert}}_{{\mathbb{R}}^{M}}{\leqslant}C$

Equation (11): ${\Vert}{\mathcal{G}}_{\mathbf{x}}\left({p}_{1}\right)-{\mathcal{G}}_{\mathbf{x}}\left({p}_{2}\right){{\Vert}}_{{\mathbb{R}}^{M}}{\leqslant}{L}_{p}{\Vert}{p}_{1}-{p}_{2}{\Vert}$

for all $p,{p}_{1},{p}_{2}\in \mathcal{J}$.

The proof consists of basic calculations and estimations on the weak form of the deterministic EMG forward problem and is thus left to the reader.

Proof of theorem 4.1. The proof is based on the proof of the measurability of the potential Φ. Since B.1 and B.2 are the assumptions required for the Bayes theorem in [4, theorem 3.4] to hold, B.3 follows directly once B.1 and B.2 are proven. As the ν0-measurability of Φ, meaning that Φ is ρ-measurable in p and ${\mathbb{Q}}_{0}$-measurable in ${\phi }_{\text{EMG}}^{\text{meas}}$, follows from the Lipschitz continuity of the corresponding mappings, we show that

  • (a)  
    Φ is Lipschitz continuous with respect to p and
  • (b)  
    Φ is Lipschitz continuous with respect to ${\phi }_{\text{EMG}}^{\text{meas}}$.

Note that we also need the Lipschitz continuity of Φ to prove that the posterior depends continuously on the measurement data in theorem 4.4. For ease of notations, we introduce the shorthand ${\langle u,v\rangle }_{{\Xi}}{:=}\langle {{\Xi}}^{-\frac{1}{2}}u,{{\Xi}}^{-\frac{1}{2}}v\rangle $ for $u,v\in {\mathbb{R}}^{M}$ and neglect the second argument of the potential Φ.

  • (a)  
    Let ${p}_{1},{p}_{2}\in \mathcal{J}$ with p1p2, and (TI) and (HI) denote the triangle and Hölder's inequality. Using lemma 4.2, we have
  • (b)  
    For ${\phi }_{1},{\phi }_{2}\in {\mathbb{R}}^{M}$ with ϕ1ϕ2 we express the norms in the definition of Φ as scalar products obtaining
    Equation (12)

This concludes the proof. □

The well-posedness of the Bayesian inverse EMG problem also includes the continuity of the posterior ρEMG with respect to the data ${\phi }_{\text{EMG}}^{\text{meas}}$. Therefore, we need to define a metric on the space of measures. Similar to [4, 18] we choose the Hellinger metric.

Definition 4.3 (Hellinger metric). Let μ1 and μ2 denote two probability measures that are absolutely continuous with respect to a probability measure ζ. The Hellinger metric of μ1 and μ2 is then defined as ${d}_{\text{Hell}}\left({\mu }_{1},{\mu }_{2}\right){:=}{\left(\frac{1}{2}\int {\left(\sqrt{\frac{\mathrm{d}{\mu }_{1}}{\mathrm{d}\zeta }}-\sqrt{\frac{\mathrm{d}{\mu }_{2}}{\mathrm{d}\zeta }}\right)}^{2}\mathrm{d}\zeta \right)}^{1/2}$.

With the help of the Hellinger metric we now prove the Lipschitz continuity of the posterior ρEMG with respect to measured EMG data.

Theorem 4.4. Let ρEMG denote the solution of our Bayesian inverse EMG problem given by theorem 4.1. Then ρEMG depends Lipschitz continuously on the measured data ${\phi }_{\text{EMG}}^{\text{meas}}$ with respect to the Hellinger metric. This means there exists a positive constant L > 0 such that

Equation (13): ${d}_{\text{Hell}}\left({\rho }_{1}^{\text{EMG}},{\rho }_{2}^{\text{EMG}}\right){\leqslant}L{\Vert}{\phi }_{1}-{\phi }_{2}{{\Vert}}_{{\mathbb{R}}^{M}}$

holds for all ${\phi }_{1},{\phi }_{2}\in {\mathbb{R}}^{M}$ and the posterior distributions ${\rho }_{1}^{\text{EMG}}$ and ${\rho }_{2}^{\text{EMG}}$ of σ given ϕ1 and ϕ2.

To prove the above theorem, we need the following lemma:

Lemma 4.5. The scaling factor $Z\left(\phi \right)={\int }_{\mathcal{J}}\mathrm{exp}\left(-{\Phi}\left(p,\phi \right)\right)\enspace \mathrm{d}\rho \left(p\right)$ is Lipschitz continuous in ϕ, i.e., there exists a constant LZ > 0 such that

Equation (14): $\vert Z\left({\phi }_{1}\right)-Z\left({\phi }_{2}\right)\vert {\leqslant}{L}_{\text{Z}}{\Vert}{\phi }_{1}-{\phi }_{2}{{\Vert}}_{{\mathbb{R}}^{M}}$

holds for all ${\phi }_{1},{\phi }_{2}\in {\mathbb{R}}^{M}$ with ϕ1ϕ2.

The statement follows from lemma 4.2 with basic calculations and estimations and is thus left to the reader.

Proof of theorem 4.4. Let ${\rho }_{1}^{\text{EMG}},{\rho }_{2}^{\text{EMG}}$ denote the solutions of the Bayesian inverse EMG problem for given measurements ϕ1ϕ2. For simplicity, we write Φj := Φ(p, ϕj ) and Zj := Z(ϕj ), j = 1, 2. We estimate the Hellinger distance between the two posterior distributions using Young's inequality (YI), the Lipschitz continuity of the exponential function and the inverse of the square root on bounded domains with constants Le and Lsqrt and lemma 4.5:

As Z1 > 0 holds, it follows that $\frac{1}{{Z}_{1}}{< }\infty $. It thus remains to prove that ${Z}_{2}{< }\infty $, which is a consequence of $\mathcal{G}$ being bounded and $\rho \left(\mathcal{J}\right)=1$. The assertion follows with Lipschitz constant ${L}_{\rho }^{2}{:=}{L}_{\text{e}}^{2}{L}_{\phi }^{2}\frac{1}{{Z}_{1}}+{L}_{\text{sqrt}}^{2}{L}_{\text{Z}}^{2}{Z}_{2}$. □

Remark 4.6. The estimate in (13) also describes the behavior of the posterior with respect to the discretization of the underlying equations.

Recapitulating theorems 4.1 and 4.4 shows that modeling the measurement error as a stochastic quantity leads to a regularization of our inverse EMG problem, see also [4].

5. Discretization and tensorization

As described in section 2, we compute the posterior distribution ρEMG using a Metropolis–Hastings algorithm. We obtain an approximation of the posterior by drawing a finite number of samples. Additionally, we discretize the forward operator ${\mathcal{G}}_{\mathbf{x}}$ as follows. In accordance with section 4, we show a discretization for the more general case of space-dependent intracellular conductivities and mention that this discretization simplifies slightly for the space-independent case.

With $x={\left({x}_{1},{x}_{2},{x}_{3}\right)}^{\top }\in {D}_{\text{M}}$ the left-hand side of equation (4) reads

Equation (15)

and the right-hand side is given by

Equation (16)

Since our forward solver uses a finite difference discretization, we consider the same discretization using centered differences of second order, and therefore assume that ϕeC4(DM) and σiC1(DM). This is reasonable under our assumptions. Our theoretical and numerical results directly generalize to, e.g., finite element discretizations of arbitrary but given muscle geometries. The practical realization is future work.

In the following we use h = (hM, ht, hσ ) to indicate the discretization of the muscle geometry by hM, the time by ht and the parameter space by hσ . We denote the grid points by (${x}_{{j}_{1}}$, ${x}_{{j}_{2}}$, ${x}_{{j}_{3}}$), jk = 0, ..., n, for $n\in \mathbb{N}$ and a discrete conductivity at grid point (${x}_{{j}_{1}}$, ${x}_{{j}_{2}}$, ${x}_{{j}_{3}}$) by ${\sigma }_{{j}_{1},{j}_{2},{j}_{3}}$.

Theorem 5.1. For

Equation (17)

a second-order consistent stencil is given by

Proof. Because of the Kronecker product structure of (17) the statement follows from the one-dimensional case. There, Taylor's theorem and equating the coefficients of

yields ${\tilde {\sigma }}_{j}=\frac{{\sigma }_{j-1}+{\sigma }_{j}}{2}$ for a second-order consistent stencil given by

immediately finishing the proof. □
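A minimal NumPy sketch of this one-dimensional stencil, with a manufactured conductivity and potential chosen purely for testing, confirms the second-order consistency numerically:

import numpy as np

def assemble_1d(sigma, h):
    # interior stencil ( s~_j phi_{j-1} - (s~_j + s~_{j+1}) phi_j + s~_{j+1} phi_{j+1} ) / h^2
    # with s~_j = (sigma_{j-1} + sigma_j) / 2, for the operator d/ds( sigma(s) d phi/ds )
    n = len(sigma) - 1
    st = 0.5 * (sigma[:-1] + sigma[1:])              # s~_1, ..., s~_n
    A = np.zeros((n - 1, n - 1))
    for j in range(1, n):                            # interior grid points x_1, ..., x_{n-1}
        if j > 1:
            A[j - 1, j - 2] = st[j - 1]
        A[j - 1, j - 1] = -(st[j - 1] + st[j])
        if j < n - 1:
            A[j - 1, j] = st[j]
    return A / h**2

def max_error(n):
    # manufactured test: sigma(s) = 1 + s^2 and phi(s) = sin(pi s) on [0, 1];
    # phi vanishes at the boundary, so no boundary correction is needed
    h, x = 1.0 / n, np.linspace(0.0, 1.0, n + 1)
    sigma, phi = 1.0 + x**2, np.sin(np.pi * x)
    exact = 2 * x * np.pi * np.cos(np.pi * x) - (1 + x**2) * np.pi**2 * np.sin(np.pi * x)
    return np.max(np.abs(assemble_1d(sigma, h) @ phi[1:-1] - exact[1:-1]))

for n in (32, 64, 128):
    print(n, max_error(n))     # the error drops by roughly a factor of 4 per refinement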

Next, we derive an affine representation of the discrete operator and prove a low-rank tensor format representation of the operator and the right-hand side of the forward EMG problem. This is our second main contribution.

Corollary 5.2. An affine representation of the discrete operator in the three-dimensional case is given by

where in the first plane the stencil is given by

in the second plane by

and in the third plane by

Proof. Follows from theorem 5.1 with linearity. □

We define ${A}_{h}^{\left(0\right)}{:=}{A}_{h,{\sigma }_{\text{e}}}$ denoting the discrete operator given by theorem 5.1 for constant ${\sigma }_{\text{e}}\in {\mathbb{R}}^{3{\times}3}$ and ${A}_{h,{j}_{1},{j}_{2},{j}_{3}}$ denoting the discrete operator given by the stencil ${M}_{{j}_{1},{j}_{2},{j}_{3}}$ from corollary 5.2. Then the discrete operator of (15) is given by

Using the vectorizations $\text{vec}\left({A}_{h,{j}_{1},{j}_{2},{j}_{3}}\right){=:}{A}_{h}^{\left(k\right)}$ and $\text{vec}\left({\sigma }_{{j}_{1},{j}_{2},{j}_{3}}\right){=:}{p}^{\left(k\right)}$, see definition 3.3, yields a parameter-dependent affine structure of the form

${A}_{h}\left(p\right)={A}_{h}^{\left(0\right)}+{\sum }_{k=1}^{d}{p}^{\left(k\right)}{A}_{h}^{\left(k\right)}$

with p := (p^(1), ..., p^(d)), where each ${A}_{h}^{\left(k\right)}$ is constant, i.e., ${A}_{h}^{\left(k\right)}$ is parameter-independent.

We now take a closer look at the right-hand side and discretize the time variable t in (1) using equidistant time steps tj = jht, j = 0, ..., tmax for time step size ht. Multiplying this with the AP velocities uk , k = 1, ..., NMF, we achieve sj = uk tj for the discretization of the muscle fiber coordinate s.

Furthermore, we remark that the linear dependency of the right-hand side on the intracellular conductivity is obvious under our assumptions, which include that the muscle fiber direction is one of the standard unit vectors, i.e., $ \overrightarrow {d}={ \overrightarrow {e}}_{j}$, j = 1, 2, or 3. If Vm is independent of σi, the structure of the right-hand side is the same as the structure of the operator. Then we see the linear structure of (16) that has the form

How to represent an arbitrary right-hand side in a parameter-dependent way is ongoing research.

We now discretize the parameter space by choosing a finite number of parameters ${p}_{h}{:=}\left({p}_{h}^{\left(1\right)},\dots ,{p}_{h}^{\left(\ell \right)},\dots ,{p}_{h}^{\left(d\right)}\right)$ from a discrete set ${\mathcal{J}}_{h}$. We fix discrete values for all ${p}_{h}^{\left(\ell \right)}$, i.e., ${p}_{h}^{\left(\ell \right)}\in \left\{{p}_{h}^{\left(\ell \right)}\left(1\right),{p}_{h}^{\left(\ell \right)}\left(2\right),\dots ,{p}_{h}^{\left(\ell \right)}\left({n}_{\ell }\right)\right\}$, and reformulate our problem as:

Equation (18): find ${\boldsymbol{\phi }}_{h}\left({p}_{h}\right)$ such that ${A}_{h}\left({p}_{h}\right)\,{\boldsymbol{\phi }}_{h}\left({p}_{h}\right)={\mathbf{b}}_{h}\left({p}_{h}\right)$ for all ${p}_{h}\in {\mathcal{J}}_{h}$.

Assuming that each parameter ${p}_{h}^{\left(\ell \right)}$ can take ${n}_{\ell }$ different values, applying classical methods one has to solve ${\prod }_{\ell =1}^{d}{n}_{\ell }\approx {n}^{d}$ linear systems. To overcome the curse of dimensionality in this case, we exploit the structure of the linear system, see section 3. We find a data-sparse representation of the problem that allows us to solve the parameter-dependent system for all ${p}_{h}\in {\mathcal{J}}_{h}$ simultaneously, analogously to [10]. For computing the solution of (18) for all possible ${p}_{h}\in {\mathcal{J}}_{h}$, we define a large block-diagonal system with the operator

where ${A}_{j}^{\left(0\right)}={A}_{h}^{\left(0\right)}+{\sum }_{\ell =1}^{d}{p}_{h}^{\left(\ell \right)}\left(j\right){A}_{h}^{\left(\ell \right)}$ denotes the jth diagonal block.

The memory requirement to store A, however, grows like ${\prod }_{\ell =1}^{d}{n}_{\ell }\approx {n}^{d}$, i.e., exponentially in the dimension d, and thus, even for moderate values of d and ${n}_{\ell }$, a classical representation of our problem is infeasible. Therefore, we reformulate the problem using the notation ${A}_{j}^{\left(m\right)}={\sum }_{\ell =m}^{d}{p}_{h}^{\left(\ell \right)}\left(j\right){A}_{h}^{\left(\ell \right)}$, m = 1, ..., d, and ${\text{Id}}_{{n}_{k}}$ denoting the identity in ${\mathbb{R}}^{{n}_{k}{\times}{n}_{k}}$, and achieve:

This leads to the following data-sparse CP representation of the operator

$\mathbf{A}={\text{Id}}_{{n}_{d}}\otimes \cdots \otimes {\text{Id}}_{{n}_{1}}\otimes {A}_{h}^{\left(0\right)}+{\sum }_{\ell =1}^{d}{\text{Id}}_{{n}_{d}}\otimes \cdots \otimes {\text{Id}}_{{n}_{\ell +1}}\otimes \mathrm{diag}\left({p}_{h}^{\left(\ell \right)}\right)\otimes {\text{Id}}_{{n}_{\ell -1}}\otimes \cdots \otimes {\text{Id}}_{{n}_{1}}\otimes {A}_{h}^{\left(\ell \right)}$

with discrete parameters ${p}_{h}^{\left(\ell \right)}=\left({p}_{h}^{\left(\ell \right)}\left(1\right),\dots ,{p}_{h}^{\left(\ell \right)}\left({n}_{\ell }\right)\right)$. Similar results can be obtained for the right-hand side.

Concluding, we represent the operator and the right-hand side of (18) exactly using low-rank tensor formats. Further, we approximate the solution of (18) using the hierarchical Tucker format in algorithm 1.
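For a small example with d = 2 parameters, the following sketch assembles this Kronecker-structured operator explicitly (with random placeholder matrices for A_h^(0), A_h^(1), A_h^(2)) and checks that each diagonal block equals the affine operator for the corresponding parameter combination; the ordering of the Kronecker factors follows the preconditioner M used in section 7.

import numpy as np

rng = np.random.default_rng(4)
m, n1, n2 = 5, 3, 4                                   # spatial unknowns, values per parameter
A0 = np.eye(m) + 0.1 * rng.standard_normal((m, m))    # placeholder for A_h^(0)
A1 = 0.1 * rng.standard_normal((m, m))                # placeholder for A_h^(1)
A2 = 0.1 * rng.standard_normal((m, m))                # placeholder for A_h^(2)
p1 = rng.uniform(0.5, 2.0, n1)                        # discrete values of p_h^(1)
p2 = rng.uniform(0.5, 2.0, n2)                        # discrete values of p_h^(2)
I1, I2 = np.eye(n1), np.eye(n2)

# CP operator of representation rank d + 1 = 3
A = (np.kron(I2, np.kron(I1, A0))
     + np.kron(I2, np.kron(np.diag(p1), A1))
     + np.kron(np.diag(p2), np.kron(I1, A2)))

# the diagonal block for the parameter combination (j1, j2)
j1, j2 = 2, 1
block = A.reshape(n2, n1, m, n2, n1, m)[j2, j1, :, j2, j1, :]
print(np.allclose(block, A0 + p1[j1] * A1 + p2[j2] * A2))    # True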

6. The tensorized Metropolis–Hastings algorithm

Having proved the theory for our Bayesian inverse EMG problem and a low-rank tensor representation of the operator and right-hand side of the discrete forward EMG problem, we now derive our final main contribution: A fast tensorized Metropolis–Hastings algorithm. Therefore, we combine the precomputation of the forward EMG problem described in section 2 for all parameters simultaneously using the hierarchical Tucker format and algorithm 1 with the Metropolis–Hastings sampling, as shown in algorithm 2.

Algorithm 2. Tensorized Metropolis–Hastings.

Input: Starting point ph,(1) for the Markov chain, sampling radius δ
Output: A Markov chain ph
1: Precompute $\mathcal{G}\left({p}_{h}\right)$ for all ${p}_{h}\in {\mathcal{J}}_{h}$ using tensor formats
2: for j = 1, ..., J − 1 do
3:  Propose ${\tilde {p}}_{h}\sim \mathcal{U}\left(\left[{p}_{h,\left(j\right)}-\delta ,{p}_{h,\left(j\right)}+\delta \right]\cap {\mathcal{J}}_{h}\right)$ independent of ph,(j)
4:  Draw $c\sim \mathcal{U}\left(0,1\right)$
5:  if $c{\leqslant}a\left(\mathcal{G}\left({p}_{h,\left(j\right)}\right),\mathcal{G}\left({\tilde {p}}_{h}\right)\right)$ then
6:   ${p}_{h,\left(j+1\right)}={\tilde {p}}_{h}$
7:  else
8:   ph,(j+1) = ph,(j)
9:  end if
10: end for

To be more precise, we first choose a fixed number of samples $J\in \mathbb{N}$ that have to be drawn during the sampling process. We then precompute the solution of the parameter-dependent forward EMG problem on a discrete set ${\mathcal{J}}_{h}$ in the hierarchical Tucker format using the PCG method from algorithm 1 and store the data-sparse solution. Recall that storing the solution of the parameter-dependent problem for all parameters is only feasible within data-sparse formats like the hierarchical Tucker format.

Doing so enables us to evaluate the precomputed tensor solution with arithmetic cost in $\mathcal{O}\left(nd{r}^{3}\right)$ instead of solving the discretized forward EMG problem in every iteration in line 5 of the algorithm. Note that we draw new samples ${\tilde {p}}_{h}$ uniformly from an interval with radius δ around the last accepted sample, intersected with the discrete set ${\mathcal{J}}_{h}$, to account for the local behavior of the potential Φ and to accelerate convergence.
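The sketch below runs this sampling loop for a single scalar parameter on a discrete grid; the lookup table G_table stands in for the precomputed tensor solution, the exponential forward map is a purely hypothetical placeholder, and the least-squares potential of (7) with Ξ = ξ Id is assumed.

import numpy as np

rng = np.random.default_rng(5)
n, M, xi, J = 200, 8, 1e-3, 20000
grid = np.linspace(0.001, 10.0, n)                               # discrete set J_h
G_table = np.exp(-np.outer(grid, np.arange(1, M + 1)) / 10.0)    # "precomputed" G(p_h)

ref = 120                                                        # reference parameter index
phi_meas = G_table[ref] + np.sqrt(xi) * rng.standard_normal(M)

def potential(i):
    r = phi_meas - G_table[i]
    return 0.5 * np.dot(r, r) / xi

delta = 10                                  # proposal radius, measured in grid cells here
idx = int(rng.integers(n))                  # starting point of the Markov chain
chain = [idx]
for _ in range(J - 1):
    prop = int(rng.integers(max(0, idx - delta), min(n - 1, idx + delta) + 1))
    # acceptance strategy a = min{1, exp(Phi(old) - Phi(proposal))}
    if rng.uniform() <= np.exp(min(0.0, potential(idx) - potential(prop))):
        idx = prop
    chain.append(idx)

visits = np.bincount(chain[200:], minlength=n)    # discard a short burn-in
print(grid[ref], grid[visits.argmax()])           # the most visited value lies near the reference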

We assume that the cost of drawing one sample from the posterior distribution equals the solution time Ts of the discretized forward EMG problem for the standard Metropolis–Hastings algorithm and the evaluation time Te of the precomputed tensor solution for the tensorized Metropolis–Hastings algorithm. Thus, the runtime of the standard Metropolis–Hastings algorithm is JTs, while the runtime of the tensorized algorithm (TA) is the sum of the precomputation time Tp and the evaluation times, i.e., Tp + JTe. We notice that asymptotically the speedup $\frac{J{T}_{\text{s}}}{{T}_{\text{p}}+J{T}_{\text{e}}}$ is limited by $\frac{{T}_{\text{s}}}{{T}_{\text{e}}}$ for $J\to \infty $.
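With hypothetical timings, this behavior can be illustrated directly; the numbers below are made up for illustration and are not the measured values reported in section 7.

def speedup(J, Ts, Tp, Te):
    # runtime of the standard algorithm divided by the runtime of the tensorized one
    return (J * Ts) / (Tp + J * Te)

Ts, Tp, Te = 0.1, 10.0, 0.005      # seconds per solve, precomputation, per evaluation
for J in (100, 10_000, 1_000_000):
    print(J, round(speedup(J, Ts, Tp, Te), 2))
print("asymptotic bound Ts/Te:", Ts / Te)       # the speedup approaches 20.0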

Based on our mathematical theory we expect that the Markov chains constructed by both algorithms behave similarly. This is due to the fact that we exactly represent the operator and the right-hand side of the forward EMG problem for all discrete parameter combinations within the hierarchical Tucker format. Additionally, we compute the tensor solution using algorithm 1 with specified truncation accuracy, resulting in an error-controlled approximation.

7. Numerical experiments

We illustrate our method for the inverse EMG problem with numerical experiments. We conduct all experiments in Matlab using the KerMor framework and the htucker toolbox [20]. Throughout our experiments we use the following default settings.

The geometry that we use is a muscle cuboid of size 4 cm × 2 cm × 1 cm that is equipped with 30 × 30 muscle fibers. The muscle geometry is discretized using the grid size ${h}_{\text{M}}=\frac{1}{3}$ while the muscle fibers are discretized using 30 grid points, and we use 101 time steps. We fix the extracellular conductivity at σe = diag(6.7, 6.7, 6.7). As reference conductivity we choose p^ref = (0.893, 8.930, 0.893), i.e., the muscle fiber direction is the second unit vector and the muscle fibers are aligned parallel to the second coordinate axis. We allow the muscle fiber direction to be one of the three unit vectors. As bounds on the conductivity we define the upper bound $s_{+} = 10$ and the lower bound $s_{-} = 0.001$, which we also set as the discretization step size in the parameter space, i.e., ${h}_{\sigma }={s}_{-}$.

For computing the tensor solution of (18), we use algorithm 1. There we set kmax = 15, ɛ = 1 × 10−4 and we truncate to a relative accuracy of 1 × 10−6. As preconditioner we define $\mathbf{M}{:=}{\text{Id}}_{{n}_{d}}\otimes \cdots \otimes {\text{Id}}_{{n}_{1}}\otimes {A}_{h}^{\left(0\right)}$, since we observed similar convergence behavior and similar runtimes of the algorithm independent of the chosen low-rank tensor preconditioner, see, e.g., [21], in former experiments. We compute the tensor solution on a suitable conductivity grid with grid size hσ and ${A}_{h}^{\left(0\right)}$ using the conductivity at the midpoint of that grid. For handling the time-dependency in the right-hand side, we solve the corresponding linear system for all time steps simultaneously. This leads to a tensor of size 364 × 101 × 4000 × 4001 × 4000.

For sampling from the posterior distribution of intracellular conductivity given EMG measurements, we use algorithm 2. There we set the total number of samples to 500 000 and use Gaussian noise with ξ = 2.0. The algorithm draws a conductivity proposal in a sampling radius δ = 1.5 around the last accepted sample. As default we draw the initial guess from a uniform distribution on an interval with radius δ around the reference solution, and we discard the first 200 samples as burn-in. These choices proved reasonable in our parameter studies. Additionally, we modify the algorithm such that it also samples the muscle fiber direction as one of the unit vectors.

We call algorithm 2 using the Matlab built-in QR decomposition to solve the forward problem for the proposed conductivity in each iteration the SA, and we call algorithm 2 using the precomputed tensor solution the TA.

Figure 3. Relative singular values of the corresponding matricization of the low-rank solution of the forward EMG problem.

7.1. Rank of the hierarchical Tucker format solution

In our first numerical experiment, we examine the hierarchical Tucker rank, see definition 3.4, of the tensor solution of the linear system to support our assumption that the solution is well approximated with low rank. Further, a small rank is important for efficient arithmetic operations as some of these operations in low-rank tensor formats scale in $\mathcal{O}\left({r}^{4}\right)$, see section 3. Therefore, in figure 3 we show a logarithmic-linear plot of the relative singular values for the corresponding matricizations of the solution of the forward problem using our default setting. We observe that the rank of the matricization remains smaller than 6 in the parameter space, i.e., the rank of the matricizations corresponding to {3}, {4}, and {5}. We also see that the rank of the matricization corresponding to {2} is 55 while the rank of the matricization corresponding to {1} is 343. We expect that the matricization corresponding to {1} has full rank since this separates the spatial dimension, i.e., {1}, and the time dimension, i.e., {2}, and since each time step yields its own right-hand side. Using tensor formats, we reduce the theoretical storage cost of the full tensor from 1.88 × 1010 MB(≈18 800 000 GB) to 4.41 MB counting the storage cost for 1 entry as 64 bit.
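The quoted storage figure for the full tensor follows directly from its mode sizes:

# 364 x 101 x 4000 x 4001 x 4000 entries at 64 bit (8 byte) each, with 1 MB = 1e6 byte
entries = 364 * 101 * 4000 * 4001 * 4000
print(entries * 8 / 1e6)      # approximately 1.88e10 MB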

7.2. Comparison of the tensorized algorithm and the standard algorithm

For the validation of our TA, we compare its statistical behavior to the standard Metropolis–Hastings algorithm. We run both algorithms in our default setting for the reference conductivities ${p}_{1}^{\text{ref}}=\left(0.893,8.930,0.893\right)$ and ${p}_{2}^{\text{ref}}=\left(0.893,0.893,8.930\right)$.

We present the acceptance rates $\frac{\#\text{samples}\;\text{acc.}}{\#\text{samples}\;\text{drawn}}$, the mean absolute deviations (MADs) $\frac{1}{\#\text{samples}\;\text{acc.}}{\sum }_{k=1}^{\#\text{samples}\;\text{acc.}}\vert {p}_{\left(k\right)}-\bar{p}\vert $ and variance $\frac{1}{\#\text{samples}\;\text{acc.}-1}{\sum }_{k=1}^{\#\text{samples}\;\text{acc.}}{\left({p}_{\left(k\right)}-\bar{p}\right)}^{2}$ of the accepted diagonal entries of the conductivities in table 1. For the reference values ${p}_{1}^{\text{ref}}$ and ${p}_{2}^{\text{ref}}$ we observe that both methods have similar acceptance rates. We further notice that the SA and TA have a comparable reliability, i.e., comparable MAD and variance.

Table 1. Comparison of the SA and the TA with 500 000 drawn samples, Gaussian noise with ξ = 2.0, and sampling radius δ = 1.5.

                       ${p}_{1}^{\text{ref}}$          ${p}_{2}^{\text{ref}}$
                       SA      TA            SA      TA
Acceptance rate (%)    5.39    5.38          5.79    5.79
MAD(p(1))              0.86    0.86          0.50    0.50
MAD(p(2))              0.44    0.44          0.24    0.24
MAD(p(3))              0.17    0.17          0.55    0.55
Var(p(1))              1.05    1.05          0.38    0.38
Var(p(2))              0.29    0.29          0.09    0.09
Var(p(3))              0.04    0.04          0.44    0.44

We conclude that the sampling process of both algorithms is similar and that our tensor approach is therefore a promising ansatz to accelerate the SA if the discretization error of the forward problem is small. In this case, we furthermore reason that our results indicate that the tensor solution of the forward EMG problem is indeed a good approximation to the solution that we obtain using the Matlab build-in QR decomposition. We highlight that these results are in line with our theoretical findings from section 5.

7.3. Speedup tests

First, we examine the speedup $\frac{\text{runtime}\;\text{SA}}{\text{runtime}\;\text{TA}}$ of our tensor method compared to the standard method for fixed discretization grid size hM and varying number of samples. Therefore, we run both algorithms in the default setting for 125 samples and double the number of samples until we reach 128 000 samples. We present the speedup of the TA compared to the SA in figure 4.

Figure 4. Speedup of the TA compared to the SA for fixed grid size and a varying number of samples.

We observe that the speedup curve grows steadily and flattens as the number of samples increases. This is due to the fact that the influence of the precomputation time of the TA, which is ${\bar{T}}_{\text{p}}\approx 13.71$ s on average, decreases with growing number of samples. As mentioned in section 6, the speedup is bounded by the quotient $\frac{{T}_{\text{s}}}{{T}_{\text{e}}}$. We insert the average time ${\bar{T}}_{\text{s}}\approx 0.14\enspace \text{s}$ needed for one sample using the SA and the average time ${\bar{T}}_{\text{e}}\approx 0.0037\text{s}$ needed for one sample using the TA and obtain an upper bound of 39.60 for the speedup. For 128 000 samples the speedup is 36.77, which corresponds to a runtime of 5.27 h using the SA, compared to 0.14 h(≈8.59 min) using the TA.

Further, we run both algorithms in the default setting for grid sizes ${h}_{\text{M}}=\frac{1}{3},\frac{1}{6},\frac{1}{9},\frac{1}{12}$. Furthermore, to reduce the overall computation time, we reduce the number of samples to 100. We use our findings from section 6 to extrapolate the measured sampling times to 100 000 samples. To be more precise, we first compute the average time for drawing one sample with both algorithms, then scale this number by 100 000 to achieve estimates on Te for the TA and Ts for the SA and add the measured precomputation time Tp for the TA. Note that the precomputation time is independent of the number of samples. Figure 5 shows the speedup resulting from this extrapolation.

Figure 5. Estimated speedup of the TA compared to the SA for varying grid size hM and 100 000 samples.

As expected, we observe that the speedup in figure 5 grows steadily and is unbounded in contrast to the speedup for fixed grid size and increasing number of samples. For ${h}_{\text{M}}=\frac{1}{12}$ we observe a speedup of 650.81 which corresponds to a runtime of 2.87 h for the TA compared to a runtime of 1864.91 h(≈77.70 d) for the SA.

We expect that the TA outperforms the SA for realistic muscle geometries or fine grid sizes. Furthermore, we conclude that using the TA enables us to solve problems that are infeasible to solve using the SA, in reasonable time.

8. Related work

Surface EMG signals have been used to localize the innervation zones of skeletal muscle, see, e.g., [32] and the references therein. Furthermore, researchers are interested in denoising surface EMG signals, i.e., in reducing crosstalk of neighboring muscles or neighboring muscle regions, see [23]. In [24], regularization methods for inverse problems are used to reduce crosstalk in surface EMG signals.

To overcome the ill-posedness of inverse problems, regularization methods like the Tikhonov regularization are a widely used ansatz, see, e.g., [7] and the references therein. The Tikhonov regularization was used in [8, 33] to reconstruct the electrical conductivity of biological tissue from EMG measurements. Moreover, in [33] model order reduction was used to accelerate the computations.

For other Bayesian inverse problems, different approaches to speed up the sampling process have been examined; e.g., in [18] the authors used a method based on polynomial chaos expansions to construct a surrogate of the forward problem. In [29] quasi-Monte Carlo methods and multilevel Monte Carlo methods were used to accelerate the convergence of the sampling algorithm.

Furthermore, low-rank tensor methods were examined in the context of Bayesian inverse problems. In [5] low-rank tensor formats were used to compute a surrogate of the target distribution. There the authors directly approximated the target distribution using a generalization of the cross approximation. In [6] the authors used low-rank tensor formats to approximate the stochastic Galerkin solution of the parameter-dependent forward problem to achieve a discrete representation of the posterior distribution.

9. Conclusion

Applying mathematical theory results in an efficient algorithm to solve the Bayesian inverse EMG problem. Proving the well-posedness of the Bayesian inverse EMG problem guarantees the convergence of this algorithm. Further, proving a data-sparse representation of the forward EMG problem allows for the efficient precomputation of the parameter-dependent forward solution. The presented numerical experiments support this mathematical theory but also indicate that a high number of samples is required to obtain accurate results. The sampling algorithm which uses the data-sparse representation of the forward EMG problem computes this high number of samples in a reasonable time.

The mathematical theory of the Bayesian inverse problem holds for general symmetric positive definite conductivities and thus for arbitrary muscle fiber directions. The low-rank representation of the forward EMG problem, however, holds for fixed muscle fiber directions only. In our numerical experiments the speedup using tensor methods enables solving problems with grid sizes that are infeasible using classical methods. Therefore, future work is the generalization of the low-rank representation of the right-hand side of the forward EMG problem to arbitrary muscle fiber directions. This generalization could enable the computation of realistic problems in medical applications and lead to a non-invasive and radiation-free imaging method.

Acknowledgments

We thank Maren Klever for her critical reading of and suggestions for this article. We thank the anonymous referees for helping to improve the article by their suggestions. This research was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy—EXC-2075—390740016. L. Grasedyck and T. A. Werthmann have been supported by the DFG within the DFG priority program 1886 (SPPPoly) under Grant No. GR-3179/5-1.
