
Bayesian inversion for electromyography using low-rank tensor formats


Published 31 March 2021 © 2021 IOP Publishing Ltd
Citation: Anna Rörich et al 2021 Inverse Problems 37 055003. DOI 10.1088/1361-6420/abd85a


Abstract

The reconstruction of the structure of biological tissue using electromyographic (EMG) data is a non-invasive imaging method with diverse medical applications. Mathematically, this process is an inverse problem. Furthermore, EMG data are highly sensitive to changes in the electrical conductivity that describes the structure of the tissue. Modeling the inevitable measurement error as a stochastic quantity leads to a Bayesian approach. Solving the discretized Bayesian inverse problem means drawing samples from the posterior distribution of parameters, e.g., the conductivity, given measurement data. Using, e.g., a Metropolis–Hastings algorithm for this purpose involves solving the forward problem for different parameter combinations which requires a high computational effort. Low-rank tensor formats can reduce this effort by providing a data-sparse representation of all occurring linear systems of equations simultaneously and allow for their efficient solution. The application of Bayes' theorem proves the well-posedness of the Bayesian inverse problem. The derivation and proof of a low-rank representation of the forward problem allow for the precomputation of all solutions of this problem under certain assumptions, resulting in an efficient and theory-based sampling algorithm. Numerical experiments support the theoretical results, but also indicate that a high number of samples is needed to obtain reliable estimates for the parameters. The Metropolis–Hastings sampling algorithm, using the precomputed forward solution in a tensor format, draws this high number of samples and therefore enables solving problems which are infeasible using classical methods.


1. Introduction

In clinical applications, surface electromyographic (EMG) data are a widely used source of information about the muscular and nervous systems. For example, EMG data are valuable in neurology, movement analysis, rehabilitation medicine, and the development of biofeedback techniques. To this end, different models have been developed to simulate and understand EMG data, see, e.g., [26].

Using EMG measurements, we focus on reconstructing the intracellular conductivity of biological tissue. As the conductivity provides information about the structure of this tissue, this is an important step towards a non-invasive and radiation-free imaging method. Furthermore, reliable estimates of the conductivity from patient-specific EMG measurements can advance personalized treatment.

Computed EMG data are, however, highly sensitive to changes in the conductivity, see, e.g., [19]. In addition, reconstructing the conductivity from (surface) measurements is an inverse problem [16]. Since the measurement error is unknown, we model it as a stochastic quantity and incorporate it into the EMG model. This results in a probabilization of the whole EMG model. Consequently, the solution of the inverse EMG problem also becomes probabilistic.

For solving this probabilistic inverse problem, in section 2, we use a Bayesian ansatz, cf. [4, 31], that searches for the probability distribution of the parameters for given measurements, the so-called posterior distribution. This ansatz has the advantage that the posterior distribution quantifies the uncertainty within instances of the reconstructed parameters.

Discretizing the posterior distribution means drawing a finite number of samples from the posterior, which requires solving the (discrete) forward EMG problem for different parameter samples to check the fidelity of each sample.

As solving the forward EMG problem is expensive using classical methods, we aim at precomputing the solution of the forward problem for all parameters at the same time. This results in a parameter-dependent linear system of equations, i.e., A(p)ϕ(p) = b(p) for an operator A, a solution ϕ, and a right-hand side b depending on parameters p = (p^(1), p^(2), ..., p^(d)). After discretizing the parameters in the sense that we allow each parameter p^(j), j = 1, ..., d, to take n different values from its domain, solving the linear system for every combination of parameters implies solving n^d linear systems. This exponential scaling in the dimension d of the parameter space is commonly known as the curse of dimensionality, which renders classical methods infeasible for d ≫ 2.

To represent these parameter-dependent linear systems, we use low-rank tensor formats, cf. [3, 11], which we recapitulate in section 3. Solving these linear systems within these formats allows us to evaluate the parameter-dependent forward problem fast.

In particular, our main contributions to solve this Bayesian inverse EMG problem and to represent the forward problem in a data-sparse way using low-rank tensor formats are:

  • We prove the well-posedness of our particular Bayesian inverse EMG problem in section 4 and show that modeling the measurement error leads to a natural regularization of the inverse problem.
  • We derive a discretization of the parameter-dependent operator and the right-hand side in section 5 and prove a data-sparse representation of this discretization using low-rank tensor formats. This method allows us to solve the parameter-dependent linear system fast.
  • Combining this data-sparse representation with a standard Metropolis–Hastings algorithm in section 6 allows us to solve the Bayesian inverse EMG problem efficiently.

In section 7, we present our numerical experiments that support our theoretical analysis and indicate that the Markov chain constructed by the Metropolis–Hastings algorithm using low-rank tensor formats behaves like the Markov chain constructed by a standard algorithm (SA). Further, we observe a speedup of more than 600 using low-rank tensor formats compared to the SA.

In section 8, we discuss some related work, and in section 9, we conclude that mathematical theory, combined with an efficient representation of the parameter-dependent solution that allows us to generate samples fast, leads to an efficient algorithm for solving the Bayesian inverse problem.

2. The Bayesian inverse electromyographic problem

In order to define our Bayesian inverse EMG problem, we briefly discuss the structure of skeletal muscles and summarize a forward model of surface EMG signals in the following.

A skeletal muscle is composed of bundles of cells, the so-called muscle fibers. These muscle fibers are the active contractile tissue of a body that react to electrical stimuli.

To model EMG signals, we follow the physical structure of a skeletal muscle beginning with the electrical behavior of a single muscle fiber and then describing the electrical behavior of a skeletal muscle by assembling the muscle fibers.

An electrical stimulus from the spinal cord influences the chemo-electrical behavior of the innervated muscle fibers ${D}_{\text{F},j}\subseteq \mathbb{R}$, j = 1, ..., NMF, for ${N}_{\text{MF}}\in \mathbb{N}$ muscle fibers. These electrical fluctuations travel along the muscle fibers as action potentials (APs), propagate through the muscle, and are measured at $M\in \mathbb{N}$ measuring points summarized in $\mathbf{x}\in {\mathbb{R}}^{M{\times}3}$.

We apply the widely used model by Rosenfalck [28] to model the muscle fiber AP:

Equation (1)

Here, ${r}_{1,j},{r}_{2,j},{r}_{3,j}\in \mathbb{R}$ are known, fixed constants, and the spatial coordinate s can be rewritten as s = uj t using the AP velocities uj and time t.

To assemble a three-dimensional skeletal muscle ${D}_{\text{M}}\subseteq {\mathbb{R}}^{3}$ from the one-dimensional muscle fibers ${D}_{\text{F},j}\subseteq \mathbb{R}$, a transfer operator is needed. Thus, we introduce the smoothing operator $S:{D}_{\text{F},j}\to {\mathbb{R}}^{3}$ with

Equation (2)

where $\beta \in \mathbb{R}$ is a smoothing parameter and πj : DMDF,j is the orthogonal projection of a muscle tissue point xDM onto the muscle fiber DF,j with starting point ${y}_{j}\in {\mathbb{R}}^{3}$ and direction ${ \overrightarrow {d}}_{j}\in {\mathbb{R}}^{3}$. Note that the muscle fiber directions ${ \overrightarrow {d}}_{j}$ in general depend on $x\in {\mathbb{R}}^{3}$ and are known for the forward problem, e.g., through a medical imaging technique. The projection reads

Equation (3)

Applying the smoothing operator to the muscle fibers yields ${\cup }_{j=1}^{{N}_{\text{MF}}}S\left({D}_{\text{F},j}\right)={D}_{\text{M}}$, and we obtain the membrane potential ${V}_{\text{m}}\left(x\right)={\sum }_{j=1}^{{N}_{\text{MF}}}S\left({v}_{\text{m},j}\right)\left(x\right)$.

The bidomain equation, as stated in [26], models the propagation of the membrane potential Vm through a skeletal muscle by

Equation (4): $\nabla \cdot \left(\left({\sigma }_{\text{i}}+{\sigma }_{\text{e}}\right)\nabla {\phi }_{\text{e}}\right)=-\nabla \cdot \left({\sigma }_{\text{i}}\nabla {V}_{\text{m}}\right)\quad \text{in}\enspace {D}_{\text{M}},$

where ϕe denotes the extracellular electrical potential, and σi, σe are the intra- and extracellular electrical conductivities. Additionally, no-flow boundary conditions are introduced at the domain boundary. A zero-mean integral condition is used to ensure uniqueness of the solution.

The above model can easily be extended by the electrophysiology of surrounding connective tissue and bones, and a model of force generation and the corresponding continuum mechanics, see [26] and the references therein. Within our setting, the muscle geometry and the structure of the tissue remain unchanged in time.

Note that we model the conductivities as matrices, e.g., ${\sigma }_{\text{i}}\in {\mathbb{R}}^{3{\times}3}$, where each matrix entry ${\left({\sigma }_{\text{i}}\right)}_{j,k}$ quantifies the conductivity of the tissue in the xj xk -direction for j, k = 1, 2, 3. In particular, the eigenvector of σi that belongs to the largest eigenvalue represents the orientation of the underlying muscle fiber, and the largest eigenvalue corresponds to the longitudinal conductivity of the underlying muscle fiber. Note that the conductivity of a muscle fiber in transversal direction is much smaller. This relation enables us to draw conclusions about the structure of muscular tissue from its intracellular conductivity. To verify our ansatz described in the following sections, we restrict ourselves to diagonal conductivity matrices, i.e., the corresponding eigenvectors are the unit vectors ${ \overrightarrow {e}}_{j}\in {\mathbb{R}}^{3}$ for j = 1, 2, 3. Consequently, the muscle fiber direction is one of these unit vectors.

A reasonable assumption on σi is that it is bounded, i.e., there exist constants $s_{-} > 0$ and $s_{+} < \infty$ such that $s_{-} \leqslant \sigma_{\text{i}} \leqslant s_{+}$ holds componentwise. Physically this corresponds to the tissue neither being fully insulating nor superconducting. Formalizing these considerations leads to the assumption $p{:=}\left({\left({\sigma }_{\text{i}}\right)}_{1,1},{\left({\sigma }_{\text{i}}\right)}_{2,2},{\left({\sigma }_{\text{i}}\right)}_{3,3}\right)\in {\left[{s}_{-},{s}_{+}\right]}^{3}{=:}\mathcal{J}$.

For simplicity, we encapsulate the above models in the definition of the observation operator

Equation (5): ${\mathcal{G}}_{\mathbf{x}}:\mathcal{J}\to {\mathbb{R}}^{M},\enspace p{\mapsto}\phi \left(\mathbf{x}\right),$

which maps the diagonal entries p of a given intracellular conductivity σi to the calculated electrical potential ϕ(x) at measuring points $\mathbf{x}\in {\mathbb{R}}^{M{\times}3}$.

To complete the forward EMG model, we include the inevitable measurement error which is unknown but is usually assumed to be additive and to follow a normal distribution. Hence, the measurement error is modeled as a random variable $\eta :{\Omega}\to {\mathbb{R}}^{M}$ on a complete probability space $\left({\Omega},\mathcal{F},P\right)$ with $\eta \sim \mathcal{N}\left(0,{\Xi}\right)$ and covariance matrix ${\Xi}=\mathrm{diag}\left(\xi ,\dots ,\xi \right)\in {\mathbb{R}}^{M{\times}M}$. Adding the measurement error to (5) yields the model for EMG data

Equation (6)

Solving (6) for p, as in the inverse problem setting, shows that p must be a random variable as well. To emphasize the randomness of p, we write p = p(ω).

A naive inversion of the probabilistic forward problem would be to search for a $p\left(\omega \right)\in \mathcal{J}$ such that ${\phi }_{\text{EMG}}^{\text{comp}}\left(p\left(\omega \right)\right)={\phi }_{\text{EMG}}^{\text{meas}}$ for given measurements ${\phi }_{\text{EMG}}^{\text{meas}}\in {\mathbb{R}}^{M}$. This problem formulation searches for particular realizations of the random variable p, which, however, misrepresents the behavior of the probabilistic inverse EMG problem. Hence, we need a more appropriate problem formulation.

We consider a function space Bayesian formulation which aims at calculating the probability distribution of p for given data ${\phi }_{\text{EMG}}^{\text{meas}}$.

To follow this approach, we assume that the entries of p are uncorrelated and equip $\mathcal{J}$ with the product σ-algebra ${\Theta}{:=}{\bigotimes}_{j=1}^{3}\mathcal{B}\left(\left[{s}_{-},{s}_{+}\right]\right)$, where $\mathcal{B}\left(\left[{s}_{-},{s}_{+}\right]\right)$ is the Borel-σ-algebra on [s, s+]. Subsequently, the product probability measure $\rho {:=}{\bigotimes}_{j=1}^{3}\mathrm{d}{\lambda }_{j}$ is defined on the measurable space $\left(\mathcal{J},{\Theta}\right)$ with dλj denoting the normalized Lebesgue measure on [s, s+], similar to [18, 29]. Note that ρ is the probability law of the random variable p, since the diagonal entries p of the intracellular conductivity σi are uncorrelated. The Lebesgue measure indicates that the entries of p are uniformly distributed on [s, s+]. In the Bayesian context, ρ is called the prior measure or short prior, because it describes the behavior of p prior to having any knowledge about the conductivity, e.g., from measurements.

The Bayesian inverse EMG problem searches for the conditional probability distribution ρEMG of p given EMG measurements ${\phi }_{\text{EMG}}^{\text{meas}}$. We prove the existence of the posterior distribution ρEMG in section 4.

For solving our Bayesian inverse EMG problem, we use a Metropolis–Hastings algorithm, see, e.g., [27]. A Metropolis–Hastings algorithm is an acceptance-rejection algorithm that draws samples from the posterior distribution by solving the EMG forward problem for different realizations of p and comparing the results. If the proposal is accepted by an acceptance strategy a, it becomes part of a Markov chain. Otherwise, the old sample will be kept and a new proposal will be drawn.

In [4], the acceptance strategy $a\left(p,\tilde {p}\right){:=}\mathrm{min}\left\{1,\mathrm{exp}\left({\Phi}\left(p\right)-{\Phi}\left(\tilde {p}\right)\right)\right\}$ with the potential ${\Phi}:\mathcal{J}{\times}{\mathbb{R}}^{M}\to \mathbb{R}$ defined by

Equation (7): ${\Phi}\left(p;\phi \right){:=}\frac{1}{2}{\Vert}\phi -{\mathcal{G}}_{\mathbf{x}}\left(p\right){{\Vert}}_{{\Xi}}^{2}$

and Ξ-norm ${\Vert}\mathbf{v}{{\Vert}}_{{\Xi}}{:=}{\Vert}{{\Xi}}^{-\frac{1}{2}}\mathbf{v}{{\Vert}}_{{\mathbb{R}}^{M}}$ for all $\mathbf{v}\in {\mathbb{R}}^{M}$ was derived such that, for proposals that are reversible with respect to the prior ρ, the resulting Markov chain is reversible with respect to the posterior. This yields the convergence of the Metropolis–Hastings algorithm.

We rewrite the acceptance strategy as

$a\left(p,\tilde{p}\right)=\begin{cases}1, & {\Phi}\left(\tilde{p}\right)\leqslant {\Phi}\left(p\right),\\ \mathrm{exp}\left({\Phi}\left(p\right)-{\Phi}\left(\tilde{p}\right)\right), & \text{otherwise.}\end{cases}$

Consequently, a new proposal will always be accepted if it produces a smaller error than the last accepted sample, and will otherwise be rejected with probability 1 − a, i.e., the old sample will be kept with probability 1 − a.
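For illustration, the following minimal NumPy sketch evaluates this acceptance probability; it assumes the least-squares potential of (7) with Ξ = ξ Id as in the noise model above, and the linear forward map B is a hypothetical placeholder, not part of the Matlab implementation used in section 7.

import numpy as np

def potential(p, phi_meas, G, xi):
    # Phi(p; phi_meas) = 1/2 * || Xi^{-1/2} (phi_meas - G(p)) ||^2 with Xi = xi * Id
    r = phi_meas - G(p)
    return 0.5 * np.dot(r, r) / xi

def acceptance(p_old, p_prop, phi_meas, G, xi):
    # a(p, p~) = min{1, exp(Phi(p) - Phi(p~))}
    return min(1.0, np.exp(potential(p_old, phi_meas, G, xi)
                           - potential(p_prop, phi_meas, G, xi)))

# usage with a hypothetical linear forward map G(p) = B p
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 3))
G = lambda p: B @ p
p_ref = np.array([0.9, 8.9, 0.9])
phi_meas = G(p_ref) + np.sqrt(2.0) * rng.standard_normal(8)
print(acceptance(p_ref, p_ref + 0.5, phi_meas, G, xi=2.0))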

3. Low-rank tensor formats

Evaluating the acceptance strategy in every step of the Metropolis–Hastings algorithm requires the evaluation of the observation operator ${\mathcal{G}}_{\mathbf{x}}$, i.e., the solution of the forward EMG problem, for a new set of diagonal entries p of the intracellular conductivity. Consequently, we need a way to compute these solutions fast. We use low-rank tensor formats to accelerate these computations and motivate these formats using an example, analogously to [10].

We consider the scaling of a discrete operator $A_h$ by a parameter $p_h(j)$, $j = 1, \dots, n$, with $n\in \mathbb{N}$, i.e., $p_h(j) A_h$. We assume that the right-hand side $b_h$ is constant for all $p_h(j)$. Using classical methods, we would need to solve the following linear systems:

$p_h(j)\, A_h\, \phi_h(j) = b_h, \quad j = 1, \dots, n.$

Using the Kronecker product to reformulate this system as

$\left(\operatorname{diag}\left(p_h(1), \dots, p_h(n)\right) \otimes A_h\right) \boldsymbol{\phi} = \mathbb{1}_n \otimes b_h, \quad \mathbb{1}_n := (1, \dots, 1)^{\top} \in \mathbb{R}^{n},$

we achieve a data-sparse representation. We use a generalization of this representation to derive a data-sparse representation of the parameter-dependent forward EMG problem which can be interpreted as the CANDECOMP/PARAFAC, or CP for short, representation introduced in [2, 17].
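The following small NumPy sketch checks this reformulation for a hypothetical operator A_h and right-hand side b_h: the single Kronecker-structured solve reproduces the n separate solves.

import numpy as np

n, m = 4, 6                                           # parameter values and spatial unknowns
rng = np.random.default_rng(0)
A = np.eye(m) + 0.1 * rng.standard_normal((m, m))     # hypothetical discrete operator A_h
b = rng.standard_normal(m)                            # constant right-hand side b_h
p = rng.uniform(0.5, 2.0, size=n)                     # parameter values p_h(1), ..., p_h(n)

# classical approach: one linear solve per parameter value
phi_classical = np.stack([np.linalg.solve(p[j] * A, b) for j in range(n)])

# Kronecker reformulation: (diag(p) (x) A) phi = (1, ..., 1)^T (x) b
phi_kron = np.linalg.solve(np.kron(np.diag(p), A),
                           np.kron(np.ones(n), b)).reshape(n, m)

print(np.allclose(phi_classical, phi_kron))           # True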

Definition 3.1 (CP vector and CP operator). A CP representation of a tensor $\mathbf{b}\in {\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}$, with representation rank $r\in {\mathbb{N}}_{0}$, is defined as

Equation (8): $\mathbf{b}={\sum }_{k=1}^{r}{b}_{k}^{\left(1\right)}\otimes \cdots \otimes {b}_{k}^{\left(d\right)}$ with ${b}_{k}^{\left(\ell \right)}\in {\mathbb{R}}^{{n}_{\ell }}$.

We call each $\ell \in \mathcal{D}{:=}\left\{1,\dots ,d\right\}$ mode and d the dimension. The minimal r, such that (8) holds, is called the CP rank of b and in this case (8) is called the CP decomposition of b. We call a tensor of the form (8) a CP vector.

A CP representation of a tensor operator A from ${\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}$ to ${\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}$, with representation rank r and dimension d, is defined as

Equation (9): $\mathbf{A}={\sum }_{k=1}^{r}{A}_{k}^{\left(1\right)}\otimes \cdots \otimes {A}_{k}^{\left(d\right)}$ with ${A}_{k}^{\left(\ell \right)}\in {\mathbb{R}}^{{n}_{\ell }{\times}{n}_{\ell }}$.

We call a tensor of the form (9) a CP operator.

Note that the ${b}_{k}^{\left(\ell \right)}$ in (8) are vectors and that the ${A}_{k}^{\left(\ell \right)}$ in (9) are matrices. Therefore, a CP vector b is a sum of rank r Kronecker products of d vectors, and a CP operator A is a summation over Kronecker products of matrices. Thus, using definition 3.1, there exist CP vectors and CP operators of any dimension and rank.

A big advantage of the CP format is the data-sparsity in case of a small representation rank r, since a tensor $\mathbf{b}\in {\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}$ of the form (8) has storage cost in $\mathcal{O}\left(r{\sum }_{\ell =1}^{d}{n}_{\ell }\right)\approx \mathcal{O}\left(rdn\right)$ compared to $\mathcal{O}\left({\prod }_{\ell =1}^{d}{n}_{\ell }\right)\approx \mathcal{O}\left({n}^{d}\right)$ with $n={\mathrm{max}}_{\ell \in \mathcal{D}}\enspace {n}_{\ell }$.
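As a back-of-the-envelope illustration of this data-sparsity, the following sketch (with random, purely illustrative factor vectors) counts the stored entries of a CP representation and evaluates a single entry of the represented tensor.

import numpy as np

d, n, r = 5, 100, 10
rng = np.random.default_rng(1)
factors = [rng.standard_normal((r, n)) for _ in range(d)]    # the vectors b_k^(l)

def cp_entry(factors, idx):
    # one entry of the represented tensor: sum_k prod_l b_k^(l)[i_l]
    return np.sum(np.prod([f[:, i] for f, i in zip(factors, idx)], axis=0))

print("CP storage:", r * d * n, "entries, full tensor:", n ** d, "entries")
print(cp_entry(factors, (3, 1, 4, 1, 5)))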

Therefore it is desirable to represent the operator and the right-hand side of the forward EMG problem in a data-sparse way using low-rank tensor formats. To compute the solution of the discrete forward EMG problem, we need to solve linear systems within low-rank tensor formats. No algorithm is known that computes the inverse of an operator with representation rank r > 1 in a direct way.

Consider, e.g., a CP operator A of dimension 1 and rank 2, i.e., A = A1 + A2, with ${A}_{1},{A}_{2}\in {\mathbb{R}}^{n{\times}n}$. Then, finding a direct inverse of A in the CP format means finding matrices Cj and Dj such that ${\mathbf{A}}^{-1}={\left({A}_{1}+{A}_{2}\right)}^{-1}={\sum }_{j=1}^{J}{C}_{j}^{-1}+{D}_{j}^{-1}$ should hold for some rank $J\in \mathbb{N}$. Since such a property is unknown even for matrix summations [25], it is also unknown in the more general tensor case.

Figure 1. Dimension tree for dimension d = 4.

We therefore need iterative solvers and thus arithmetic operations within low-rank tensor formats. These arithmetic operations often lead to an increase of the representation rank.

Consider, e.g., a CP operator of dimension 2 and rank 3, i.e., $\mathbf{A}={\sum }_{i=1}^{3}{A}_{i}^{\left(1\right)}\bigotimes{A}_{i}^{\left(2\right)}$ and a CP vector of dimension 2 and rank 2, i.e., $\mathbf{x}={\sum }_{j=1}^{2}{x}_{j}^{\left(1\right)}\bigotimes{x}_{j}^{\left(2\right)}$. Then, the application of A to x yields $\mathbf{A}\mathbf{x}=\left({\sum }_{i=1}^{3}{A}_{i}^{\left(1\right)}\bigotimes{A}_{i}^{\left(2\right)}\right)\left({\sum }_{j=1}^{2}{x}_{j}^{\left(1\right)}\bigotimes{x}_{j}^{\left(2\right)}\right)={\sum }_{i=1}^{3}{\sum }_{j=1}^{2}{A}_{i}^{\left(1\right)}{x}_{j}^{\left(1\right)}\bigotimes{A}_{i}^{\left(2\right)}{x}_{j}^{\left(2\right)}={\sum }_{k=1}^{6}{y}_{k}^{\left(1\right)}\bigotimes{y}_{k}^{\left(2\right)}$ with ${y}_{k}^{\left(\nu \right)}{:=}{A}_{i}^{\left(\nu \right)}{x}_{j}^{\left(\nu \right)}$ for k = i + 3(j − 1) and ν = 1, 2. Therefore Ax is a CP vector of representation rank 6(=2 × 3).
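The rank growth in this example can be checked directly with a small NumPy sketch (random matrices and vectors as placeholders), using the mixed-product property of the Kronecker product:

import numpy as np

rng = np.random.default_rng(2)
n1, n2 = 4, 5

# CP operator of dimension 2 and rank 3, CP vector of dimension 2 and rank 2
A_factors = [(rng.standard_normal((n1, n1)), rng.standard_normal((n2, n2)))
             for _ in range(3)]
x_factors = [(rng.standard_normal(n1), rng.standard_normal(n2)) for _ in range(2)]

A = sum(np.kron(A1, A2) for A1, A2 in A_factors)
x = sum(np.kron(x1, x2) for x1, x2 in x_factors)

# the product is again a CP vector, now with 3 * 2 = 6 Kronecker terms
y_factors = [(A1 @ x1, A2 @ x2) for A1, A2 in A_factors for x1, x2 in x_factors]
y = sum(np.kron(y1, y2) for y1, y2 in y_factors)

print(len(y_factors), np.allclose(A @ x, y))          # 6 True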

The above example shows that we need a truncation of a tensor to lower rank, i.e., an approximation with a tensor of lower rank. To guarantee the convergence of iterative methods, we have to guarantee that the truncation error is small enough, cf. [14].

The set of CP tensors of rank at most r is, however, not closed, which makes the best approximation by a CP tensor of rank r an ill-posed problem, cf. [30]. Therefore, we cannot guarantee that the truncation error will be small enough to yield convergence of the iterative method. To overcome this drawback, we use the hierarchical Tucker format to represent and compute the solution of a linear system.

The general idea of the hierarchical Tucker format, which was first introduced in [15] and further analyzed in [9], is to define a hierarchy among the modes $\mathcal{D}=\left\{1,\dots ,d\right\}$. To do so, we define the so-called dimension tree analogously to [9, definition 3.1].

Definition 3.2 (Dimension tree). A dimension tree $\mathcal{T}$ for dimension $d\in \mathbb{N}$ is a binary tree with nodes labeled by non-empty subsets of $\mathcal{D}$. Its root is labeled with $\mathcal{D}$, each leaf node is labeled with a single-element subset $z=\left\{\ell \right\}\subseteq \mathcal{D}$, and each inner node is labeled with the disjoint union of its two children. We will identify a node with its label z and therefore write $z\in \mathcal{T}$.

Figure 1 shows an example of a dimension tree for d = 4. The labels of dimension trees lead to the corresponding matricization for each node which we define as in [9, definition 3.3]:

Definition 3.3 (Matricization and vectorization). Let $\boldsymbol{\phi }\in {\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}$, $z\subseteq \mathcal{D}$ with z ≠ ∅, and $g{:=}\mathcal{D}{\backslash}z$. The matricization of ϕ corresponding to z is defined as ${\boldsymbol{\phi }}^{\left(z\right)}\in {\mathbb{R}}^{{n}_{z}{\times}{n}_{g}}$, where ${n}_{z}{:=}{\prod }_{\ell \in z}{n}_{\ell }$ and ${n}_{g}{:=}{\prod }_{\ell \in g}{n}_{\ell }$, with ${\boldsymbol{\phi }}^{\left(z\right)}\left[{\left({i}_{j}\right)}_{j\in z},{\left({i}_{j}\right)}_{j\in g}\right]{:=}\boldsymbol{\phi }\left[{i}_{1},\dots ,{i}_{d}\right]$ for all $i={\left({i}_{j}\right)}_{j\in \mathcal{D}}$. In particular, ${\boldsymbol{\phi }}^{\left(\mathcal{D}\right)}\in {\mathbb{R}}^{{n}_{1}\cdots {n}_{d}}$ holds, which can also be interpreted as the vectorization of ϕ.
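In NumPy, a matricization is a transpose followed by a reshape; the sketch below uses 0-based mode indices (the definition above counts modes from 1), and the ordering of rows and columns is a convention that does not affect the matrix rank.

import numpy as np

def matricization(phi, z):
    # rows are indexed by the modes in z, columns by the remaining modes g
    g = [m for m in range(phi.ndim) if m not in z]
    n_z = int(np.prod([phi.shape[m] for m in z]))
    return np.transpose(phi, list(z) + g).reshape(n_z, -1)

phi = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)
print(matricization(phi, [0, 1]).shape)          # (6, 20)
print(matricization(phi, [2]).shape)             # (4, 30)
print(matricization(phi, [0, 1, 2, 3]).shape)    # (120, 1), essentially the vectorization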

A matricization can be interpreted as an unfolding of the tensor as illustrated in figure 2. Based on the concept of matricizations, the hierarchical Tucker rank is defined according to [9, definition 3.4]:

Definition 3.4 (Hierarchical Tucker rank). Let $\boldsymbol{\phi }\in {\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}$ and $\mathcal{T}$ be a dimension tree. The hierarchical Tucker rank of ϕ is defined as ${\text{rank}}_{\mathcal{T}}\left(\boldsymbol{\phi }\right){:=}{\left({r}_{z}\right)}_{z\in \mathcal{T}}$, where rz := rank( ϕ (z)) denotes the matrix rank of the matricization ϕ (z) for all $z\in \mathcal{T}$.

The set of tensors with hierarchical Tucker rank node-wise bounded by ${\left({r}_{z}\right)}_{z\in \mathcal{T}}$ is defined as $\mathcal{H}-Tucker\left(\mathcal{T},{\left({r}_{z}\right)}_{z\in \mathcal{T}}\right){:=}\left\{\boldsymbol{\gamma }\in {\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}\vert \text{rank}\left({\boldsymbol{\gamma }}^{\left(z\right)}\right){\leqslant}{r}_{z}\quad \text{for}\enspace \text{all}\enspace z\in \mathcal{T}\right\}$.

Using the dimension tree, the concept of matricization, and the hierarchical Tucker rank, one can define the representation of a tensor within the hierarchical Tucker format, cf. [9, definition 3.6]. The memory required for a hierarchical Tucker representation, with dimension tree $\mathcal{T}$ and representation rank ${\left({r}_{z}\right)}_{z\in \mathcal{T}}$, of a tensor $\boldsymbol{\phi }\in {\mathbb{R}}^{{n}_{1}{\times}\cdots {\times}{n}_{d}}$ for $n={\mathrm{max}}_{\ell \in \mathcal{D}}\enspace {n}_{\ell }$ and $r={\mathrm{max}}_{z\in \mathcal{T}}\enspace {r}_{z}$ is given by $\mathcal{O}\left(rdn+{r}^{3}d\right)$, cf. [9, lemma 3.7]. The existence of a truncation method of a low-rank tensor $\boldsymbol{\phi }\in \mathcal{H}-Tucker\left(\mathcal{T},\enspace {\left({r}_{z}\right)}_{z\in \mathcal{T}}\right)$ down to lower rank ${\left({\tilde {r}}_{z}\right)}_{z\in \mathcal{T}}$ with an arithmetic cost in $\mathcal{O}\left({r}^{2}dn+{r}^{4}d\right)$ was proven in [9]. The resulting approximation $\tilde {\boldsymbol{\phi }}{:=}\text{truncate}\left(\boldsymbol{\phi }\right)\in \mathcal{H}-Tucker\left(\mathcal{T},{\left({\tilde {r}}_{z}\right)}_{z\in \mathcal{T}}\right)$ fulfills the quasi-optimal error estimate ${\Vert}\boldsymbol{\phi }-\tilde {\boldsymbol{\phi }}{\Vert}{\leqslant}\sqrt{2d-3}\enspace \mathrm{min}\left\{{\Vert}\boldsymbol{\phi }-\boldsymbol{\gamma }{\Vert}:\boldsymbol{\gamma }\in \mathcal{H}-Tucker\left(\mathcal{T},{\left({\tilde {r}}_{z}\right)}_{z\in \mathcal{T}}\right)\right\}$, cf. [9].
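To make the hierarchical Tucker rank concrete, the following sketch builds a tensor as a sum of two elementary tensors (so every matricization has rank at most 2) and computes the node-wise ranks for the dimension tree of figure 1; the tensor and its mode sizes are arbitrary placeholders.

import numpy as np

def matricization(phi, z):
    g = [m for m in range(phi.ndim) if m not in z]
    n_z = int(np.prod([phi.shape[m] for m in z]))
    return np.transpose(phi, list(z) + g).reshape(n_z, -1)

rng = np.random.default_rng(3)
shape = (6, 7, 8, 9)
phi = sum(np.einsum('i,j,k,l->ijkl', *[rng.standard_normal(n) for n in shape])
          for _ in range(2))

# dimension tree of figure 1 with 0-based modes: leaves {0}, ..., {3}, inner nodes {0,1}, {2,3}
tree = [[0], [1], [2], [3], [0, 1], [2, 3]]
print([np.linalg.matrix_rank(matricization(phi, z)) for z in tree])   # node-wise ranks <= 2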

Further, we can transfer a CP representation of a tensor vector or tensor operator with CP rank r into a hierarchical Tucker representation with rank node-wise bounded by r, cf. [13, theorem 11.17]. Following this approach, we represent the operator and the right-hand side in the hierarchical Tucker format. For solving parameter-dependent linear problems in the hierarchical Tucker format, we use the preconditioned conjugate gradients (PCG) method. In algorithm 1 the PCG method is briefly introduced similar to [21, algorithm 2].

Algorithm 1. Preconditioned conjugate gradients method with truncation.

Input: CP operator A, CP vector b, CP rank 1 preconditioner M, initial guess ϕ (0) in the hierarchical Tucker format
Output: Approximate solution ϕ in the hierarchical Tucker format of Aϕ = b
1: ${\boldsymbol{\rho }}_{\left(0\right)}=\text{truncate}\left(\mathbf{b}-\mathbf{A}{\boldsymbol{\phi }}_{\left(0\right)}\right)$
2: ${\boldsymbol{\zeta }}_{\left(0\right)}={\mathbf{M}}^{-1}{\boldsymbol{\rho }}_{\left(0\right)}$
3: ${\boldsymbol{\pi }}_{\left(0\right)}={\boldsymbol{\zeta }}_{\left(0\right)}$
4: ${\boldsymbol{\theta }}_{\left(0\right)}=\text{truncate}\left(\mathbf{A}{\boldsymbol{\pi }}_{\left(0\right)}\right)$
5: k = 0
6: while $\frac{{\Vert}{\boldsymbol{\rho }}_{\left(k\right)}{\Vert}}{{\Vert}\mathbf{b}{\Vert}}{ >}\varepsilon $ and k < kmax do
7:  ${\boldsymbol{\phi }}_{\left(k+1\right)}=\text{truncate}\left({\boldsymbol{\phi }}_{\left(k\right)}+\frac{\left\langle {\boldsymbol{\rho }}_{\left(k\right)},{\boldsymbol{\pi }}_{\left(k\right)}\right\rangle }{\left\langle {\boldsymbol{\theta }}_{\left(k\right)},{\boldsymbol{\pi }}_{\left(k\right)}\right\rangle }{\boldsymbol{\pi }}_{\left(k\right)}\right)$
8:  ${\boldsymbol{\rho }}_{\left(k+1\right)}=\text{truncate}\left(\mathbf{b}-\mathbf{A}{\boldsymbol{\phi }}_{\left(k+1\right)}\right)$
9:   ${\boldsymbol{\zeta }}_{\left(k+1\right)}={\mathbf{M}}^{-1}{\boldsymbol{\rho }}_{\left(k+1\right)}$
10:  ${\boldsymbol{\pi }}_{\left(k+1\right)}=\text{truncate}\left({\boldsymbol{\zeta }}_{\left(k+1\right)}-\frac{\left\langle {\boldsymbol{\theta }}_{\left(k\right)},{\boldsymbol{\zeta }}_{\left(k+1\right)}\right\rangle }{\left\langle {\boldsymbol{\theta }}_{\left(k\right)},{\boldsymbol{\pi }}_{\left(k\right)}\right\rangle }{\boldsymbol{\pi }}_{\left(k\right)}\right)$
11:  ${\boldsymbol{\theta }}_{\left(k+1\right)}=\text{truncate}\left(\mathbf{A}{\boldsymbol{\pi }}_{\left(k+1\right)}\right)$
12:  k = k + 1
13: end while

The PCG method in algorithm 1 approximates the solution of a parameter-dependent linear system numerically within the hierarchical Tucker format if the tensor operator A is positive definite and symmetric. In [12, lemma 5] the authors proved that this algorithm converges if the truncation error ɛ is small enough. Algorithm 1 comprises additions and inner products of two tensors in hierarchical Tucker format which have an arithmetic cost in $\mathcal{O}\left(dn{r}^{2}+d{r}^{4}\right)$, application of an operator which has an arithmetic cost in $\mathcal{O}\left(d{n}^{2}r\right)$, and evaluation of an entry of the represented tensor which has an arithmetic cost in $\mathcal{O}\left(d{r}^{3}\right)$. Hence, for small rank r most of the operations needed for the PCG method scale linearly in the dimension d and the mode size n, thus yielding an efficient method to solve parameter-dependent linear systems using low-rank tensor formats.
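The following sketch mirrors algorithm 1 in plain NumPy for the two-dimensional case, with matrices playing the role of low-rank tensors and an SVD-based rank truncation standing in for the hierarchical Tucker truncation of [9]; the operator and right-hand side are hypothetical placeholders.

import numpy as np

def truncate(X, tol=1e-10):
    # SVD-based rank truncation of a matrix (a 2-mode tensor)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = max(1, int(np.sum(s > tol * s[0])))
    return (U[:, :r] * s[:r]) @ Vt[:r]

def pcg_truncated(apply_A, apply_Minv, B, eps=1e-8, kmax=50):
    # algorithm 1 with the Frobenius inner product np.sum(X * Y) as <., .>
    phi = np.zeros_like(B)
    rho = truncate(B - apply_A(phi))
    zeta = apply_Minv(rho)
    pi = zeta
    theta = truncate(apply_A(pi))
    k = 0
    while np.linalg.norm(rho) / np.linalg.norm(B) > eps and k < kmax:
        phi = truncate(phi + np.sum(rho * pi) / np.sum(theta * pi) * pi)
        rho = truncate(B - apply_A(phi))
        zeta = apply_Minv(rho)
        pi = truncate(zeta - np.sum(theta * zeta) / np.sum(theta * pi) * pi)
        theta = truncate(apply_A(pi))
        k += 1
    return phi, k

# example: K phi + phi K^T = B, i.e. (Id (x) K + K (x) Id) vec(phi) = vec(B),
# with a symmetric positive definite K and a rank-1 right-hand side
n = 50
K = np.eye(n) + 0.25 * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
B = np.outer(np.linspace(0.0, 1.0, n), np.ones(n))
phi, k = pcg_truncated(lambda X: K @ X + X @ K.T,
                       lambda X: X,        # identity preconditioner (Kronecker rank 1)
                       B)
print(k, np.linalg.norm(K @ phi + phi @ K.T - B) / np.linalg.norm(B))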

This means that, if we are able to prove the existence of a low-rank representation of the operator and right-hand side of the forward EMG problem, we can compute the solution of the linear system data-sparse and fast within the hierarchical Tucker format.

Finding conditions that guarantee the existence of a low-rank approximation for a given tensor is a research topic of its own [1, 3, 22]. This goes beyond the scope of this article, and we thus assume that the solution of the parameter-dependent EMG forward problem has a low-rank approximation. This is backed up by the numerical experiments in section 7.

Figure 2. Visual representation of a matricization.

4. The Bayesian inverse EMG problem

We present our first main contribution: the proof of the well-posedness of the Bayesian inverse EMG problem discussed in section 2. Note that the proof of the well-posedness is valid for any bounded conductivity σi that can be represented through parameters $p\in \mathcal{J}$ for any parameter space $\mathcal{J}$. For diagonal conductivities these parameters are the diagonal entries of σi and $\mathcal{J}={\left[{s}_{-},{s}_{+}\right]}^{3}$. In the more general case of space-dependent intracellular conductivities, the parameters can be chosen as the coefficients of a Karhunen–Loève expansion of σi(x, ω), see, e.g., [4, 18]. The following proof thus holds for both space-independent and space-dependent conductivities.

First we prove the existence of the posterior distribution ρEMG of parameters p given measurements ${\phi }_{\text{EMG}}^{\text{meas}}$ for a prior ρ using the infinite-dimensional version of Bayes' theorem for inverse problems [4, theorem 3.4].

Theorem 4.1 (Bayes' theorem for our inverse EMG problem). Let ${\mathbb{Q}}_{0}$ and ${\mathbb{Q}}_{p}$ denote the measures with distribution $\mathcal{N}\left(0,{\Xi}\right)$ and $\mathcal{N}\left({\mathcal{G}}_{\mathbf{x}}\left(p\right),{\Xi}\right)$. Then,

  • B.1  
    The scaling factor $Z{:=}{\int }_{\mathcal{J}}\mathrm{exp}\left(-{\Phi}\left(p;{\phi }_{\text{EMG}}^{\text{meas}}\right)\right)\rho \left(\mathrm{d}p\right)$ is positive ${\mathbb{Q}}_{0}$-almost surely,
  • B.2  
    The potential ${\Phi}:\mathcal{J}{\times}{\mathbb{R}}^{M}\to \mathbb{R}$, as defined in (7), is ν0-measurable with product measure ${\nu }_{0}\left(\mathrm{d}p,\mathrm{d}\phi \right){:=}\rho \left(\mathrm{d}p\right){\mathbb{Q}}_{0}\left(\mathrm{d}\phi \right)$,
  • B.3  
    For ${\phi }_{\text{EMG}}^{\text{meas}}$ the conditional distribution ρEMG exists, ρEMG is absolutely continuous with respect to ρ, and
    $\frac{\mathrm{d}{\rho }^{\text{EMG}}}{\mathrm{d}\rho }\left(p\right)=\frac{1}{Z}\enspace \mathrm{exp}\left(-{\Phi}\left(p;{\phi }_{\text{EMG}}^{\text{meas}}\right)\right)$
    ν-almost surely with the product measure $\nu \left(\mathrm{d}p,\mathrm{d}\phi \right){:=}\rho \left(\mathrm{d}p\right){\mathbb{Q}}_{p}\left(\mathrm{d}\phi \right)$.

To prove the above theorem, we need the boundedness and Lipschitz continuity of the observation operator as stated in the following lemma:

Lemma 4.2. The observation operator is bounded and Lipschitz continuous with respect to p, i.e., there exist constants $0 < C, L_p < \infty$ such that

Equation (10): ${\Vert}{\mathcal{G}}_{\mathbf{x}}\left(p\right){{\Vert}}_{{\mathbb{R}}^{M}}{\leqslant}C$

Equation (11): ${\Vert}{\mathcal{G}}_{\mathbf{x}}\left({p}_{1}\right)-{\mathcal{G}}_{\mathbf{x}}\left({p}_{2}\right){{\Vert}}_{{\mathbb{R}}^{M}}{\leqslant}{L}_{p}{\Vert}{p}_{1}-{p}_{2}{\Vert}$

for all $p,{p}_{1},{p}_{2}\in \mathcal{J}$.

The proof consists of basic calculations and estimations on the weak form of the deterministic EMG forward problem and is thus left to the reader.

Proof of theorem 4.1. The proof is based on the proof of the measurability of the potential Φ. Since B.1 and B.2 are the assumptions required for the Bayes theorem in [4, theorem 3.4] to hold, B.3 follows directly once B.1 and B.2 are proven. As the ν0-measurability of Φ, meaning that Φ is ρ-measurable in p and ${\mathbb{Q}}_{0}$-measurable in ${\phi }_{\text{EMG}}^{\text{meas}}$, follows from the Lipschitz continuity of the corresponding mappings, we show that

  • (a)  
    Φ is Lipschitz continuous with respect to p and
  • (b)  
    Φ is Lipschitz continuous with respect to ${\phi }_{\text{EMG}}^{\text{meas}}$.

Note that we also need the Lipschitz continuity of Φ to prove that the posterior depends continuously on the measurement data in theorem 4.4. For ease of notations, we introduce the shorthand ${\langle u,v\rangle }_{{\Xi}}{:=}\langle {{\Xi}}^{-\frac{1}{2}}u,{{\Xi}}^{-\frac{1}{2}}v\rangle $ for $u,v\in {\mathbb{R}}^{M}$ and neglect the second argument of the potential Φ.

  • (a)  
    Let ${p}_{1},{p}_{2}\in \mathcal{J}$ with p1p2, and (TI) and (HI) denote the triangle and Hölder's inequality. Using lemma 4.2, we have
  • (b)  
    For ${\phi }_{1},{\phi }_{2}\in {\mathbb{R}}^{M}$ with ϕ1ϕ2 we express the norms in the definition of Φ as scalar products obtaining
    Equation (12)

This concludes the proof. □

The well-posedness of the Bayesian inverse EMG problem also includes the continuity of the posterior ρEMG with respect to the data ${\phi }_{\text{EMG}}^{\text{meas}}$. Therefore, we need to define a metric on the space of measures. Similar to [4, 18] we choose the Hellinger metric.

Definition 4.3 (Hellinger metric). Let μ1 and μ2 denote two probability measures that are absolutely continuous with respect to a probability measure ζ. The Hellinger metric of μ1 and μ2 is then defined as ${d}_{\text{Hell}}\left({\mu }_{1},{\mu }_{2}\right){:=}{\left(\frac{1}{2}\int {\left(\sqrt{\frac{\mathrm{d}{\mu }_{1}}{\mathrm{d}\zeta }}-\sqrt{\frac{\mathrm{d}{\mu }_{2}}{\mathrm{d}\zeta }}\right)}^{2}\mathrm{d}\zeta \right)}^{1/2}$.

With the help of the Hellinger metric we now prove the Lipschitz continuity of the posterior ρEMG with respect to measured EMG data.

Theorem 4.4. Let ρEMG denote the solution of our Bayesian inverse EMG problem given by theorem 4.1. Then ρEMG depends Lipschitz continuously on the measured data ${\phi }_{\text{EMG}}^{\text{meas}}$ with respect to the Hellinger metric. This means there exists a positive constant L > 0 such that

Equation (13): ${d}_{\text{Hell}}\left({\rho }_{1}^{\text{EMG}},{\rho }_{2}^{\text{EMG}}\right){\leqslant}L{\Vert}{\phi }_{1}-{\phi }_{2}{{\Vert}}_{{\mathbb{R}}^{M}}$

holds for all ${\phi }_{1},{\phi }_{2}\in {\mathbb{R}}^{M}$ and the posterior distributions ${\rho }_{1}^{\text{EMG}}$ and ${\rho }_{2}^{\text{EMG}}$ of σ given ϕ1 and ϕ2.

To prove the above theorem, we need the following lemma:

Lemma 4.5. The scaling factor $Z\left(\phi \right)={\int }_{\mathcal{J}}\mathrm{exp}\left(-{\Phi}\left(p,\phi \right)\right)\enspace \mathrm{d}\rho \left(p\right)$ is Lipschitz continuous in ϕ, i.e., there exists a constant LZ > 0 such that

Equation (14): $\vert Z\left({\phi }_{1}\right)-Z\left({\phi }_{2}\right)\vert {\leqslant}{L}_{\text{Z}}{\Vert}{\phi }_{1}-{\phi }_{2}{{\Vert}}_{{\mathbb{R}}^{M}}$

holds for all ${\phi }_{1},{\phi }_{2}\in {\mathbb{R}}^{M}$ with ϕ1ϕ2.

The statement follows from lemma 4.2 with basic calculations and estimations and is thus left to the reader.

Proof of theorem 4.4. Let ${\rho }_{1}^{\text{EMG}},{\rho }_{2}^{\text{EMG}}$ denote the solutions of the Bayesian inverse EMG problem for given measurements ϕ1ϕ2. For simplicity, we write Φj := Φ(p, ϕj ) and Zj := Z(ϕj ), j = 1, 2. We estimate the Hellinger distance between the two posterior distributions using Young's inequality (YI), the Lipschitz continuity of the exponential function and the inverse of the square root on bounded domains with constants Le and Lsqrt and lemma 4.5:

As Z1 > 0 holds, it follows that $\frac{1}{{Z}_{1}}{< }\infty $. It thus remains to prove that ${Z}_{2}{< }\infty $, which is a consequence of $\mathcal{G}$ being bounded and $\rho \left(\mathcal{J}\right)=1$. The assertion follows with Lipschitz constant ${L}_{\rho }^{2}{:=}{L}_{\text{e}}^{2}{L}_{\phi }^{2}\frac{1}{{Z}_{1}}+{L}_{\text{sqrt}}^{2}{L}_{\text{Z}}^{2}{Z}_{2}$. □

Remark 4.6. The estimate in (13) also describes the behavior of the posterior with respect to the discretization of the underlying equations.

Recapitulating theorems 4.1 and 4.4 shows that modeling the measurement error as a stochastic quantity leads to a regularization of our inverse EMG problem, see also [4].

5. Discretization and tensorization

As described in section 2, we compute the posterior distribution ρEMG using a Metropolis–Hastings algorithm. We obtain an approximation of the posterior by drawing a finite number of samples. Additionally, we discretize the forward operator ${\mathcal{G}}_{\mathbf{x}}$ as follows. In accordance with section 4, we show a discretization for the more general case of space-dependent intracellular conductivities and mention that this discretization simplifies slightly for the space-independent case.

With $x={\left({x}_{1},{x}_{2},{x}_{3}\right)}^{\top }\in {D}_{\text{M}}$ the left-hand side of equation (4) reads

Equation (15)

and the right-hand side is given by

Equation (16)

Since our forward solver uses a finite difference discretization, we consider the same discretization using centered differences of second order, and therefore assume that ϕeC4(DM) and σiC1(DM). This is reasonable under our assumptions. Our theoretical and numerical results directly generalize to, e.g., finite element discretizations of arbitrary but given muscle geometries. The practical realization is future work.

In the following we use h = (hM, ht, hσ ) to indicate the discretization of the muscle geometry by hM, the time by ht and the parameter space by hσ . We denote the grid points by (${x}_{{j}_{1}}$, ${x}_{{j}_{2}}$, ${x}_{{j}_{3}}$), jk = 0, ..., n, for $n\in \mathbb{N}$ and a discrete conductivity at grid point (${x}_{{j}_{1}}$, ${x}_{{j}_{2}}$, ${x}_{{j}_{3}}$) by ${\sigma }_{{j}_{1},{j}_{2},{j}_{3}}$.

Theorem 5.1. For

Equation (17)

a second-order consistent stencil is given by

Proof. Because of the Kronecker product structure of (17) the statement follows from the one-dimensional case. There, Taylor's theorem and equating the coefficients of

yields ${\tilde {\sigma }}_{j}=\frac{{\sigma }_{j-1}+{\sigma }_{j}}{2}$ for a second-order consistent stencil given by

immediately finishing the proof. □
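A minimal NumPy sketch of this one-dimensional stencil, with a manufactured conductivity and potential chosen purely for testing, confirms the second-order consistency numerically:

import numpy as np

def assemble_1d(sigma, h):
    # interior stencil ( s~_j phi_{j-1} - (s~_j + s~_{j+1}) phi_j + s~_{j+1} phi_{j+1} ) / h^2
    # with s~_j = (sigma_{j-1} + sigma_j) / 2, for the operator d/ds( sigma(s) d phi/ds )
    n = len(sigma) - 1
    st = 0.5 * (sigma[:-1] + sigma[1:])              # s~_1, ..., s~_n
    A = np.zeros((n - 1, n - 1))
    for j in range(1, n):                            # interior grid points x_1, ..., x_{n-1}
        if j > 1:
            A[j - 1, j - 2] = st[j - 1]
        A[j - 1, j - 1] = -(st[j - 1] + st[j])
        if j < n - 1:
            A[j - 1, j] = st[j]
    return A / h**2

def max_error(n):
    # manufactured test: sigma(s) = 1 + s^2 and phi(s) = sin(pi s) on [0, 1];
    # phi vanishes at the boundary, so no boundary correction is needed
    h, x = 1.0 / n, np.linspace(0.0, 1.0, n + 1)
    sigma, phi = 1.0 + x**2, np.sin(np.pi * x)
    exact = 2 * x * np.pi * np.cos(np.pi * x) - (1 + x**2) * np.pi**2 * np.sin(np.pi * x)
    return np.max(np.abs(assemble_1d(sigma, h) @ phi[1:-1] - exact[1:-1]))

for n in (32, 64, 128):
    print(n, max_error(n))     # the error drops by roughly a factor of 4 per refinement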

Next, we derive an affine representation of the discrete operator and prove a low-rank tensor format representation of the operator and the right-hand side of the forward EMG problem. This is our second main contribution.

Corollary 5.2. An affine representation of the discrete operator in the three-dimensional case is given by

where in the first plane the stencil is given by

in the second plane by

and in the third plane by

Proof. Follows from theorem 5.1 with linearity. □

We define ${A}_{h}^{\left(0\right)}{:=}{A}_{h,{\sigma }_{\text{e}}}$ denoting the discrete operator given by theorem 5.1 for constant ${\sigma }_{\text{e}}\in {\mathbb{R}}^{3{\times}3}$ and ${A}_{h,{j}_{1},{j}_{2},{j}_{3}}$ denoting the discrete operator given by the stencil ${M}_{{j}_{1},{j}_{2},{j}_{3}}$ from corollary 5.2. Then the discrete operator of (15) is given by

Using the vectorizations $\text{vec}\left({A}_{h,{j}_{1},{j}_{2},{j}_{3}}\right){=:}{A}_{h}^{\left(k\right)}$ and $\text{vec}\left({\sigma }_{{j}_{1},{j}_{2},{j}_{3}}\right){=:}{p}^{\left(k\right)}$, see definition 3.3, yields a parameter-dependent affine structure of the form

${A}_{h}\left(p\right)={A}_{h}^{\left(0\right)}+{\sum }_{k=1}^{d}{p}^{\left(k\right)}{A}_{h}^{\left(k\right)}$

with p := (p^(1), ..., p^(d)), where each ${A}_{h}^{\left(k\right)}$ is constant, i.e., ${A}_{h}^{\left(k\right)}$ is parameter-independent.

We now take a closer look at the right-hand side and discretize the time variable t in (1) using equidistant time steps tj = jht, j = 0, ..., tmax for time step size ht. Multiplying this with the AP velocities uk , k = 1, ..., NMF, we achieve sj = uk tj for the discretization of the muscle fiber coordinate s.

Furthermore, we remark that the linear dependency of the right-hand side on the intracellular conductivity is obvious under our assumptions, which include that the muscle fiber direction is one of the standard unit vectors, i.e., $ \overrightarrow {d}={ \overrightarrow {e}}_{j}$, j = 1, 2, or 3. If Vm is independent of σi, the structure of the right-hand side is the same as the structure of the operator. Then we see the linear structure of (16) that has the form

How to represent an arbitrary right-hand side in a parameter-dependent way is ongoing research.

We now discretize the parameter space by choosing a finite number of parameters ${p}_{h}{:=}\left({p}_{h}^{\left(1\right)},\dots ,{p}_{h}^{\left(\ell \right)},\dots ,{p}_{h}^{\left(d\right)}\right)$ from a discrete set ${\mathcal{J}}_{h}$. We fix discrete values for all ${p}_{h}^{\left(\ell \right)}$, i.e., ${p}_{h}^{\left(\ell \right)}\in \left\{{p}_{h}^{\left(\ell \right)}\left(1\right),{p}_{h}^{\left(\ell \right)}\left(2\right),\dots ,{p}_{h}^{\left(\ell \right)}\left({n}_{\ell }\right)\right\}$, and reformulate our problem as:

Equation (18): find ${\boldsymbol{\phi }}_{h}\left({p}_{h}\right)$ such that ${A}_{h}\left({p}_{h}\right)\,{\boldsymbol{\phi }}_{h}\left({p}_{h}\right)={\mathbf{b}}_{h}\left({p}_{h}\right)$ for all ${p}_{h}\in {\mathcal{J}}_{h}$.

Assuming that each parameter ${p}_{h}^{\left(\ell \right)}$ can take ${n}_{\ell }$ different values, applying classical methods one has to solve ${\prod }_{\ell =1}^{d}{n}_{\ell }\approx {n}^{d}$ linear systems. To overcome the curse of dimensionality in this case, we exploit the structure of the linear system, see section 3. We find a data-sparse representation of the problem that allows us to solve the parameter-dependent system for all ${p}_{h}\in {\mathcal{J}}_{h}$ simultaneously, analogously to [10]. For computing the solution of (18) for all possible ${p}_{h}\in {\mathcal{J}}_{h}$, we define a large block-diagonal system with the operator

where ${A}_{j}^{\left(0\right)}={A}_{h}^{\left(0\right)}+{\sum }_{\ell =1}^{d}{p}_{h}^{\left(\ell \right)}\left(j\right){A}_{h}^{\left(\ell \right)}$ denotes the jth diagonal block.

The memory requirement to store A, however, grows like ${\prod }_{\ell =1}^{d}{n}_{\ell }\approx {n}^{d}$, i.e., exponentially in the dimension d, and thus, even for moderate values of d and ${n}_{\ell }$, a classical representation of our problem is infeasible. Therefore, we reformulate the problem using the notation ${A}_{j}^{\left(m\right)}={\sum }_{\ell =m}^{d}{p}_{h}^{\left(\ell \right)}\left(j\right){A}_{h}^{\left(\ell \right)}$, m = 1, ..., d, and ${\text{Id}}_{{n}_{k}}$ denoting the identity in ${\mathbb{R}}^{{n}_{k}{\times}{n}_{k}}$, and achieve:

This leads to the following data-sparse CP representation of the operator

$\mathbf{A}={\text{Id}}_{{n}_{d}}\otimes \cdots \otimes {\text{Id}}_{{n}_{1}}\otimes {A}_{h}^{\left(0\right)}+{\sum }_{\ell =1}^{d}{\text{Id}}_{{n}_{d}}\otimes \cdots \otimes {\text{Id}}_{{n}_{\ell +1}}\otimes \mathrm{diag}\left({p}_{h}^{\left(\ell \right)}\right)\otimes {\text{Id}}_{{n}_{\ell -1}}\otimes \cdots \otimes {\text{Id}}_{{n}_{1}}\otimes {A}_{h}^{\left(\ell \right)}$

with discrete parameters ${p}_{h}^{\left(\ell \right)}=\left({p}_{h}^{\left(\ell \right)}\left(1\right),\dots ,{p}_{h}^{\left(\ell \right)}\left({n}_{\ell }\right)\right)$. Similar results can be obtained for the right-hand side.

Concluding, we represent the operator and the right-hand side of (18) exactly using low-rank tensor formats. Further, we approximate the solution of (18) using the hierarchical Tucker format in algorithm 1.
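For a small example with d = 2 parameters, the following sketch assembles this Kronecker-structured operator explicitly (with random placeholder matrices for A_h^(0), A_h^(1), A_h^(2)) and checks that each diagonal block equals the affine operator for the corresponding parameter combination; the ordering of the Kronecker factors follows the preconditioner M used in section 7.

import numpy as np

rng = np.random.default_rng(4)
m, n1, n2 = 5, 3, 4                                   # spatial unknowns, values per parameter
A0 = np.eye(m) + 0.1 * rng.standard_normal((m, m))    # placeholder for A_h^(0)
A1 = 0.1 * rng.standard_normal((m, m))                # placeholder for A_h^(1)
A2 = 0.1 * rng.standard_normal((m, m))                # placeholder for A_h^(2)
p1 = rng.uniform(0.5, 2.0, n1)                        # discrete values of p_h^(1)
p2 = rng.uniform(0.5, 2.0, n2)                        # discrete values of p_h^(2)
I1, I2 = np.eye(n1), np.eye(n2)

# CP operator of representation rank d + 1 = 3
A = (np.kron(I2, np.kron(I1, A0))
     + np.kron(I2, np.kron(np.diag(p1), A1))
     + np.kron(np.diag(p2), np.kron(I1, A2)))

# the diagonal block for the parameter combination (j1, j2)
j1, j2 = 2, 1
block = A.reshape(n2, n1, m, n2, n1, m)[j2, j1, :, j2, j1, :]
print(np.allclose(block, A0 + p1[j1] * A1 + p2[j2] * A2))    # True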

6. The tensorized Metropolis–Hastings algorithm

Having proved the theory for our Bayesian inverse EMG problem and a low-rank tensor representation of the operator and right-hand side of the discrete forward EMG problem, we now derive our final main contribution: A fast tensorized Metropolis–Hastings algorithm. Therefore, we combine the precomputation of the forward EMG problem described in section 2 for all parameters simultaneously using the hierarchical Tucker format and algorithm 1 with the Metropolis–Hastings sampling, as shown in algorithm 2.

Algorithm 2. Tensorized Metropolis–Hastings.

Input: Starting point ph,(1) for the Markov chain, sampling radius δ
Output: A Markov chain ph
1: Precompute $\mathcal{G}\left({p}_{h}\right)$ for all ${p}_{h}\in {\mathcal{J}}_{h}$ using tensor formats
2: for j = 1, ..., J − 1 do
3:  Propose ${\tilde {p}}_{h}\sim \mathcal{U}\left(\left[{p}_{h,\left(j\right)}-\delta ,{p}_{h,\left(j\right)}+\delta \right]\cap {\mathcal{J}}_{h}\right)$ independent of ph,(j)
4:  Draw $c\sim \mathcal{U}\left(0,1\right)$
5:  if $c{\leqslant}a\left(\mathcal{G}\left({p}_{h,\left(j\right)}\right),\mathcal{G}\left({\tilde {p}}_{h}\right)\right)$ then
6:   ${p}_{h,\left(j+1\right)}={\tilde {p}}_{h}$
7:  else
8:   ph,(j+1) = ph,(j)
9:  end if
10: end for

To be more precise, we first choose a fixed number of samples $J\in \mathbb{N}$ that have to be drawn during the sampling process. We then precompute the solution of the parameter-dependent forward EMG problem on a discrete set ${\mathcal{J}}_{h}$ in the hierarchical Tucker format using the PCG method from algorithm 1 and store the data-sparse solution. Recall that storing the solution of the parameter-dependent problem for all parameters is only feasible within data-sparse formats like the hierarchical Tucker format.

Doing so enables us to evaluate the precomputed tensor solution with arithmetic cost in $\mathcal{O}\left(nd{r}^{3}\right)$ instead of solving the discretized forward EMG problem in every iteration in line 5 of the algorithm. Note that we draw new samples ${\tilde {p}}_{h}$ uniformly from an interval with radius δ around the last accepted sample, intersected with the discrete set ${\mathcal{J}}_{h}$, to account for the local behavior of the potential Φ and to accelerate convergence.
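The sketch below runs this sampling loop for a single scalar parameter on a discrete grid; the lookup table G_table stands in for the precomputed tensor solution, the exponential forward map is a purely hypothetical placeholder, and the least-squares potential of (7) with Ξ = ξ Id is assumed.

import numpy as np

rng = np.random.default_rng(5)
n, M, xi, J = 200, 8, 1e-3, 20000
grid = np.linspace(0.001, 10.0, n)                               # discrete set J_h
G_table = np.exp(-np.outer(grid, np.arange(1, M + 1)) / 10.0)    # "precomputed" G(p_h)

ref = 120                                                        # reference parameter index
phi_meas = G_table[ref] + np.sqrt(xi) * rng.standard_normal(M)

def potential(i):
    r = phi_meas - G_table[i]
    return 0.5 * np.dot(r, r) / xi

delta = 10                                  # proposal radius, measured in grid cells here
idx = int(rng.integers(n))                  # starting point of the Markov chain
chain = [idx]
for _ in range(J - 1):
    prop = int(rng.integers(max(0, idx - delta), min(n - 1, idx + delta) + 1))
    # acceptance strategy a = min{1, exp(Phi(old) - Phi(proposal))}
    if rng.uniform() <= np.exp(min(0.0, potential(idx) - potential(prop))):
        idx = prop
    chain.append(idx)

visits = np.bincount(chain[200:], minlength=n)    # discard a short burn-in
print(grid[ref], grid[visits.argmax()])           # the most visited value lies near the reference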

We assume that the cost of drawing one sample from the posterior distribution equals the solution time Ts of the discretized forward EMG problem for the standard Metropolis–Hastings algorithm and the evaluation time Te of the precomputed tensor solution for the tensorized Metropolis–Hastings algorithm. Thus, the runtime of the standard Metropolis–Hastings algorithm is JTs, while the runtime of the tensorized algorithm (TA) is the sum of the precomputation time Tp and the evaluation times, i.e., Tp + JTe. We notice that asymptotically the speedup $\frac{J{T}_{\text{s}}}{{T}_{\text{p}}+J{T}_{\text{e}}}$ is limited by $\frac{{T}_{\text{s}}}{{T}_{\text{e}}}$ for $J\to \infty $.
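With hypothetical timings, this behavior can be illustrated directly; the numbers below are made up for illustration and are not the measured values reported in section 7.

def speedup(J, Ts, Tp, Te):
    # runtime of the standard algorithm divided by the runtime of the tensorized one
    return (J * Ts) / (Tp + J * Te)

Ts, Tp, Te = 0.1, 10.0, 0.005      # seconds per solve, precomputation, per evaluation
for J in (100, 10_000, 1_000_000):
    print(J, round(speedup(J, Ts, Tp, Te), 2))
print("asymptotic bound Ts/Te:", Ts / Te)       # the speedup approaches 20.0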

Based on our mathematical theory we expect that the Markov chains constructed by both algorithms behave similarly. This is due to the fact that we exactly represent the operator and the right-hand side of the forward EMG problem for all discrete parameter combinations within the hierarchical Tucker format. Additionally, we compute the tensor solution using algorithm 1 with specified truncation accuracy, resulting in an error-controlled approximation.

7. Numerical experiments

We illustrate our method for the inverse EMG problem with numerical experiments. We conduct all experiments in Matlab using the KerMor framework and the htucker toolbox [20]. Throughout our experiments we use the following default settings.

The geometry that we use is a muscle cuboid of size 4 cm × 2 cm × 1 cm that is equipped with 30 × 30 muscle fibers. The muscle geometry is discretized using the grid size ${h}_{\text{M}}=\frac{1}{3}$ while the muscle fibers are discretized using 30 grid points, and we use 101 time steps. We fix the extracellular conductivity at σe = diag(6.7, 6.7, 6.7). As reference conductivity we choose p^ref = (0.893, 8.930, 0.893), i.e., the muscle fiber direction is the second unit vector and the muscle fibers are aligned parallel to the second coordinate axis. We allow the muscle fiber direction to be one of the three unit vectors. As bounds on the conductivity we define the upper bound $s_{+} = 10$ and the lower bound $s_{-} = 0.001$, which we also set as the discretization step size in the parameter space, i.e., ${h}_{\sigma }={s}_{-}$.

For computing the tensor solution of (18), we use algorithm 1. There we set kmax = 15, ɛ = 1 × 10−4 and we truncate to a relative accuracy of 1 × 10−6. As preconditioner we define $\mathbf{M}{:=}{\text{Id}}_{{n}_{d}}\otimes \cdots \otimes {\text{Id}}_{{n}_{1}}\otimes {A}_{h}^{\left(0\right)}$, since we observed similar convergence behavior and similar runtimes of the algorithm independent of the chosen low-rank tensor preconditioner, see, e.g., [21], in former experiments. We compute the tensor solution on a suitable conductivity grid with grid size hσ and ${A}_{h}^{\left(0\right)}$ using the conductivity at the midpoint of that grid. For handling the time-dependency in the right-hand side, we solve the corresponding linear system for all time steps simultaneously. This leads to a tensor of size 364 × 101 × 4000 × 4001 × 4000.

For sampling from the posterior distribution of intracellular conductivity given EMG measurements, we use algorithm 2. There we set the total number of samples to 500 000 and use Gaussian noise with ξ = 2.0. The algorithm draws a conductivity proposal in a sampling radius δ = 1.5 around the last accepted sample. As default we draw the initial guess from a uniform distribution on an interval with radius δ around the reference solution, and we discard the first 200 samples as burn-in. These choices proved reasonable in our parameter studies. Additionally, we modify the algorithm such that it also samples the muscle fiber direction as one of the unit vectors.

We call algorithm 2 using the Matlab built-in QR decomposition to solve the forward problem for the proposed conductivity in each iteration the SA, and we call algorithm 2 using the precomputed tensor solution the TA.

Figure 3. Relative singular values of the corresponding matricization of the low-rank solution of the forward EMG problem.

7.1. Rank of the hierarchical Tucker format solution

In our first numerical experiment, we examine the hierarchical Tucker rank, see definition 3.4, of the tensor solution of the linear system to support our assumption that the solution is well approximated with low rank. Further, a small rank is important for efficient arithmetic operations as some of these operations in low-rank tensor formats scale in $\mathcal{O}\left({r}^{4}\right)$, see section 3. Therefore, in figure 3 we show a logarithmic-linear plot of the relative singular values for the corresponding matricizations of the solution of the forward problem using our default setting. We observe that the rank of the matricization remains smaller than 6 in the parameter space, i.e., the rank of the matricizations corresponding to {3}, {4}, and {5}. We also see that the rank of the matricization corresponding to {2} is 55 while the rank of the matricization corresponding to {1} is 343. We expect that the matricization corresponding to {1} has full rank since this separates the spatial dimension, i.e., {1}, and the time dimension, i.e., {2}, and since each time step yields its own right-hand side. Using tensor formats, we reduce the theoretical storage cost of the full tensor from 1.88 × 1010 MB(≈18 800 000 GB) to 4.41 MB counting the storage cost for 1 entry as 64 bit.
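The quoted storage figure for the full tensor follows directly from its mode sizes:

# 364 x 101 x 4000 x 4001 x 4000 entries at 64 bit (8 byte) each, with 1 MB = 1e6 byte
entries = 364 * 101 * 4000 * 4001 * 4000
print(entries * 8 / 1e6)      # approximately 1.88e10 MB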

7.2. Comparison of the tensorized algorithm and the standard algorithm

For the validation of our TA, we compare its statistical behavior to the standard Metropolis–Hastings algorithm. We run both algorithms in our default setting for the reference conductivities ${p}_{1}^{\text{ref}}=\left(0.893,8.930,0.893\right)$ and ${p}_{2}^{\text{ref}}=\left(0.893,0.893,8.930\right)$.

We present the acceptance rates $\frac{\#\text{samples}\;\text{acc.}}{\#\text{samples}\;\text{drawn}}$, the mean absolute deviations (MADs) $\frac{1}{\#\text{samples}\;\text{acc.}}{\sum }_{k=1}^{\#\text{samples}\;\text{acc.}}\vert {p}_{\left(k\right)}-\bar{p}\vert $ and variance $\frac{1}{\#\text{samples}\;\text{acc.}-1}{\sum }_{k=1}^{\#\text{samples}\;\text{acc.}}{\left({p}_{\left(k\right)}-\bar{p}\right)}^{2}$ of the accepted diagonal entries of the conductivities in table 1. For the reference values ${p}_{1}^{\text{ref}}$ and ${p}_{2}^{\text{ref}}$ we observe that both methods have similar acceptance rates. We further notice that the SA and TA have a comparable reliability, i.e., comparable MAD and variance.

Table 1. Comparison of the SA and the TA with 500 000 drawn samples, Gaussian noise with ξ = 2.0, and sampling radius δ = 1.5.

                       ${p}_{1}^{\text{ref}}$          ${p}_{2}^{\text{ref}}$
                       SA      TA            SA      TA
Acceptance rate (%)    5.39    5.38          5.79    5.79
MAD(p(1))              0.86    0.86          0.50    0.50
MAD(p(2))              0.44    0.44          0.24    0.24
MAD(p(3))              0.17    0.17          0.55    0.55
Var(p(1))              1.05    1.05          0.38    0.38
Var(p(2))              0.29    0.29          0.09    0.09
Var(p(3))              0.04    0.04          0.44    0.44

We conclude that the sampling process of both algorithms is similar and that our tensor approach is therefore a promising ansatz to accelerate the SA if the discretization error of the forward problem is small. In this case, we furthermore reason that our results indicate that the tensor solution of the forward EMG problem is indeed a good approximation to the solution that we obtain using the Matlab build-in QR decomposition. We highlight that these results are in line with our theoretical findings from section 5.

7.3. Speedup tests

First, we examine the speedup $\frac{\text{runtime}\;\text{SA}}{\text{runtime}\;\text{TA}}$ of our tensor method compared to the standard method for fixed discretization grid size hM and varying number of samples. Therefore, we run both algorithms in the default setting for 125 samples and double the number of samples until we reach 128 000 samples. We present the speedup of the TA compared to the SA in figure 4.

Figure 4. Speedup of the TA compared to the SA for fixed grid size and a varying number of samples.

We observe that the speedup curve grows steadily and flattens as the number of samples increases. This is due to the fact that the influence of the precomputation time of the TA, which is ${\bar{T}}_{\text{p}}\approx 13.71$ s on average, decreases with growing number of samples. As mentioned in section 6, the speedup is bounded by the quotient $\frac{{T}_{\text{s}}}{{T}_{\text{e}}}$. We insert the average time ${\bar{T}}_{\text{s}}\approx 0.14\enspace \text{s}$ needed for one sample using the SA and the average time ${\bar{T}}_{\text{e}}\approx 0.0037\text{s}$ needed for one sample using the TA and obtain an upper bound of 39.60 for the speedup. For 128 000 samples the speedup is 36.77, which corresponds to a runtime of 5.27 h using the SA, compared to 0.14 h(≈8.59 min) using the TA.

Further, we run both algorithms in the default setting for grid sizes ${h}_{\text{M}}=\frac{1}{3},\frac{1}{6},\frac{1}{9},\frac{1}{12}$. Furthermore, to reduce the overall computation time, we reduce the number of samples to 100. We use our findings from section 6 to extrapolate the measured sampling times to 100 000 samples. To be more precise, we first compute the average time for drawing one sample with both algorithms, then scale this number by 100 000 to achieve estimates on Te for the TA and Ts for the SA and add the measured precomputation time Tp for the TA. Note that the precomputation time is independent of the number of samples. Figure 5 shows the speedup resulting from this extrapolation.

Figure 5. Estimated speedup of the TA compared to the SA for varying grid size hM and 100 000 samples.

As expected, we observe that the speedup in figure 5 grows steadily and is unbounded in contrast to the speedup for fixed grid size and increasing number of samples. For ${h}_{\text{M}}=\frac{1}{12}$ we observe a speedup of 650.81 which corresponds to a runtime of 2.87 h for the TA compared to a runtime of 1864.91 h(≈77.70 d) for the SA.

We expect that the TA outperforms the SA for realistic muscle geometries or fine grid sizes. Furthermore, we conclude that using the TA enables us to solve problems that are infeasible to solve using the SA, in reasonable time.

8. Related work

Surface EMG signals have been used to localize the innervation zones of skeletal muscle, see, e.g., [32] and the references therein. Furthermore, researchers are interested in denoising surface EMG signals, i.e., in reducing crosstalk of neighboring muscles or neighboring muscle regions, see [23]. In [24], regularization methods for inverse problems are used to reduce crosstalk in surface EMG signals.

To overcome the ill-posedness of inverse problems, regularization methods like the Tikhonov regularization are a widely used ansatz, see, e.g., [7] and the references therein. The Tikhonov regularization was used in [8, 33] to reconstruct the electrical conductivity of biological tissue from EMG measurements. Moreover, in [33] model order reduction was used to accelerate the computations.

For other Bayesian inverse problems, different approaches to speed up the sampling process have been examined; e.g., in [18] the authors used a method based on polynomial chaos expansions to construct a surrogate of the forward problem. In [29] quasi-Monte Carlo methods and multilevel Monte Carlo methods were used to accelerate the convergence of the sampling algorithm.

Furthermore, low-rank tensor methods were examined in the context of Bayesian inverse problems. In [5] low-rank tensor formats were used to compute a surrogate of the target distribution. There the authors directly approximated the target distribution using a generalization of the cross approximation. In [6] the authors used low-rank tensor formats to approximate the stochastic Galerkin solution of the parameter-dependent forward problem to achieve a discrete representation of the posterior distribution.

9. Conclusion

Applying mathematical theory results in an efficient algorithm to solve the Bayesian inverse EMG problem. Proving the well-posedness of the Bayesian inverse EMG problem guarantees the convergence of this algorithm. Further, proving a data-sparse representation of the forward EMG problem allows for the efficient precomputation of the parameter-dependent forward solution. The presented numerical experiments support this mathematical theory but also indicate that a high number of samples is required to obtain accurate results. The sampling algorithm which uses the data-sparse representation of the forward EMG problem computes this high number of samples in a reasonable time.

The mathematical theory of the Bayesian inverse problem holds for general symmetric positive definite conductivities and thus for arbitrary muscle fiber directions. The low-rank representation of the forward EMG problem, however, holds for fixed muscle fiber directions only. In our numerical experiments the speedup using tensor methods enables solving problems with grid sizes that are infeasible using classical methods. Therefore, future work is the generalization of the low-rank representation of the right-hand side of the forward EMG problem to arbitrary muscle fiber directions. This generalization could enable the computation of realistic problems in medical applications and lead to a non-invasive and radiation-free imaging method.

Acknowledgments

We thank Maren Klever for her critical reading of and suggestions for this article. We thank the anonymous referees for helping to improve the article by their suggestions. This research was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy—EXC-2075—390740016. L. Grasedyck and T. A. Werthmann have been supported by the DFG within the DFG priority program 1886 (SPPPoly) under Grant No. GR-3179/5-1.
