1 Introduction

It is not often that one can look back and take stock twenty years on, especially when it is by invitation of the very journal that carried this contribution. While retracing memory lane, it has been nice to realize how much the journal itself was a source of inspiration along the road we ultimately took.

To tell how Partial Directed Coherence (PDC) was conceived and some of its natural consequences, we have split the paper into its “pre-history” (Sect. 2) and the questions raised by the techniques then available (Sect. 2.1), followed by PDC’s actual proposition and justification (Sect. 3) and development (Sects. 4 and 6), its key inferential aspects (Sect. 5), and its duality to the directed transfer function (DTF) (Kamiński and Blinowska 1991), which justifies an upgrade to how one must interpret connectivity descriptions (Sect. 7). We also briefly cover some of its variants and its extension to characterizing nonlinear interactions (Sects. 8 and 9). We end with some considerations regarding what may follow (Sect. 10). The exposition here is broad and the reader is asked to examine details in the referenced literature.

2 Prolegomena

It all began when one of us was looking for possible pacemaker candidates for theta rhythms in rat brains. In those days, correlation methods were the ‘de facto’ approach to characterizing one brain structure’s influence over another (Glaser and Ruchkin 1976). The idea is that waveforms measuring brain activity in one structure \(x_1(n)\) would be related to those of another structure \(x_2(n)\) if they looked similar, possibly allowing for some propagation delay that cross-correlation functions can capture. This has a number of shortcomings. Correlation functions are inherently pairwise structure descriptions and often lead to erroneous conclusions if used for network reconstructions, as we examined at some length in Baccalá and Sameshima (2001a).

Furthermore, signal shapes often suffer considerable change under propagation (due to dispersion), which reduces the usefulness of waveform similarity-based characterizations like correlation. Some mitigation can be attained via coherency analysis, the frequency domain counterpart of correlation analysis. In fact, one may show that correlation/coherency descriptions are optimal when one can set the input signal, say \(x_1(n)\), so that the mean-squared error \(\epsilon (n)\) in:

$$\begin{aligned} x_2(n)=h \star x_1(n)+\epsilon (n), \end{aligned}$$
(1)

is minimized for a suitable choice of the shape propagation/distortion filter h(n), where \(\star \) stands for the convolution operator. Use of models like (1) is widespread in engineering.
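As a concrete illustration, the sketch below estimates an FIR version of the filter h(n) in (1) by least squares; the filter taps, noise level and data length are illustrative assumptions, not values from any experiment.

```python
# A minimal sketch of fitting the propagation/distortion filter h(n) of Eq. (1)
# by least squares, assuming a known input x1 and an FIR h of length L.
import numpy as np

rng = np.random.default_rng(0)
n_s, L = 2000, 5
h_true = np.array([0.0, 0.8, 0.4, 0.2, 0.1])   # hypothetical dispersive filter

x1 = rng.standard_normal(n_s)                  # experimenter-set input
x2 = np.convolve(x1, h_true)[:n_s] + 0.1 * rng.standard_normal(n_s)

# Regressor matrix of delayed copies of x1 (one column per filter tap)
X = np.column_stack([np.concatenate([np.zeros(k), x1[:n_s - k]])
                     for k in range(L)])
h_hat, *_ = np.linalg.lstsq(X, x2, rcond=None)
print(np.round(h_hat, 2))                      # approximates h_true
```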

Rather than being defined by experimenters, neural signals of interest are derived from observations, so that applications of Eq. (1) call for ‘a priori’ consideration as to what the input signal is and, lacking that, require a choice based on relative overall signal delay.

We were encouraged by the appearance of an article in Biological Cybernetics  (Schnider et al. 1989) advocating a ‘systems identification’ approach resting on Granger Causality (GC)  (Granger 1969). It treated \(x_1(n)\) and \(x_2(n)\) without the need for predefining which one was the input and had the advantage of allowing feedback detection (Sims 1972).

Briefly, a time series is said to Granger cause another when information from the former can be used to improve prediction of the latter. Because \(x_1(n)\) may G-cause \(x_2(n)\) without \(x_2(n)\) necessarily having to G-cause \(x_1(n)\), one may characterize the direction of information flow. Feedback is present when information travels to and fro.

At this point it is worth bringing in some technical points. In order to make Granger’s proposal operational, one must choose how one is to predict a time series based on another’s past. The only operationally viable approach at the time was to use linear prediction by fitting autoregressive models, where a predictor that only considers its own past

$$\begin{aligned} \hat{\tilde{x}}_1(n)=\sum _{k} \tilde{a}_{11}(k) x_1(n-k), \; k>0 \end{aligned}$$
(2)

is compared to another that includes the other series’s past

$$\begin{aligned} {\hat{x}}_1(n)=\sum _{k} a_{11}(k) x_1(n-k)+\sum _{k} a_{12}(k) x_2(n-k) , \; k>0 \end{aligned}$$
(3)

where the \(\tilde{a}_{ii}(k)\), \(a_{ij}(k)\) coefficients are obtained by minimizing the respective mean-squared prediction errors \(E[|x_1(n)-\hat{\tilde{x}}_1(n)|^2]\) and \(E[|x_1(n)-{\hat{x}}_1(n)|^2]\) which are then compared to see if the addition of the second term in Eq. (3) significantly improves prediction  (Geweke 1982). One may alternatively test for the null hypothesis \(a_{12}(k)=0\) for all k in (3). Similar reasoning applies if \(x_1(n)\) and \(x_2(n)\) are swapped, so that one may efficiently test both GC directions simultaneously through a bivariate autoregressive model for \([x_1(n) \; x_2(n)]^T\):

$$\begin{aligned} \left[ \begin{array}{c} x_1(n)\\ x_2(n) \end{array} \right] =\sum _k \left[ \begin{array}{cc} a_{11}(k) &{} a_{12}(k)\\ a_{21}(k) &{} a_{22}(k) \end{array} \right] \left[ \begin{array}{c} x_1(n-k)\\ x_2(n-k) \end{array} \right] + \left[ \begin{array}{c} w_1(n)\\ w_2(n) \end{array} \right] \end{aligned}$$
(4)

where \(w_i(n)\) stand for innovations processes with covariance matrix

$$\begin{aligned} \varSigma _{\mathbf {w}}=\left[ \begin{array}{cc} \sigma _1^2 &{} \sigma _{12}\\ \sigma _{21} &{} \sigma _2^2 \end{array}\right] \end{aligned}$$
(5)

so that proper testing may proceed by testing for \(a_{ij}(k)=0\) directly.
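In code, comparing (2) and (3) reduces to two least-squares fits and an F-test on the prediction improvement. The sketch below is a minimal illustration on synthetic data in which \(x_1\) G-causes \(x_2\) by construction; the model order and coefficients are assumptions made for the example.

```python
# A minimal sketch of the linear Granger test behind Eqs. (2)-(3): compare the
# restricted predictor (own past only) with the full one via an F-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_s, p = 1000, 2
x1 = rng.standard_normal(n_s)
x2 = np.zeros(n_s)
for n in range(1, n_s):                      # x1 G-causes x2, not vice versa
    x2[n] = 0.5 * x2[n - 1] + 0.7 * x1[n - 1] + rng.standard_normal()

def granger_p_value(target, source, p):
    """p-value of H0: all cross coefficients a_{ts}(k) = 0, k = 1..p."""
    y = target[p:]
    own = np.column_stack([target[p - k:len(target) - k] for k in range(1, p + 1)])
    oth = np.column_stack([source[p - k:len(source) - k] for k in range(1, p + 1)])
    def rss(X):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        return np.sum((y - X @ b) ** 2)
    rss_r, rss_f = rss(own), rss(np.hstack([own, oth]))
    dof = len(y) - 2 * p                     # residual dof of the full model
    F = ((rss_r - rss_f) / p) / (rss_f / dof)
    return 1 - stats.f.cdf(F, p, dof)

print("x1 -> x2:", granger_p_value(x2, x1, p))   # small: H0 rejected
print("x2 -> x1:", granger_p_value(x1, x2, p))   # large: no evidence of GC
```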

Through Schnider et al. (1989) we were also first introduced to the idea of Directed Coherence (DC), aimed at representing GC in the frequency domain, as first discussed in Saito and Harashima (1981)’s hard-to-find paper. This is possible because linear models like (4) allow representing the joint spectral density matrix of the time series as:

$$\begin{aligned} \begin{aligned} {\mathbf {S}}(\nu )={\mathbf {H}}(\nu ){\varvec{\varSigma }}_{\mathbf {w}}{\mathbf {H}^{H}}(\nu ), \end{aligned} \end{aligned}$$
(6)

where \(^H\) is the Hermitian transpose and

$$\begin{aligned} {\mathbf {H}}(\nu )= \left[ \begin{array}{cc} H_{11}(\nu ) &{} H_{12}(\nu )\\ H_{21}(\nu ) &{} H_{22}(\nu ) \end{array} \right] ={\bar{\mathbf {A}}}(\nu )^{-1} \end{aligned}$$
(7)

for

$$\begin{aligned} {\bar{\mathbf {A}}}(\nu )= \left[ \begin{array}{cc} 1-\sum _k a_{11}(k) e^{-{\mathbf {j}}2\pi k \nu } &{} -\sum _k a_{12}(k) e^{-{\mathbf {j}}2\pi k \nu }\\ -\sum _k a_{21}(k) e^{-{\mathbf {j}}2\pi k \nu } &{} 1-\sum _k a_{22}(k) e^{-{\mathbf {j}}2\pi k \nu } \end{array} \right] \end{aligned}$$
(8)

with \({\mathbf {j}}=\sqrt{-1}\) for the normalized frequency \(|\nu |\le 0.5\), so that \(a_{ij}(k)=0\) for all k if and only if \(H_{ij}(\nu )=0\) for all \(\nu \).
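In code terms, (7)–(8) amount to assembling \(\bar{{\mathbf {A}}}(\nu )\) from the fitted coefficients and inverting it; the sketch below assumes the \(a_{ij}(k)\) are stored as a list of lag matrices (written for \(N\times N\) matrices to anticipate the general case).

```python
# A minimal sketch of Eqs. (7)-(8): Abar(nu) from the a_ij(k), and H(nu) by inversion.
import numpy as np

def Abar(A, nu):
    """A: list of N x N coefficient matrices A_1..A_p from a fitted model."""
    N = A[0].shape[0]
    out = np.eye(N, dtype=complex)
    for k, Ak in enumerate(A, start=1):
        out -= Ak * np.exp(-2j * np.pi * k * nu)
    return out

def H(A, nu):
    """Transfer matrix H(nu) = Abar(nu)^{-1}."""
    return np.linalg.inv(Abar(A, nu))
```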

Under this formulation, the coherency between the time series is given by

$$\begin{aligned} \begin{aligned} C_{ij}(\nu )&= \dfrac{S_{ij}(\nu )}{\sqrt{S_{ii}(\nu ) \; S_{jj}(\nu )}} \\ {}&= \dfrac{{\mathbf {h}}_i(\nu )\varSigma _{\mathbf {w}}{\mathbf {h}}^H_j(\nu )}{\sqrt{ ({\mathbf {h}}_i(\nu )\varSigma _{\mathbf {w}}{\mathbf {h}}_i^H(\nu ))({\mathbf {h}}_j(\nu )\varSigma _{\mathbf {w}}{\mathbf {h}}_j^H(\nu ))}} \end{aligned} \end{aligned}$$
(9)

where \({\mathbf {h}}_i(\nu )=[ H_{i1}(\nu ) \; H_{i2}(\nu )]\).

When (9) is fully expanded, the term that carries GC from \(x_j(n)\) to \(x_i(n)\) is represented by:

$$\begin{aligned} DC_{j\rightarrow i} (\nu )=\dfrac{\sigma _iH_{ij}(\nu )}{\sqrt{S_{ii}(\nu )}} \end{aligned}$$
(10)

as it is zero if and only if \(x_j(n)\) does not G-cause \(x_i(n)\) and where \(\sigma _i\) stands for \(w_i(n)\)’s standard deviation. This is essentially Saito & Harashima’s directed coherence.

Though GC/DC allowed pinpointing information flow direction, these descriptions remained essentially restricted to pairwise dynamical observations, which is limiting when trying to describe relationships from a network standpoint for \(N\) simultaneously observed quantities \(x_1(n), \dots , x_N(n)\).

There is no reason why one should be restricted to bivariate models. Indeed, one can generalize (4) to the multivariate time series vector \({\mathbf {x}}(n)=[x_1(n), \dots , x_N(n)]^T\) as:

$$\begin{aligned} {\mathbf {x}}(n)=\sum _k {\mathbf {A}}_k {\mathbf {x}}(n-k)+ {\mathbf {w}}(n) \end{aligned}$$
(11)

so the state of affairs was about to change with Kamiński and Blinowska (1991)’s proposal of DTF, where they used the inverse of (8)’s N-dimensional counterpart to compute the N-dimensional version of (7) whereby connectivity from \(x_j(n)\) to \(x_i(n)\) could be portrayed through:

$$\begin{aligned} DTF_{ij}(\nu )=\dfrac{H_{ij}(\nu )}{\sqrt{\sum _k |H_{ik}(\nu )|^2}}=\dfrac{H_{ij}(\nu )}{\sqrt{{\mathbf {h}}_i(\nu ){\mathbf {h}}_i^H(\nu )}} \end{aligned}$$
(12)

which differs from DC in that it completely dispenses with \({\mathbf {w}}(n)\)’s covariance matrix, even when \(N=2\). Much as in the bivariate case, to compute \(H_{ij}(\nu )\) one had to invert \(\bar{{\mathbf {A}}}(\nu )\), whose entries are given by:

$$\begin{aligned} \bar{A}_{ij}(\nu )=\left\{ \begin{array}{l} 1-\sum _k a_{ij}(k)e^{-{\mathbf {j}}2\pi \nu k},\;\text {if}\;\;i=j \\ \;\; \; -\sum _k a_{ij}(k)e^{-{\mathbf {j}} 2\pi \nu k},\;\text {otherwise} \end{array} \right. \end{aligned}$$
(13)

for \({\mathbf {h}}_i(\nu )=[ H_{i1}(\nu ), \dots , H_{iN}(\nu )]\) in the now more general case.
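As a sketch, once \({\mathbf {H}}(\nu )\) is available, the DTF of (12) is just a row-wise normalization; the routine below rebuilds \(\bar{{\mathbf {A}}}(\nu )\) from assumed VAR lag matrices and returns the whole DTF matrix at one frequency.

```python
# A minimal sketch of the DTF of Eq. (12) at a single normalized frequency nu.
import numpy as np

def dtf(A, nu):
    N = A[0].shape[0]
    Ab = np.eye(N, dtype=complex)
    for k, Ak in enumerate(A, start=1):
        Ab -= Ak * np.exp(-2j * np.pi * k * nu)
    Hm = np.linalg.inv(Ab)
    # row-wise normalization: entry (i, j) is H_ij / sqrt(sum_k |H_ik|^2)
    return Hm / np.sqrt(np.sum(np.abs(Hm) ** 2, axis=1, keepdims=True))
```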

Multivariate time domain GC representation had also progressed. Noteworthy was the work of Geweke (1984), who selected the i and j series of interest and built linear models conditioned on the other series to characterize causality. Another approach was due to Lütkepohl, who put forward a time domain null hypothesis test based on rejecting \(a_{ij}(k)=0\), as these coefficients describe \(x_j(n)\)’s past influence on \(x_i(n)\) (Lütkepohl 1993). At that time, Lütkepohl’s approach seemed better not only because it was computationally simpler but also because it possessed theoretically rigorous asymptotics.

2.1 Goals and paradoxes

Though we examined a slightly modified version of DTF:

$$\begin{aligned} \gamma _{ij}(\nu )=\dfrac{\sigma _j H_{ij}(\nu )}{\sqrt{\sum _k \sigma ^2_k |H_{ik}(\nu )|^2}}, \end{aligned}$$
(14)

which we termed Directed Coherence, as it is closer to Saito & Harashima’s definition through the use of \(w_i(n)\)’s variances (Saito and Harashima 1981), we showed its occasional inconsistencies with Lütkepohl’s approach. The paradox was that, for simulated examples of known structure, the graph theoretical representations obtained using DTF/DC contained more directed arrows than the original models, whereas Lütkepohl’s approach was always able to correctly spot the theoretically imposed connections (Baccalá et al. 1998).

The answer clearly rested on the fact that, when \(N>2\), \(H_{ij}(\nu )\) is not just a function of \(\bar{A}_{ij}(\nu )\) but depends on other entries of \(\bar{{\mathbf {A}}}(\nu )\) as well. What was needed was a reasonable descriptor based on \(\bar{{\mathbf {A}}}(\nu )\) to match Lütkepohl’s approach.

3 The benefits of looking widely

To have this intuition is one thing, to prove where it comes from quite another. Oddly, because of the similarity of the formulas in the \(N=3\) case for partial coherence (Gersch 1972), we found our general answer and method of proof in a book on differential geometry (Burke 1985), wherefrom it became clear that we could write another frequency domain quantity of interest in the \(N>3\) case as well. This quantity, partial coherency, is directly based on the \(a_{ij}(k)\) and equals:

$$\begin{aligned} \kappa _{ij}(\nu )=\dfrac{\bar{{\mathbf {a}}}^H_i(\nu ){\varSigma }_{\mathbf {w}}^{-1}\bar{{\mathbf {a}}}_j(\nu )}{\sqrt{(\bar{{\mathbf {a}}}^H_i(\nu ){\varSigma }_{\mathbf {w}}^{-1}\bar{{\mathbf {a}}}_i(\nu ))(\bar{{\mathbf {a}}}^H_j(\nu ){\varSigma }_{\mathbf {w}}^{-1}\bar{{\mathbf {a}}}_j(\nu ))}}, \end{aligned}$$
(15)

as shown in  Baccalá (2001), where \(\bar{{\mathbf {a}}}_i(\nu )\) stands for the i-th column of \(\bar{{\mathbf {A}}}(\nu )\).

At this point, partial coherency/coherence deserves a brief recap. It amounts to the ordinary coherency between residuals, i.e. the part of a time series that is left after the effect of the remaining series is subtracted:

$$\begin{aligned} \eta _i(n)=x_i(n)-{\mathfrak {P}}_{i}(n), \end{aligned}$$
(16)

where \({\mathfrak {P}}_{i}(n)\) is the optimum linear predictor of \(x_i(n)\) using the other series, so that \(\kappa _{ij}(\nu )\) is just the coherency between the resulting \(\eta _i(n)\) and \(\eta _j(n)\). It is common to see partial coherence computed as a ratio of suitably chosen minors from a normalized spectral density matrix that describes the pairwise coherencies when \(N>3\). Minor determinants, however, are neither needed nor computationally efficient if a multivariate autoregressive model is available, for the model already automatically encodes (16) under its hood. Furthermore, this also avoids numerical error propagation.
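Reflecting this, here is a minimal sketch of (15) computed straight from the model, with no minors involved; the lag matrices and innovations covariance are assumed to come from a fitted multivariate autoregressive model.

```python
# A minimal sketch of the partial coherency of Eq. (15) at one frequency nu.
import numpy as np

def partial_coherency(A, Sigma_w, nu):
    N = A[0].shape[0]
    Ab = np.eye(N, dtype=complex)
    for k, Ak in enumerate(A, start=1):
        Ab -= Ak * np.exp(-2j * np.pi * k * nu)
    # Gram matrix of Abar's columns: G_ij = abar_i^H Sigma_w^{-1} abar_j
    G = Ab.conj().T @ np.linalg.inv(Sigma_w) @ Ab
    d = np.sqrt(np.real(np.diag(G)))
    return G / np.outer(d, d)
```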

Close examination of (15) reveals its formal similarity to the second equality in (9) leading naturally to partial directed coherence’s definition in Baccalá and Sameshima (2001b) as:

$$\begin{aligned} \pi _{ij}(\nu )=\dfrac{\bar{A}_{ij}(\nu )}{\sqrt{\sum _{k=1}^N|\bar{A}_{kj} (\nu )|^2}} =\dfrac{\bar{A}_{ij}(\nu )}{\sqrt{\bar{{\mathbf {a}}}_j^H(\nu )\bar{{\mathbf {a}}}_j(\nu )}} \end{aligned}$$
(17)

obtained by ditching terms involving \(\sigma _i\)’s as had been done for DTF’s introduction, where \(\bar{{\mathbf {a}}}_j(\nu )\) is the j-th column of \(\bar{{\mathbf {A}}}(\nu )\). Note that, when \(N=2\), DTF and PDC are numerically equal when \(i \ne j\).
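For reference, a minimal sketch of (17); as before, the VAR lag matrices are assumed given, and the whole PDC matrix is returned for one frequency.

```python
# A minimal sketch of the PDC of Eq. (17): column-wise normalization of Abar(nu).
import numpy as np

def pdc(A, nu):
    N = A[0].shape[0]
    Ab = np.eye(N, dtype=complex)
    for k, Ak in enumerate(A, start=1):
        Ab -= Ak * np.exp(-2j * np.pi * k * nu)
    # column-wise normalization: entry (i, j) is Abar_ij / sqrt(abar_j^H abar_j)
    return Ab / np.sqrt(np.sum(np.abs(Ab) ** 2, axis=0, keepdims=True))
```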

4 Sunspots in your life

By examining the relationship between solar activity and melanomas (Baccalá et al. 2007) we realized that DTF/PDC failed at exposing the correct causal interaction, in contrast to Lütkepohl’s approach. At the time, we were applying the ‘ad hoc’ criterion that values of \(|\pi _{ij}(\nu )|^2>0.1\) were not to be disregarded. As it turned out, the problem was mainly elsewhere, as we realized that DC (14) actually worked. The main trouble lay in DTF/PDC’s lack of scale invariance. This led to the introduction of generalized partial directed coherence (gPDC):

$$\begin{aligned} _g\pi _{ij}(\nu )=\dfrac{\bar{A}_{ij}(\nu )/\sigma _i}{\sqrt{\sum _{k=1}^N\dfrac{1}{\sigma ^2_k}|\bar{A}_{kj} (\nu )|^2}} \end{aligned}$$
(18)
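A minimal sketch of (18) follows; it differs from the PDC sketch above only by scaling each row of \(\bar{{\mathbf {A}}}(\nu )\) by the corresponding innovations standard deviation before normalizing.

```python
# A minimal sketch of the gPDC of Eq. (18) at one frequency nu.
import numpy as np

def gpdc(A, sigma2, nu):
    """sigma2: length-N vector of innovations variances sigma_i^2."""
    N = A[0].shape[0]
    Ab = np.eye(N, dtype=complex)
    for k, Ak in enumerate(A, start=1):
        Ab -= Ak * np.exp(-2j * np.pi * k * nu)
    W = Ab / np.sqrt(np.asarray(sigma2))[:, None]   # W_ij = Abar_ij / sigma_i
    return W / np.sqrt(np.sum(np.abs(W) ** 2, axis=0, keepdims=True))
```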

A possible quick fix for the original DTF/PDC was to normalize the data by its variance prior to analysis, but this meant larger estimation variability (Baccalá et al. 2007). It is interesting to note that, when Baccalá and Sameshima (2001b) was submitted to Biological Cybernetics, we decided to simplify PDC to (17); a rejected previous version already contained (18) as a possibility. There, it was called PDC with a ‘statistical metric’. This was a simplification of \(\bar{{\mathbf {a}}}^H_j(\nu ){\varSigma }_{\mathbf {w}}^{-1}\bar{{\mathbf {a}}}_j(\nu )\), which looks suspiciously like a metric for the \(\bar{{\mathbf {a}}}_j(\nu )\) vectors. PDC’s (17) metric was referred to as ‘Euclidean’. This nomenclature still survives in our software package.

The use of the full \({\varSigma }_{\mathbf {w}}^{-1}\) weighting term (information metric) led to the information PDC (iPDC) discussed in Sect. 6.

5 Some serious stats

It is perhaps reassuring that, when we replaced the ‘ad hoc’ \(|\pi _{ij}(\nu )|^2>0.1\) criterion to reject the null hypothesis

$$\begin{aligned} H_0: |\pi _{ij}(\nu )|^2=0 \end{aligned}$$
(19)

by a rigorously obtained threshold, the apparently wrong Sunspot-Melanoma causality of the original PDC was mended (Sameshima et al. 2014a).

Historically, we were working on \(H_0\) inference when Schelter et al. (2006) provided a workable solution. This spurred us to further contribute PDC asymptotic confidence intervals in Takahashi et al. (2007).

The generality of the approach in  Takahashi et al. (2007) allowed including gPDC and iPDC within a single mathematical framework  (Baccalá et al. 2013).

To achieve this we noticed that all PDC forms can be written as a ratio:

$$\begin{aligned} |\pi _{ij}(\nu )|^2=\pi (\varvec{\theta },\nu )=\frac{\pi _n(\varvec{\theta },\nu )}{\pi _d(\varvec{\theta },\nu )}, \end{aligned}$$
(20)

where \(\varvec{\theta }\) refers to a vector of all the \(a_{ij}(k)\) and \(\sigma _{ij}\) parameters according to the PDC of choice. Thus

1.

    The ratio (20) is asymptotically normal:

    $$\begin{aligned} \sqrt{n_s} (\left| {\widehat{\pi }}_{ij}(\nu )\right| ^2-\left| \pi _{ij}(\nu )\right| ^2)\rightarrow {\mathcal {N}}(0,\gamma ^2(\nu )), \end{aligned}$$
    (21)

    where \(\gamma ^2(\nu )\) is a frequency dependent variance whose details depend on the PDC brand of interest  (Baccalá et al. 2013) and where \(n_s\) is the number of data points used.

2.

    Under hypothesis (19), (20) is distributed as

    $$\begin{aligned} n_s\; \pi _d(\varvec{\theta },\nu )\; (\left| {\widehat{\pi }}_{ij}(\nu )\right| ^2-\left| \pi _{ij}(\nu )\right| ^2) {\mathop {\rightarrow }\limits ^{d}} \sum _{k=1}^{q} l_k(\nu ) \chi ^2_1, \end{aligned}$$
    (22)

    whose \(l_k(\nu )\) weights depend only on the numerator \(\pi _n(\varvec{\theta },\nu )\), while denominator dependence enters only through the left-hand side of (22).

These results allow computing objective frequency dependent decision threshold values and confidence intervals that formerly were only possible through computer intensive resampling approaches  (Baccalá et al. 2006).
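For illustration, given estimated weights \(l_k(\nu )\) at one frequency, the null threshold implied by (22) can be approximated by plain Monte Carlo sampling of the chi-square mixture; the weight values below are placeholders, and the details of estimating them are in the cited papers.

```python
# A sketch of a decision threshold from Eq. (22): the (1 - alpha) quantile of
# sum_k l_k * chi2_1, approximated by Monte Carlo (weights l_k assumed given).
import numpy as np

def mixture_threshold(l_k, alpha=0.05, n_draws=100_000, seed=0):
    rng = np.random.default_rng(seed)
    draws = rng.chisquare(1, size=(n_draws, len(l_k))) @ np.asarray(l_k)
    return np.quantile(draws, 1 - alpha)

# hypothetical weights at some frequency; thresholds n_s * pi_d * |pdc_hat|^2
print(mixture_threshold([0.7, 0.3]))
```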

In the Sunspot-Melanoma case, as few as \(n_s=37\) data points suffice for adequate inference. In practice, inference performance is not so simple to characterize as it varies with N and also is a function of the spectra of the time series involved.

Much credit for these developments is due to our former student Dr. Daniel Y. Takahashi  (Takahashi 2009). Application of the same approach also allowed us to deduce the allied results  (Baccalá et al. 2016) for DTF/DC and its information extension iDTF  (Takahashi et al. 2010) (see Sect. 6). They have the same general form as those for PDC.

The use of (22) and (21) respectively addresses two distinct problems: (a) Connectivity Detection and (b) Connectivity Characterization.

6 Information interpretation

Now back to the information metric. We were faced with the need to consider whether

$$\begin{aligned} _\iota \pi _{ij}(\nu ) =\dfrac{\bar{A}_{ij}(\nu )/\sigma _i}{\sqrt{\bar{{\mathbf {a}}}^H_j(\nu ){\varSigma }_{\mathbf {w}}^{-1}\bar{{\mathbf {a}}}_j(\nu )}} \end{aligned}$$
(23)

had any sensible meaning at all.

To see it we must recap what (11) means. It represents \({\mathbf {x}}(n)\), its relationships, characteristics and evolution, as drawn from all conceivable possibilities provided by the purely random innovations process \({\mathbf {w}}(n)\), whose white, amorphous nature is shaped by the filter represented by the \({\mathbf {A}}_k\) matrices in (11).

However, as we have seen, without reference to how \({\mathbf {x}}(n)\) may have been constructed, we can always obtain each \(\eta _i(n)\) through (16), leading to an equivalent representation via the process \(\varvec{\eta }(n)=[\eta _1(n),\dots , \eta _N(n)]^T\), which has the same intrinsic information, only that this time it is unveiled differently through the partial coherence (15) that relates the \((x_i(n),x_j(n))\) pair after deducting the effect of the other series. In signal space terms, the \({\mathbf {x}}(n)\) time series representation is reciprocal to \(\varvec{\eta }(n)\) (and vice-versa). They describe the same phenomenon.

With these ideas in mind, also in Biological Cybernetics (Takahashi et al. 2010), we showed that (23) was a form of coherency, only this time linking \(\eta _j(n)\) to \(w_i(n)\). Consequently, under suitable hypotheses, this translates into the mutual information rate between \(\eta _j(n)\) and \(w_i(n)\) via:

$$\begin{aligned} {\text{ MIR }}(w_i,\eta _j) =-\frac{1}{2}\int ^{0.5}_{-0.5}\log (1-|{}_\iota \pi _{ij}(\nu )|^2)\;d\nu . \end{aligned}$$
(24)

The process of obtaining \(\varvec{\eta }(n)\) from \({\mathbf {x}}(n)\) is called partialization. It can be applied to \({\mathbf {w}}(n)\) as well, and leads to its reciprocal (partialized) \(\varvec{\zeta }(n)\) process so that

$$\begin{aligned} _\iota \gamma _{ij}(\nu )=\dfrac{\sigma _j H_{ij}(\nu )}{\sqrt{{\mathbf {h}}_i(\nu )\varSigma _{\mathbf {w}}{\mathbf {h}}_i^H(\nu )}}, \end{aligned}$$
(25)

that we named information DTF (iDTF) is the coherency between \(\zeta _j(n)\) and \(x_i(n)\) which translates naturally to a measure of their mutual information rate:

$$\begin{aligned} {\text{ MIR }}(x_i,\zeta _j) =-\frac{1}{2}\int ^{0.5}_{-0.5}\log (1-|{}_\iota \gamma _{ij}(\nu )|^2)\;d\nu . \end{aligned}$$
(26)

Here the trick is that iDTF/iPDC describe relationships between partialized and original processes with different indices. The characterization of strict information flow directionality is possible because swapping i for j involves different base quantities, i.e. we go from relating \(\eta _j\) to \(w_i\) to relating \(\eta _i\) to \(w_j\) in the iPDC case (and x and \(\zeta \) for iDTF).
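Numerically, (24) and (26) are the same computation; here is a sketch, assuming the squared iPDC (or iDTF) magnitude has already been evaluated on a grid of normalized frequencies.

```python
# A minimal sketch of the mutual information rate of Eqs. (24)/(26) by
# numerical (trapezoidal) integration over the normalized frequency axis.
import numpy as np

def mir(pi2, nus):
    """pi2: |iPDC|^2 (or |iDTF|^2) sampled at the frequencies in nus."""
    integrand = -0.5 * np.log(1.0 - np.asarray(pi2))
    return np.trapz(integrand, np.asarray(nus))

# usage sketch: mir(values_on_grid, np.linspace(-0.5, 0.5, 512))
```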

These developments were an integral part of Dr. Takahashi’s Doctoral Thesis  (Takahashi 2009). A simplified exposition of these ideas can be found in  (Takahashi et al. 2014).

7 Duality and the need for new concepts

At this point a couple of numerical illustrations may be worthwhile.

Example 1

Consider data at time n to be generated by

$$\begin{aligned} \left\{ \begin{aligned} x_1(n) =&-0.95\;x_2(n-1)+w_1(n)\\ x_2(n) =&\ \ 0.95\;x_1(n-1)+x_3(n-1)+w_2(n)\\ x_3(n) =&-0.9025\;x_3(n-2)+w_3(n) \end{aligned} \right. \end{aligned}$$
(27)

where \(w_i(n)\) are zero mean independent Gaussian innovations of unit variance. iPDC and iDTF computations are shown respectively in Fig. 1a and b, displayed in the ‘standard’ position from ‘Source’ (column) to ‘Target’ (row). A graphical summary is contained in Fig. 2, which contrasts the theoretical structure of (27) with that derived from Fig. 1, whereby it is clear that iPDC mirrors the theoretical structure while iDTF has an additional arrow whose very existence was the drive for looking for PDC in the first place. This extra arrow’s role is to state that there is a signal pathway for \(x_3(n)\) to reach \(x_1(n)\) even though they are not immediately attached: \(x_3(n)\)’s action onto \(x_1(n)\) takes place as its signal must first travel through \(x_2(n)\).
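For readers wishing to reproduce this example, a minimal simulation sketch of (27) follows; feeding the generated data to a VAR estimator and then to PDC/DTF routines like those sketched earlier yields pictures analogous to Fig. 1 (the seed and burn-in length are arbitrary choices).

```python
# A minimal sketch simulating the toy model of Eq. (27).
import numpy as np

rng = np.random.default_rng(2)
n_s, burn = 2000, 200
x = np.zeros((3, n_s + burn))
w = rng.standard_normal((3, n_s + burn))
for n in range(2, n_s + burn):
    x[0, n] = -0.95 * x[1, n - 1] + w[0, n]
    x[1, n] = 0.95 * x[0, n - 1] + x[2, n - 1] + w[1, n]
    x[2, n] = -0.9025 * x[2, n - 2] + w[2, n]
x = x[:, burn:]           # drop the transient; x is ready for VAR fitting
```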

Fig. 1

iPDC (a) and iDTF (b) for (27) in the ‘standard’ position, where the array plot’s main diagonal contains the spectral density estimates of \(x_i(n)\). The graphs also display the \(\alpha =5\%\) significance threshold and the allied confidence intervals for significant values. \(n_s=2000\) data points were used in the computations

Example 2

Next consider the low pass source signal \(x_1(n)\) as directly attached to \(x_3(n)\) through a bandpass filter represented by u(n). There is another pathway from \(x_1(n)\) towards \(x_3(n)\) that goes through the intermediate step of \(x_2(n)\), whose dynamics is specially crafted to mirror the delay pattern imposed by the u(n) filter. This scenario is simulated via:

$$\begin{aligned} \left\{ \begin{aligned} x_1(n) =&\ 0.75\;x_1(n-1)+w_1(n)\\ x_2(n) =&-0.64\;x_2(n-2)+x_1(n-1)+w_2(n)\\ x_3(n) =&-0.5\;x_3(n-1)+x_2(n-1)+a u(n-1) \\&+w_3(n)\\ u(n) =&-0.64\;u(n-2)+x_1(n-1) \end{aligned} \right. \end{aligned}$$
(28)

with \(w_i(n)\) as in Example 1. When \(a=-1\), even though \(x_1(n)\) is ‘physically’ attached to \(x_3(n)\), it has no ultimate ‘influence’ on \(x_3(n)\). Is this too artificial? This actually happens in the real world, e.g. in multipath cell phone transmission propagation that induces communication loss (Stallings 2005).
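The cancellation can also be checked analytically: both pathways from \(x_1(n)\) to \(x_3(n)\) share the transfer function \(z^{-1}/(1+0.64z^{-2})\), so their summed contribution to \(x_3(n)\)'s equation vanishes identically when \(a=-1\). A small numerical sketch of this bookkeeping:

```python
# A sketch verifying Example 2's cancellation: with a = -1 the two pathways
# from x1 to x3 in (28) carry identical transfer functions of opposite sign.
import numpy as np

def net_transfer(a, nu):
    z1 = np.exp(-2j * np.pi * nu)             # z^{-1} on the unit circle
    G = z1 / (1 + 0.64 * z1 ** 2)             # common x1 -> {x2, u} dynamics
    return (1 + a) * G * z1 / (1 + 0.5 * z1)  # both paths summed into x3

nus = np.linspace(-0.5, 0.5, 7)
print(np.round(np.abs(net_transfer(-1.0, nus)), 12))  # identically zero
print(np.round(np.abs(net_transfer(+1.0, nus)), 3))   # non-zero otherwise
```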

As before, the structure of (28) (Fig. 3a) is confirmed by the iPDC results (Fig. 3b), summarized by the diagram in Fig. 3c, whereas \(x_1(n)\)’s influence cancellation over \(x_3(n)\) is evident in Fig. 3d and its allied diagram (Fig. 3e).

These examples show that, when \(N>2\), PDC and DTF highlight distinct complementary aspects of dynamic connectivity. These differences led us to introduce the concepts of Granger Connectivity and Granger Influentiability to refer respectively to what PDC and DTF portray (Baccalá and Sameshima 2014).

In conclusion, PDC centers on describing the direct attaching connections, whereas DTF centers on the overall effect. Both descriptions are informative and valuable in providing a full picture of ‘connectivity’. Alone, neither is complete. We often refer to PDC and DTF as dual representations.

Fig. 2

Comparison between the theoretical structure behind the signal generating model (27) (a) and the schematic representations derived from the computed iPDC (b) and iDTF (c) values. The extra iDTF arrow is readily apparent

8 Theme and variations

One could say that PDC’s main theme is represented by Equation (13). All else, the original PDC itself, gPDC and iPDC included, are mainly variations, even though iPDC’s interpretation is deeper. This claim as to variations is certainly true in at least two ways: (a) alternative methods to characterize what we called Granger Connectivity, which include those of Schelter et al. (2009) and Plomp et al. (2014), obtained by weighting \(\bar{A}_{ij}(\nu )\) differently, and (b) its use as a weighting factor in other estimators (Korzeniewska et al. 2003).

Ideally each such alternative deserves consideration. However, to avoid too shallow a comparison and thereby risk unfairness, we think it is best to say we are happy that the introduction of PDC has led to many characterization alternatives whose common property is that their nullity is synonymous with lack of Granger Connectivity as defined above. Methods, however, differ in both computational and statistical performance.

The core reason for this is that \(\bar{A}_{ij}(\nu )\) is the transfer function linking \(\eta _j(n)\), the j-th partialized process, to \(w_i(n)\), the i-th innovations process accounting for the underlying observed stochastic variability. This result is contained in Takahashi et al. (2010)’s Appendix.

A similar role is played by \(H_{ij}(\nu )\) in producing \(\bar{A}_{ij}(\nu )\)’s dual to characterize what we called Granger Influentiability, whose many weighting variations lead to different interpretative details (Korzeniewska et al. 2003).

Here one might also ask about PDC’s place among the host of other connectivity estimation methods. If we exclude correlation-based ones and consider only methods that expose connection directionality, it is related to Geweke’s approach (Geweke 1984), which has lately been refined by Barrett and Seth (2010). They represent different ways of subtracting the effect of other observed time series upon the relationship between i and j. PDC’s procedure is, however, very natural and computationally efficient due to its link to partial coherence. Whereas both methods are applicable to rejecting Granger Connectivity, our numerical experiments under controlled conditions suggest a good theoretical fit for PDC asymptotics in detecting connectivity when contrasted to other approaches (Sameshima et al. 2014b).

‘Saameeshiima! What is the (connection) force?’ was a question by Prof. György Böhm, a former boss. He was not alone in asking. We called it the Connection ‘Characterization’ problem. The definitive answer is not so easy, because there are many connection characterization methods, linear and nonlinear (Sameshima and Baccalá 2014), that differ in how they account for the effect of other series, each in their own specific way.

PDC, through its frequency domain description of connection details, qualifies as a contender to represent the ‘force’ of Granger Connectivity. Its frequency domain character highlights spectral bands where signal transmission is absent even when overall Granger Connectivity may be present due to other frequencies. A good example of this is discussed in Example 7.3 from Sameshima et al. (2014a), whose notch frequency character comes out clearly. This stresses PDC’s advantage over Lütkepohl’s approach, which had been proposed as a time domain test for Granger Causality, thereby addressing only the ‘Detection’ problem without any frequency domain qualification.

In comparison to other approaches, what one can safely say is that different methods behave much like different units to gauge a given physical quantity. If you measure temperature and start with Celsius, you’d better stick to it and not use Fahrenheit, even though paying homage to your wife for her warmth may sound good in the name of domestic accord. The Cartesian precision behind PDC and DTF and their rigorous asymptotic variance results (Baccalá et al. 2013, 2016) allows easy and reliable comparison of significant changes under different experimental conditions.

Fig. 3

Example 2’s theoretical diagram (a) with its allied iPDC results (b) and resulting link diagram (c). iDTF results are shown in (d) with the allied link diagram in (e). Gray shades represent the \(95\%\) confidence intervals associated with significant values for \(\alpha =5\%\), i.e. when estimated values are above the frequency dependent decision thresholds for \(n_s=400\) observed data points

9 What about nonlinearity?

Linear modelling through (11) lies at the core of PDC’s and DTF’s proposals. It is this very linearity that ensures their frequency domain interpretation. Many authors have considered methods to grapple with nonlinear systems. Here the reader should be forewarned, though: many papers containing proposals for dealing with nonlinearity suffer from the problem of providing simulated examples composed of nonlinear subsystems that interact linearly, whose connectivity is easily and correctly represented by PDC. Even bilinear interactions can be detected by PDC, though again with no assurance of preset \(\alpha \) confidence levels.

One type of interaction that linear methods are entirely unable to capture is that involving even interaction functions: an \(x^2_1(n-1)\) term affecting \(x_2(n)\)’s equation is the simplest such example (Massaroppe et al. 2011). But even in this case, through Reproducing Kernel transformations, one can easily deal with the connectivity detection problem. Moreover, thankfully, because of the analogous mathematics, one may even use the very same estimation framework to define appropriate Kernel GC, PDCs and DTFs, for which the same general asymptotic forms hold (Massaroppe and Baccalá 2015, 2019). This is attractive because the required number \(n_s\) of data points for inference is much smaller than that of other competing nonlinear methods. In fact, Granger Connectivity Detection becomes feasible even without having to solve the challenging pre-image problem (Massaroppe and Baccalá 2019) required for dynamic prediction.
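As a toy illustration of this point (not the Reproducing Kernel machinery itself, whose details are in the cited papers), the sketch below shows a squared interaction that a purely linear regressor set misses but a nonlinearly augmented one exposes; all coefficients are assumptions made for the example.

```python
# A toy example: an even interaction x1^2(n-1) -> x2(n) is invisible to a
# linear regression on past values but is exposed by an augmented regressor.
import numpy as np

rng = np.random.default_rng(3)
n_s = 5000
x1 = rng.standard_normal(n_s)
x2 = np.zeros(n_s)
for n in range(1, n_s):
    x2[n] = 0.5 * x2[n - 1] + 0.8 * x1[n - 1] ** 2 + 0.5 * rng.standard_normal()

y = x2[1:]
lin = np.column_stack([x2[:-1], x1[:-1]])               # linear past only
aug = np.column_stack([x2[:-1], x1[:-1], x1[:-1] ** 2]) # add the even term

def rss(X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ b) ** 2)

print("linear RSS   :", round(rss(lin), 1))  # x1's linear past barely helps
print("augmented RSS:", round(rss(aug), 1))  # the squared term explains x2
```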

10 What now? A sum up and a wishlist

Any set of \(x_i(n)\) signals is amenable to PDC/DTF description. This makes it a universal decomposition of the set’s intrinsic relationships. Technically, the spectral factorization behind (11) may be more or less difficult to carry out, but there are options (Wilson 1972). Further systematic investigation is under way.

In fact, the lack of proficiency of practical researchers with multivariate time series modelling constitutes a challenge to the translation of these methods to clinical practice, something lying at the top of our wishlist. We would also like to see PDC used beyond the neuroscience that inspired it, and this is slowly beginning (Raphaldini et al. 2021).

We hope this retrospective helps overcome the educational roadblock to PDC/DTF employment by retelling their history and our motivation. Naturally, further education about the ever evolving world of time series techniques is desirable. Things like employing state space estimation methods and time varying approaches are examples of interesting practical developments (Astolfi et al. 2008; Sommerlade et al. 2014). Further analysis of all the latter approaches would take us too far afield here.

For now we are happy that, at only twenty years of age, PDC has attracted more attention than one could wish. Surely people still feel more comfortable with correlation-based ideas, which have been around for at least 100 years now, and are only becoming enthused with Granger causality (Granger 1969), at roughly half that age, though GC still induces heated debate given the conceptually loaded nature of the word ‘causality’, which (sells but) evokes a number of different and sometimes conflicting ideas (Baccalá and Sameshima 2014; Pearl 2000; Spirtes et al. 2001; Valdés-Sosa et al. 2011). No wonder that here, too, there is still a long way to go.

Another set of challenges comes from the care needed to preserve the information of interest in the signals. Things like how power line interference should be removed  (Baccalá and Sameshima 2015) or the choice of sampling frequency can impact estimate reliability and inference via models like (11). The same is true of artifact removal algorithms. Currently, there is no satisfactory inventory of those challenges.

From our initial goal of looking into a data driven tool to help reverse engineer brain dynamical relationships, we realized that reliable inferential results were central, as all too often researchers judge methods based on confirmatory bias, i.e. their preconceptions as to how the brain works. This is what moved us to stress statistically robust gauging methods, which are at their most interesting when they provide evidence that challenges preconceived ideas, and that is what we hope these developments can furnish.

Ideally, brain dynamical reverse engineering requires the availability of neural signals in their purest form. Scalp EEG, because it reflects mixtures (at the sensor space) of deeper brain signal sources (the source space), can be misleading (the so-called volume conduction problem), calling for methods to address it. Whereas PDC/DTF decompositions are still valid at the sensor space, one can expect difficulties recovering source space connectivity. Field inversion methods have been successfully applied in addressing this via PDC applied to source space reconstructed data (Amaral et al. 2017). A modified PDC version has also been proposed and evaluated (Omidvarnia et al. 2014).

Another clear challenge stems from the need for dimension reduction. Adequate dynamical characterization requires that as many representative signals as possible be included, yet there are practical limits to how many simultaneous time series models like (11) can handle for a given number \(n_s\) of time observations. To cope with this, sparse time series modelling (Valdés-Sosa et al. 2005) and spatial wavelet preprocessing (in fMRI) have been used (Vieira et al. 2014, 2015).

In conclusion, there is still a lot to be done. Long live PDC, its kin and offspring!