1 Introduction

Models belonging to the quantum exponential family were studied intensively in Statistical Physics long before Amari [1, 2] introduced dually flat geometries into the theory of statistical models. The generalization of Amari’s work to quantum models was taken up by Hasegawa and others [3,4,5,6,7,8]. See for instance Chapter 7 of [2] and the book of Petz [9].

In a series of papers [10,11,12,13,14,15], Pistone and coworkers developed a parameter-free approach to Information Geometry. Similar efforts are found in the work of Newton [16, 17].

In the book of Ay et al. [18], Chapter 3.3, two distinct approaches are mentioned. In Pistone’s approach, the manifold of probability measures compatible with a given measure receives the structure of a Banach manifold. Alternatively, a manifold of probability measures can receive its geometry from its embedding in a linear space of signed measures.

Early efforts to generalize Pistone’s approach to the quantum context include the works of Grasselli and Streater [19,20,21,22] and of Jenčová [23]. Recently, a different line of research was started by Ciaglia et al. [24, 25]. They study the action of the group of invertible operators on the manifold of density operators.

Technical problems appear when considering continuous measure spaces. Such problems are avoided here by restriction to the finite-dimensional case.

The following guidelines are adopted from the literature.

  • The manifold has a maximal extent; models belonging to an exponential family describe submanifolds;

  • The manifolds are Banach manifolds; charts take values in a Banach space;

  • Each point of the manifold is the centre of a chart;

  • The geodesics are exponential arcs;

  • The metric can be obtained from a divergence function by differentiation;

  • Parallel transport is used to derive the geometric connection.

In Quantum Information Theory [9], Bures’ distance [26, 27] is extensively used. It is the quantum analogue of the Hellinger distance and has quite unique properties. Nevertheless, the inner product needed in the present context is that of Bogoliubov [19, 28,29,30]. For this inner product, the e- (exponential) and m- (mixture) connections become each other's duals [19]. A proof follows in Sect. 5.

Let me finally point out that the generalization of Information Geometry to the non-commutative context is characterized by non-uniqueness. Section 10.3 of [9] discusses a class of metrics that all generalize the Fisher information metric to the quantum context. In addition, the notion of exponential arcs, which is the topic of the present work, is non-unique. An alternative definition given in [31] introduces exponential arcs of faithful states on a \(\sigma \)-finite von Neumann algebra.

The next section gives the definition of exponential arcs of density matrices. Vectors tangent to these arcs are discussed in Sect. 3. The exponential map is shown to be well-defined, one-to-one and onto. A chart affine for the e-connection is discussed. In Sects. 4 and 5, Bogoliubov’s inner product is introduced. Parallel transport is used to derive the covariant derivative for the m-connection and for its dual, the e-connection. Sections 6 and 7 introduce convex potentials and dual charts. The link with parameterized approaches is made in Sect. 8. At the end follows a section with summary and discussion.

2 Exponential arcs

A first step in the construction of a geometry on the manifold \({\mathbb M}\) of non-degenerate density matrices of dimension n-by-n is the choice of the geodesics that will be used to connect pairs of points in the manifold. A non-degenerate density matrix is a positive-definite matrix with complex entries and unit trace. Its eigenvalues can be interpreted as probabilities summing up to 1.

The following definition generalizes the concept introduced by Cena and Pistone [13, 14] to the non-commutative context.

Definition 2.1

An exponential arc connecting the density matrix \(\sigma \) to the density matrix \(\rho \) is a map \(t\mapsto \sigma _t\) with \(\sigma _t\) given by

$$\begin{aligned} \sigma _t=\exp ((1-t)\log \rho +t\log \sigma -\alpha (t)) \end{aligned}$$

with \(\alpha (t)\) given by

$$\begin{aligned} \alpha (t)= & {} \log \,\mathrm{Tr}\,\exp ((1-t)\log \rho +t\log \sigma ). \end{aligned}$$

Note that, given \(\rho \) and \(\sigma \) in \({\mathbb M}\), \(\sigma _t\) with \(t\in [0,1]\) is a non-degenerate density matrix belonging to the manifold \({\mathbb M}\). It satisfies \(\sigma _0=\rho \) and \(\sigma _1=\sigma \). The normalization function \(\alpha \) satisfies \(\alpha (0)=\alpha (1)=0\).
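Definition 2.1 can be tried out numerically. The following is a minimal sketch under stated assumptions, not part of the original construction: it assumes NumPy, evaluates matrix logarithms and exponentials through the eigendecomposition of Hermitian matrices, and uses the hypothetical helper names `herm_fun`, `random_density` and `exp_arc`. Subtracting \(\alpha (t)\) in the exponent is implemented as division by the trace.

```python
import numpy as np

def herm_fun(h, f):
    """Apply the scalar function f to the Hermitian matrix h."""
    w, v = np.linalg.eigh(h)
    return (v * f(w)) @ v.conj().T

def random_density(n, rng):
    """Random positive-definite matrix with unit trace (illustrative helper)."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    m = g @ g.conj().T + n * np.eye(n)  # the shift keeps the spectrum away from zero
    return m / np.trace(m).real

def exp_arc(rho, sigma, t):
    """Point sigma_t of the exponential arc of Definition 2.1."""
    h = (1 - t) * herm_fun(rho, np.log) + t * herm_fun(sigma, np.log)
    e = herm_fun(h, np.exp)
    # exp(h - alpha(t)) with alpha(t) = log Tr exp(h) equals exp(h) / Tr exp(h).
    return e / np.trace(e).real

rng = np.random.default_rng(0)
rho, sigma = random_density(3, rng), random_density(3, rng)
for t in (0.0, 0.25, 0.5, 1.0):
    s_t = exp_arc(rho, sigma, t)
    # trace and smallest eigenvalue of sigma_t along the arc
    print(t, np.trace(s_t).real, np.linalg.eigvalsh(s_t).min())
```

Each point printed by the loop should have unit trace and a strictly positive smallest eigenvalue, in line with the remark that \(\sigma _t\) stays in \({\mathbb M}\).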

For further use, introduce the following notation.

Notation 2.2

For any pair of density matrices \(\rho \) and \(\sigma \) in \({\mathbb M}\), the tangent vector \(Y_\rho (\sigma )\) is given by

$$\begin{aligned} Y_\rho (\sigma )= & {} \frac{\mathrm{\,d}\,}{\mathrm{\,d}t}\bigg |_{t=0}\sigma _t, \end{aligned}$$

where \(t\mapsto \sigma _t\) is the exponential arc connecting \(\sigma \) to \(\rho \).

Use the identity (see [2] p. 156 for a proof)

$$\begin{aligned} \frac{\mathrm{\,d}\,}{\mathrm{\,d}t}\bigg |_{t=0}e^{H+tA} = \int _0^1\mathrm{\,d}u\,e^{uH}Ae^{(1-u)H} = \int _0^1\mathrm{\,d}u\,e^{(1-u)H}Ae^{uH} \end{aligned}$$
(1)

to calculate

$$\begin{aligned} Y_\rho (\sigma )= & {} \int _0^1\mathrm{\,d}u\,\rho ^u\left[ \log \sigma -\log \rho -\frac{\mathrm{\,d}\,}{\mathrm{\,d}t}\alpha (t)\bigg |_{t=0}\right] \rho ^{1-u}. \end{aligned}$$

Note that

$$\begin{aligned} \frac{\mathrm{\,d}\,}{\mathrm{\,d}t}\alpha (t)\bigg |_{t=0}= & {} \,\mathrm{Tr}\,\int _0^1\mathrm{\,d}u\,\rho ^u[\log \sigma -\log \rho ]\rho ^{1-u}\\= & {} -D(\rho ||\sigma ) \end{aligned}$$

with \(D(\rho ||\sigma )\) Umegaki’s relative entropy [32]

$$\begin{aligned} D(\rho ||\sigma )= & {} \,\mathrm{Tr}\,\rho (\log \rho -\log \sigma ). \end{aligned}$$
(2)

Notation 2.3

Each matrix A defines a matrix denoted \([A]^{{\tiny K}}_\rho \) by the relation

$$\begin{aligned}{}[A]^{{\tiny K}}_\rho= & {} \int _0^1\mathrm{\,d}u\,\rho ^u \,A\,\rho ^{1-u}, \qquad \rho \in {\mathbb M}. \end{aligned}$$

The map \(A\mapsto [A]^{{\tiny K}}_\rho \) is the Kubo transform [9].

Notation 2.4

Given a pair of density matrices \(\rho \) and \(\sigma \) let

$$\begin{aligned} c_\rho (\sigma )= & {} \log \sigma -\log \rho +D(\rho ||\sigma ). \end{aligned}$$
(3)

Note that \(\,\mathrm{Tr}\,\rho \, c_\rho (\sigma )=0\). With these notations, one can write the tangent vector as

$$\begin{aligned} Y_\rho (\sigma )= & {} \left[ c_\rho (\sigma )\right] ^{{\tiny K}}_\rho . \end{aligned}$$
(4)
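Expression (4) lends itself to a numerical cross-check against a finite-difference derivative of the arc. The sketch below is only an illustration under assumptions: NumPy is assumed, all function names (`kubo`, `chart`, `rel_entropy`, `exp_arc`, and so on) are hypothetical, and the Kubo transform is evaluated in the eigenbasis of \(\rho \), where the integral over u becomes the logarithmic mean of pairs of eigenvalues.

```python
import numpy as np

def herm_fun(h, f):
    """Apply the scalar function f to the Hermitian matrix h."""
    w, v = np.linalg.eigh(h)
    return (v * f(w)) @ v.conj().T

def random_density(n, rng):
    """Random positive-definite matrix with unit trace (illustrative helper)."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    m = g @ g.conj().T + n * np.eye(n)
    return m / np.trace(m).real

def exp_arc(rho, sigma, t):
    """Point sigma_t of the exponential arc of Definition 2.1."""
    e = herm_fun((1 - t) * herm_fun(rho, np.log) + t * herm_fun(sigma, np.log), np.exp)
    return e / np.trace(e).real

def rel_entropy(rho, sigma):
    """Umegaki relative entropy D(rho||sigma), eq. (2)."""
    return np.trace(rho @ (herm_fun(rho, np.log) - herm_fun(sigma, np.log))).real

def chart(rho, sigma):
    """c_rho(sigma) of eq. (3); scalars are read as multiples of the identity."""
    d = herm_fun(sigma, np.log) - herm_fun(rho, np.log)
    return d + rel_entropy(rho, sigma) * np.eye(rho.shape[0])

def kubo(a, rho):
    """Kubo transform [A]^K_rho, computed in the eigenbasis of rho."""
    lam, v = np.linalg.eigh(rho)
    li, lj = lam[:, None], lam[None, :]
    close = np.isclose(li, lj, rtol=1e-8, atol=0.0)
    # The integral of li^u lj^(1-u) over [0, 1] is the logarithmic mean of li, lj;
    # for (nearly) equal eigenvalues the arithmetic mean is used instead.
    dlog = np.where(close, 1.0, np.log(li) - np.log(lj))
    weight = np.where(close, 0.5 * (li + lj), (li - lj) / dlog)
    return v @ (weight * (v.conj().T @ a @ v)) @ v.conj().T

rng = np.random.default_rng(1)
rho, sigma = random_density(3, rng), random_density(3, rng)
tangent = kubo(chart(rho, sigma), rho)  # right-hand side of eq. (4)
h = 1e-5
finite_diff = (exp_arc(rho, sigma, h) - exp_arc(rho, sigma, -h)) / (2 * h)
print(np.abs(tangent - finite_diff).max())  # size of the discrepancy
```

The printed discrepancy should be at the level of the finite-difference error. The arc is evaluated at a small negative parameter as well, which is harmless because the defining formula makes sense for all real t.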

3 The tangent plane

Fix a non-degenerate density matrix \(\rho \) in \({\mathbb M}\). The tangent plane \(T_\rho {\mathbb M}\) at the point \(\rho \) in the manifold \({\mathbb M}\) is the space of derivatives at the origin \(t=0\) of exponential arcs \(t\mapsto \sigma _t\) connecting any density matrix \(\sigma \) in \({\mathbb M}\) to the density matrix \(\rho \). Let us characterize this space.

Proposition 3.1

For any n-by-n matrix V, there exists an n-by-n matrix A such that \(V=[A]^{{\tiny K}}_\rho \). If V is Hermitian then A is Hermitian as well. If in addition, the trace of V vanishes then the expectation \(\,\mathrm{Tr}\,\rho A\) vanishes as well.

Proof

Consider an orthonormal basis \((e_i)_i\) in which \(\rho \) is diagonal. One has \(\rho e_i=\lambda _i e_i\) with \(\lambda _i>0\). The matrix A with matrix elements given by

$$\begin{aligned} \langle e_j|Ae_i\rangle= & {} \frac{\langle e_j|Ve_i\rangle }{\int _0^1\mathrm{\,d}u\,\lambda _i^u\lambda _j^{1-u}} \end{aligned}$$

satisfies the requirements.

If V is Hermitian, then one has

$$\begin{aligned} \langle e_j|Ae_i\rangle \,\int _0^1\mathrm{\,d}u\,\lambda _i^u\lambda _j^{1-u}\,= & {} \langle e_j|Ve_i\rangle =\overline{\langle e_i|Ve_j\rangle }\\= & {} \overline{\langle e_i|Ae_j\rangle }\int _0^1\mathrm{\,d}u\,\lambda _i^u\lambda _j^{1-u}. \end{aligned}$$

This shows that also A is Hermitian.

If in addition \(\,\mathrm{Tr}\,V=0\), then one has

$$\begin{aligned} \,\mathrm{Tr}\,\rho A= & {} \sum _i\lambda _i\langle e_i|Ae_i\rangle \\= & {} \sum _i\lambda _i\frac{\langle e_i|Ve_i\rangle }{\lambda _i}\\= & {} \,\mathrm{Tr}\,V\\= & {} 0. \end{aligned}$$
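For reference, the integral in the denominator of the matrix elements of A can be evaluated in closed form. This is a standard computation added here for convenience; it is not part of the original argument. For \(\lambda _i\not =\lambda _j\), one finds the logarithmic mean of the two eigenvalues,

$$\begin{aligned} \int _0^1\mathrm{\,d}u\,\lambda _i^u\lambda _j^{1-u}= & {} \lambda _j\int _0^1\mathrm{\,d}u\,\left( \frac{\lambda _i}{\lambda _j}\right) ^u\\= & {} \frac{\lambda _i-\lambda _j}{\log \lambda _i-\log \lambda _j}, \end{aligned}$$

while for \(\lambda _i=\lambda _j\) the integrand is constant and the integral equals \(\lambda _i\). The latter value is the one used for the diagonal elements in the last calculation. In both cases, the integral is strictly positive, so that the matrix elements of A are well-defined.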

For convenience, the following notations are introduced.

Notation 3.2

The linear space of Hermitian matrices A with vanishing trace \(\,\mathrm{Tr}\,A=0\) is denoted \(\mathcal{A}_{{\tiny sa}}^0\). The linear space of Hermitian matrices A with vanishing expectation \(\,\mathrm{Tr}\,\rho A=0\) given \(\rho \) in \({\mathbb M}\) is denoted \(\mathcal{A}_\rho \).

Proposition 3.3

The tangent space \(T_\rho {\mathbb M}\) consists of all Hermitian n-by-n matrices with vanishing trace: \(T_\rho {\mathbb M}=\mathcal{A}_{{\tiny sa}}^0\).

Proof

Let V be any matrix in \(\mathcal{A}_{{\tiny sa}}^0\). By the previous proposition, there exists a Hermitian n-by-n matrix A such that \(V=[A]^{{\tiny K}}_\rho \) holds. Let \(\sigma \) be defined by

$$\begin{aligned} \sigma= & {} \frac{\exp (\log \rho +A)}{\,\mathrm{Tr}\,\exp (\log \rho +A)}. \end{aligned}$$

Then \(\sigma \) is a density matrix. Let \(t\mapsto \sigma _t\) denote the exponential arc connecting \(\sigma \) to \(\rho \). The tangent vector at \(t=0\) is given by

$$\begin{aligned} Y_\rho (\sigma )= & {} \int _0^1\mathrm{\,d}u\,\rho ^u\left[ \log \sigma -\log \rho -\frac{\mathrm{\,d}\,}{\mathrm{\,d}t}\alpha (t)\bigg |_{t=0}\right] \rho ^{1-u}\\= & {} \int _0^1\mathrm{\,d}u\,\rho ^u\left[ A-\log \,\mathrm{Tr}\,\exp (\log \rho +A) -\frac{\mathrm{\,d}\,}{\mathrm{\,d}t}\alpha (t)\bigg |_{t=0}\right] \rho ^{1-u}\\= & {} \int _0^1\mathrm{\,d}u\,\rho ^u\,A\,\rho ^{1-u}\\= & {} [A]_\rho ^{{\tiny K}}\\= & {} V. \end{aligned}$$

In the above calculation, it is used that

$$\begin{aligned} D(\rho ||\sigma )= & {} -\,\mathrm{Tr}\,\rho A+\log \,\mathrm{Tr}\,\exp (\log \rho +A) \end{aligned}$$
(5)

and \(\,\mathrm{Tr}\,\rho A=\,\mathrm{Tr}\,V\). The latter vanishes by assumption.

Proposition 3.4

If two exponential arcs \(\sigma _t\) and \(\tau _t\) connecting \(\sigma \), respectively, \(\tau \) to \(\rho \) have the same tangent vector at \(t=0\) then they coincide.

Proof

Because \(\sigma _t\) and \(\tau _t\) have the same tangent vector at \(t=0\), it follows that

$$\begin{aligned} 0= & {} \left[ \log \sigma -\log \tau +D(\rho ||\sigma )-D(\rho ||\tau )\right] ^{{\tiny K}}_\rho . \end{aligned}$$

Take the trace of this expression to find that

$$\begin{aligned} 0= & {} D(\rho ||\sigma )-D(\rho ||\tau ). \end{aligned}$$

One concludes that

$$\begin{aligned} 0= & {} \left[ \log \sigma -\log \tau \right] ^{{\tiny K}}_\rho . \end{aligned}$$
(6)

By Proposition 3.1, the linear map \(A\mapsto [A]_\rho ^{{\tiny K}}\) is invertible. Hence, it is a one-to-one map between the spaces \(\mathcal{A}_\rho \) and \(\mathcal{A}_{{\tiny sa}}^0\) because these spaces are finite-dimensional. From (6), it then follows that \(\log \sigma -\log \tau =0\) and hence, that \(\sigma =\tau \).

The book of Amari and Nagaoka [2] introduces the notions of an m-connection and of an e-connection. The geodesics of the m- or mixture connection are the convex combinations of probability measures. The non-commutative generalizations of probability measures are the quantum expectations, also called quantum states. Their convex combinations correspond with convex combinations of density matrices. In a similar manner, the e-connection of the manifold of quantum states \({\mathbb M}\) can be defined as the connection that has exponential arcs as its geodesics.

The inverse \(\sigma \mapsto Y_\rho (\sigma )\) of the exponential map \(Y_\rho (\sigma ) \mapsto \sigma \) could be used as a chart for the manifold \({\mathbb M}\). This chart is affine in the case of the m-connection. Alternatively, one can use the correspondence provided by Proposition 3.1 between \(\mathcal{A}_{{\tiny sa}}^0\) and \(\mathcal{A}_\rho \). It will turn out that the chart \(c_\rho \) is affine in the case of the e-connection. Note that it satisfies \(c_\rho (\rho )=0\). It is said to be centered at the point \(\rho \) in \({\mathbb M}\).

The transition map \(c_{\rho _1}\mapsto c_{\rho _2}\) from reference point \(\rho _1\) to any other reference point \(\rho _2\) is given by

$$\begin{aligned} c_{\rho _2}(\sigma )= & {} c_{\rho _1}(\sigma ) +\log \rho _1-\log \rho _2 +D(\rho _2||\sigma )-D(\rho _1||\sigma ). \end{aligned}$$

The expression in the r.h.s. is Fréchet-differentiable for any density matrix \(\sigma \) in \({\mathbb M}\). One concludes that the different charts are mutually compatible.

4 The metric

Eguchi [33] shows how to derive a metric on the tangent planes starting from a divergence function. The obvious divergence function here is Umegaki’s relative entropy (2) discussed in Sect. 2. The use of the Bogoliubov metric in relation with Umegaki’s relative entropy is found for instance in [4, 25, 34].

Proposition 4.1

An inner product is defined on the tangent plane \(T_\rho {\mathbb M}\) by

$$\begin{aligned} (Y(\sigma ),Y(\tau ))_\rho= & {} -\frac{\mathrm{\,d}\,}{\mathrm{\,d}s}\frac{\mathrm{\,d}\,}{\mathrm{\,d}t}D(\sigma _s||\tau _t)\bigg |_{s=t=0}, \end{aligned}$$

where \(\sigma _t\) and \(\tau _t\) are exponential arcs connecting the density matrices \(\sigma \) and \(\tau \) to the density matrix \(\rho \), and \(Y_\rho (\sigma )\) and \(Y_\rho (\tau )\) are the tangents of \(t\mapsto \sigma _t\), respectively, \(t\mapsto \tau _t\) at \(t=0\). The inner product is given in terms of the chart \(c_\rho \) by

$$\begin{aligned} (Y(\sigma ),Y(\tau ))_\rho = \,\mathrm{Tr}\,Y_\rho (\sigma )\,c_\rho (\tau ) = \int _0^1\mathrm{\,d}u\,\,\mathrm{Tr}\,\rho ^u c_\rho (\sigma )\rho ^{1-u}c_\rho (\tau ). \end{aligned}$$
(7)

Proof

One calculates

$$\begin{aligned} \frac{\mathrm{\,d}\,}{\mathrm{\,d}t}\bigg |_{t=0}D(\sigma _s||\tau _t)= & {} -\frac{\mathrm{\,d}\,}{\mathrm{\,d}t}\bigg |_{t=0}\,\mathrm{Tr}\,\sigma _s\log \tau _t\nonumber \\= & {} -\,\mathrm{Tr}\,\sigma _s \left[ \log \tau -\log \rho \right] -D(\rho ||\tau ). \end{aligned}$$
(8)

This implies

$$\begin{aligned} (Y(\sigma ),Y(\tau ))_\rho= & {} \frac{\mathrm{\,d}\,}{\mathrm{\,d}s}\bigg |_{s=0} \,\mathrm{Tr}\,\sigma _s \left[ \log \tau -\log \rho \right] \\= & {} \,\mathrm{Tr}\,Y_\rho (\sigma ) \, \left[ \log \tau -\log \rho \right] \\= & {} \,\mathrm{Tr}\,Y_\rho (\sigma )\,\left[ c_\rho (\tau )-D(\rho ||\tau )\right] \\= & {} \,\mathrm{Tr}\,Y_\rho (\sigma )\,c_\rho (\tau ). \end{aligned}$$

The tangent vector \(Y(\sigma )\) can be expressed in terms of the chart \(c(\sigma )\). This gives

$$\begin{aligned}(Y(\sigma ),Y(\tau ))_\rho= & {} \,\mathrm{Tr}\,\left[ c_\rho (\sigma )\right] ^{{\tiny K}}_\rho \,c_\rho (\tau ). \end{aligned}$$

This can be written as (7) because \(\,\mathrm{Tr}\,\rho \,c_\rho (\sigma )=0\).

Let us verify that (7) defines a non-degenerate inner product on the tangent space \(T_\rho {\mathbb M}\).

Bilinearity follows because the relation between tangent vector and chart is linear. Positivity follows from

$$\begin{aligned} (Y(\sigma ),Y(\sigma ))_\rho= & {} \int _0^1\mathrm{\,d}u\,\,\mathrm{Tr}\,\left( \rho ^{(1-u)/2}c_\rho (\sigma )\rho ^{u/2}\right) ^\dagger \,\left( \rho ^{(1-u)/2}c_\rho (\sigma )\rho ^{u/2}\right) . \end{aligned}$$

Finally, \((Y(\sigma ),Y(\sigma ))_\rho =0\) implies that \(\rho ^{(1-u)/2}c_\rho (\sigma )\rho ^{u/2}=0\) for all u in [0, 1]. This implies \(c_\rho (\sigma )=0\). The latter is only possible when \(\sigma =\rho \).

Expression (7) is Bogoliubov’s inner product [28,29,30] adapted to the present notations.
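Proposition 4.1 can be illustrated numerically by comparing the closed form (7) with a central-difference estimate of the mixed second derivative of the relative entropy. The sketch is a hedged illustration only: NumPy is assumed, all function names are hypothetical, and the step size and the random test matrices are arbitrary choices.

```python
import numpy as np

def herm_fun(h, f):
    """Apply the scalar function f to the Hermitian matrix h."""
    w, v = np.linalg.eigh(h)
    return (v * f(w)) @ v.conj().T

def random_density(n, rng):
    """Random positive-definite matrix with unit trace (illustrative helper)."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    m = g @ g.conj().T + n * np.eye(n)
    return m / np.trace(m).real

def exp_arc(rho, sigma, t):
    """Point sigma_t of the exponential arc of Definition 2.1."""
    e = herm_fun((1 - t) * herm_fun(rho, np.log) + t * herm_fun(sigma, np.log), np.exp)
    return e / np.trace(e).real

def rel_entropy(rho, sigma):
    """Umegaki relative entropy D(rho||sigma), eq. (2)."""
    return np.trace(rho @ (herm_fun(rho, np.log) - herm_fun(sigma, np.log))).real

def chart(rho, sigma):
    """c_rho(sigma) of eq. (3)."""
    d = herm_fun(sigma, np.log) - herm_fun(rho, np.log)
    return d + rel_entropy(rho, sigma) * np.eye(rho.shape[0])

def kubo(a, rho):
    """Kubo transform [A]^K_rho, computed in the eigenbasis of rho."""
    lam, v = np.linalg.eigh(rho)
    li, lj = lam[:, None], lam[None, :]
    close = np.isclose(li, lj, rtol=1e-8, atol=0.0)
    dlog = np.where(close, 1.0, np.log(li) - np.log(lj))
    weight = np.where(close, 0.5 * (li + lj), (li - lj) / dlog)
    return v @ (weight * (v.conj().T @ a @ v)) @ v.conj().T

def inner(rho, sigma, tau):
    """(Y(sigma), Y(tau))_rho = Tr Y_rho(sigma) c_rho(tau), eq. (7)."""
    return np.trace(kubo(chart(rho, sigma), rho) @ chart(rho, tau)).real

def inner_from_divergence(rho, sigma, tau, h=1e-3):
    """Central-difference estimate of -d/ds d/dt D(sigma_s||tau_t) at s = t = 0."""
    def d(s, t):
        return rel_entropy(exp_arc(rho, sigma, s), exp_arc(rho, tau, t))
    return -(d(h, h) - d(h, -h) - d(-h, h) + d(-h, -h)) / (4 * h * h)

rng = np.random.default_rng(2)
rho, sigma, tau = (random_density(3, rng) for _ in range(3))
exact = inner(rho, sigma, tau)
estimate = inner_from_divergence(rho, sigma, tau)
print(exact, estimate)  # the two numbers should agree up to the finite-difference error
```

The same helper can be used to observe the symmetry and positivity of the inner product, for instance by comparing `inner(rho, sigma, tau)` with `inner(rho, tau, sigma)`.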

5 The dual geometry

To any geometry with parallel transport \(\Pi \) there corresponds a dual geometry [2] with parallel transport \(\Pi ^*\) given by

$$\begin{aligned} (\Pi (\rho _1\mapsto \rho _2)V,\Pi ^*(\rho _1\mapsto \rho _2)W)_{\rho _2}= & {} (V,W)_{\rho _1}. \end{aligned}$$
(9)

Here \(\rho _1\) and \(\rho _2\) belong to the manifold \({\mathbb M}\) and V and W are tangent vectors in \(T_{\rho _1}{\mathbb M}\).

A flat geometry is obtained when the parallel transport \(\Pi \) is chosen equal to the identity map, where each tangent space is identified with the space \(\mathcal{A}_{{\tiny sa}}^0\) of traceless Hermitian matrices. The geometry is that of the m-connection [2]. Let us verify this now.

The covariant derivative of a vector field V along a smooth curve \(\gamma \) is given by [35]

$$\begin{aligned}{}[\nabla _{\dot{\gamma }} V]_{\gamma _t}= & {} \frac{\mathrm{\,d}\,}{\mathrm{\,d}s}\bigg |_{s=0}\Pi (\gamma _{t+s}\mapsto \gamma _t)\,V(\gamma _{t+s}). \end{aligned}$$

With \(\Pi \) equal to the identity map and with the path \(\gamma \) given by

$$\begin{aligned} \gamma _t= & {} (1-t)\rho +t\sigma \end{aligned}$$

and the vector field given by \(V(\gamma _t)=\gamma _t-\gamma _0\) one obtains

$$\begin{aligned}{}[\nabla _{\dot{\gamma }} V]_{\gamma _t}= & {} \frac{\mathrm{\,d}\,}{\mathrm{\,d}t}\gamma _t\\= & {} \sigma -\rho . \end{aligned}$$

The fact that the covariant derivative is constant along this path and equal to the derivative \(\dot{\gamma }\) indicates that the path is a geodesic of a flat connection. It is a geodesic of the m-connection.

Let us now consider the dual of the m-connection. Because \(\Pi \) is the identity map, (9) simplifies to

$$\begin{aligned} (V,\Pi ^*(\rho _1\mapsto \rho _2)W)_{\rho _2}= & {} (V,W)_{\rho _1}. \end{aligned}$$
(10)

By Proposition 3.1, there exists A in \(\mathcal{A}_{\rho _1}\) such that \(W=[A]^{{\tiny K}}_{\rho _1}\). Similarly, there exists B in \(\mathcal{A}_{\rho _2}\) such that \(\Pi ^*(\rho _1\mapsto \rho _2)W=[B]^{{\tiny K}}_{\rho _2}\). From (7), it follows that

$$\begin{aligned}\,\mathrm{Tr}\,VB= & {} (V,[B]^{{\tiny K}}_{\rho _2})_{\rho _2}\\= & {} (V,\Pi ^*(\rho _1\mapsto \rho _2)W)_{\rho _2}\\= & {} (V,W)_{\rho _1}\\= & {} \,\mathrm{Tr}\,VA. \end{aligned}$$

Because V is an arbitrary traceless matrix, it follows that \(B-A\) is a multiple of the identity \({\mathbb I}\) and hence that

$$\begin{aligned}\Pi ^*(\rho _1\mapsto \rho _2)\,[A]^{{\tiny K}}_{\rho _1}= & {} [A-\,\mathrm{Tr}\,\rho _2 A]^{{\tiny K}}_{\rho _2}. \end{aligned}$$
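The duality relation (10) and the resulting transport rule can be checked on random matrices. The sketch below is an illustration under stated assumptions and not part of the original derivation: NumPy is assumed, the names `kubo`, `kubo_inv`, `inner` and `dual_transport` are hypothetical, and the inner product of two tangent vectors V and \(W=[A]^{{\tiny K}}_\rho \) is evaluated as \(\,\mathrm{Tr}\,VA\), following (7).

```python
import numpy as np

def eig_weights(rho):
    """Eigenvectors of rho and the logarithmic means of its eigenvalue pairs."""
    lam, v = np.linalg.eigh(rho)
    li, lj = lam[:, None], lam[None, :]
    close = np.isclose(li, lj, rtol=1e-8, atol=0.0)
    dlog = np.where(close, 1.0, np.log(li) - np.log(lj))
    return v, np.where(close, 0.5 * (li + lj), (li - lj) / dlog)

def kubo(a, rho):
    """Kubo transform [A]^K_rho."""
    v, w = eig_weights(rho)
    return v @ (w * (v.conj().T @ a @ v)) @ v.conj().T

def kubo_inv(y, rho):
    """Inverse Kubo transform, as constructed in the proof of Proposition 3.1."""
    v, w = eig_weights(rho)
    return v @ ((v.conj().T @ y @ v) / w) @ v.conj().T

def inner(v1, v2, rho):
    """Bogoliubov inner product (V, W)_rho = Tr V A with W = [A]^K_rho."""
    return np.trace(v1 @ kubo_inv(v2, rho)).real

def dual_transport(w, rho1, rho2):
    """Map [A]^K_rho1 to [A - Tr rho2 A]^K_rho2."""
    a = kubo_inv(w, rho1)
    b = a - np.trace(rho2 @ a).real * np.eye(a.shape[0])
    return kubo(b, rho2)

def random_density(n, rng):
    """Random positive-definite matrix with unit trace (illustrative helper)."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    m = g @ g.conj().T + n * np.eye(n)
    return m / np.trace(m).real

def random_traceless_hermitian(n, rng):
    """Random element of the space of traceless Hermitian matrices."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    h = (g + g.conj().T) / 2
    return h - np.trace(h).real / n * np.eye(n)

rng = np.random.default_rng(3)
rho1, rho2 = random_density(3, rng), random_density(3, rng)
v_vec = random_traceless_hermitian(3, rng)
w_vec = random_traceless_hermitian(3, rng)
lhs = inner(v_vec, dual_transport(w_vec, rho1, rho2), rho2)
rhs = inner(v_vec, w_vec, rho1)
print(lhs, rhs)  # both sides of eq. (10)
```

The transported vector is again traceless, because \(\,\mathrm{Tr}\,[B]^{{\tiny K}}_{\rho _2}=\,\mathrm{Tr}\,\rho _2B\) vanishes for \(B=A-\,\mathrm{Tr}\,\rho _2A\).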

Choose now the vector field \(V(\rho )=Y_\rho (\sigma )\) in combination with a path \(\gamma \) equal to the exponential arc \(t\mapsto \sigma _t\) connecting \(\sigma \) to \(\rho \). Then one finds

$$\begin{aligned}{}[\nabla ^*_{\dot{\gamma }} Y(\sigma )]_{\sigma _t}= & {} \frac{\mathrm{\,d}\,}{\mathrm{\,d}s}\bigg |_{s=0} \Pi ^*(\sigma _{t+s}\mapsto \sigma _t)\,[c_{\sigma _{t+s}}(\sigma )]^{{\tiny K}}_{\sigma _{t+s}}\\= & {} \frac{\mathrm{\,d}\,}{\mathrm{\,d}s}\bigg |_{s=0} [c_{\sigma _t}(\sigma )]^{{\tiny K}}_{\sigma _{t}}\\= & {} 0. \end{aligned}$$

This shows that \(t\mapsto \sigma _t\) is a geodesic for the dual connection \(\nabla ^*\). Because the geodesics are exponential arcs, the connection is a non-commutative generalization of the e-connection of [2].

6 The Legendre structure

The relative entropy \(D(\rho ||\sigma )\) is convex in its first argument \(\rho \). The proof is based on Klein’s inequality [36]. See [9] for the more general argument based on operator monotonicity of the function \(f(x)=-x\log x\). This convexity suggests the use of Legendre transforms.

Definition 6.1

Given a density matrix \(\rho \) and a matrix A in \(\mathcal{A}_\rho \), the potential \(\Phi _\rho (A)\) is defined by

$$\begin{aligned}\Phi _\rho (A)= & {} \log \,\mathrm{Tr}\,\exp (\log \rho +A). \end{aligned}$$

It is the analogue of the logarithm of the partition sum in Statistical Physics. The matrix A corresponds with minus the Hamiltonian. The term \(\log \rho \) is added so that an arbitrary point \(\rho \) of the manifold can be taken as the center of the chart.

Note that the Banach space of Hermitian matrices can be identified with the dual of the linear space generated by the density matrices by identification of the linear functional \(\rho \mapsto \,\mathrm{Tr}\,\rho A\) with the matrix A itself. The Legendre transform of the map \(\sigma \mapsto D(\sigma ||\rho )\) is therefore equal to

$$\begin{aligned} A\mapsto & {} \sup \{\,\mathrm{Tr}\,\sigma \, A-D(\sigma ||\rho ):\,\sigma \in {\mathbb M}\}, \qquad \rho \in {\mathbb M} \text{ and } A\in \mathcal{A}_\rho . \end{aligned}$$

Proposition 6.2

For all \(\rho \) in \({\mathbb M}\) and A in \(\mathcal{A}_\rho \), one has

$$\begin{aligned} \Phi _\rho (A)= & {} \sup \{\,\mathrm{Tr}\,\sigma \, A-D(\sigma ||\rho ):\,\sigma \in {\mathbb M}\}. \end{aligned}$$
(11)

The maximum is reached for \(\sigma =\tau _A\) with \(\tau _A\) in \({\mathbb M}\) such that \(c_\rho (\tau _A)=A\). It takes on the value

$$\begin{aligned} \Phi _\rho (A)= & {} D(\rho ||\tau _A). \end{aligned}$$

Proof

From

$$\begin{aligned} A=c_\rho (\tau _A)=\log \tau _A-\log \rho +D(\rho ||\tau _A) \end{aligned}$$

one obtains

$$\begin{aligned} \Phi _\rho (A)= & {} \log \,\mathrm{Tr}\,\exp (\log \rho +A)\nonumber \\= & {} \log \,\mathrm{Tr}\,\exp (\log \tau _A+D(\rho ||\tau _A))\nonumber \\= & {} D(\rho ||\tau _A). \end{aligned}$$
(12)

It then follows that

$$\begin{aligned} A= & {} \log \tau _A-\log \rho +\Phi _\rho (A). \end{aligned}$$
(13)

Use this to obtain

$$\begin{aligned} 0\le & {} D(\sigma ||\tau _A)\\= & {} \,\mathrm{Tr}\,\sigma \left[ \log \sigma -\log \rho -A+\Phi _\rho (A)\right] \\= & {} D(\sigma ||\rho )-\,\mathrm{Tr}\,\sigma A+\Phi _\rho (A). \end{aligned}$$

This shows that for any \(\sigma \) one has

$$\begin{aligned} \Phi _\rho (A)\ge & {} \,\mathrm{Tr}\,\sigma A-D(\sigma ||\rho ). \end{aligned}$$

Take now \(\sigma =\tau _A\). Then the inequality \(0\le D(\sigma ||\tau _A)\) in the above calculation becomes an equality. Hence, \(\sigma =\tau _A\) realizes the supremum in (11).

The proposition shows that \(A\mapsto \Phi _\rho (A)\) is a Legendre transform. In particular, this implies that it is a convex function.
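The statements of Proposition 6.2 can be probed numerically. The following sketch is a hedged illustration: NumPy is assumed, the helper names (`potential`, `tau_state`, `centered`, `legendre_objective`) are hypothetical, and \(\tau _A\) is computed from (13) as the normalized exponential of \(\log \rho +A\). The supremum is not computed; the code only evaluates the objective at a few random density matrices and at \(\tau _A\).

```python
import numpy as np

def herm_fun(h, f):
    """Apply the scalar function f to the Hermitian matrix h."""
    w, v = np.linalg.eigh(h)
    return (v * f(w)) @ v.conj().T

def random_density(n, rng):
    """Random positive-definite matrix with unit trace (illustrative helper)."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    m = g @ g.conj().T + n * np.eye(n)
    return m / np.trace(m).real

def random_hermitian(n, rng):
    """Random Hermitian matrix (illustrative helper)."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (g + g.conj().T) / 2

def centered(h, rho):
    """Shift h by a multiple of the identity so that Tr rho h = 0."""
    return h - np.trace(rho @ h).real * np.eye(h.shape[0])

def rel_entropy(rho, sigma):
    """Umegaki relative entropy D(rho||sigma), eq. (2)."""
    return np.trace(rho @ (herm_fun(rho, np.log) - herm_fun(sigma, np.log))).real

def chart(rho, sigma):
    """c_rho(sigma) of eq. (3)."""
    d = herm_fun(sigma, np.log) - herm_fun(rho, np.log)
    return d + rel_entropy(rho, sigma) * np.eye(rho.shape[0])

def potential(rho, a):
    """Phi_rho(A) = log Tr exp(log rho + A), Definition 6.1."""
    return np.log(np.trace(herm_fun(herm_fun(rho, np.log) + a, np.exp)).real)

def tau_state(rho, a):
    """Density matrix tau_A with c_rho(tau_A) = A, obtained from eq. (13)."""
    e = herm_fun(herm_fun(rho, np.log) + a, np.exp)
    return e / np.trace(e).real

rng = np.random.default_rng(4)
rho = random_density(3, rng)
a = centered(random_hermitian(3, rng), rho)  # an element of A_rho
phi = potential(rho, a)
tau_a = tau_state(rho, a)

def legendre_objective(sigma):
    """Tr sigma A - D(sigma||rho), the expression maximized in eq. (11)."""
    return np.trace(sigma @ a).real - rel_entropy(sigma, rho)

# Gap Phi_rho(A) - objective; by the proof it equals D(sigma||tau_A) >= 0.
gaps = [phi - legendre_objective(random_density(3, rng)) for _ in range(5)]
print(phi, rel_entropy(rho, tau_a), min(gaps), phi - legendre_objective(tau_a))
```

The first two printed numbers should coincide, the smallest gap should be non-negative, and the gap at \(\sigma =\tau _A\) should vanish up to rounding.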

7 The dual chart

The following result is standard.

Proposition 7.1

The plane tangent to the potential \(\Phi _\rho (A)\) at the contact point \(\tau _A\) is the map

$$\begin{aligned} B\mapsto \Phi _\rho (A)+\,\mathrm{Tr}\,\tau _A (B-A). \end{aligned}$$

Proof

From (11), one obtains

$$\begin{aligned} \Phi _\rho (B)\ge & {} \,\mathrm{Tr}\,\tau _AB-D(\tau _A||\rho )\\= & {} \,\mathrm{Tr}\,\tau _A(B-A)+\,\mathrm{Tr}\,\tau _A A-D(\tau _A||\rho )\\= & {} \,\mathrm{Tr}\,\tau _A(B-A)+\Phi _\rho (A). \end{aligned}$$

This shows that the plane \(B\mapsto \,\mathrm{Tr}\,\tau _A(B-A)+\Phi _\rho (A)\) remains below the potential \(\Phi _\rho \). Contact at \(B=A\) is clear.

From the above proposition, one concludes that the Legendre dual of the matrix A in \(\mathcal{A}_\rho \) is the linear functional defined by the density matrix \(\tau _A\). In the approach with coordinates, the derivative of the dual coordinate yields the metric tensor: The first item of (3.32) of [2] reads

$$\begin{aligned} \partial \eta _j/\partial \theta ^i=g_{ij}. \end{aligned}$$

The derivative of the potential gives the dual coordinate: (3.33) of [2] reads \(\partial _i\psi =\eta _i\). Parameter-free analogues follow below.

Proposition 7.2

Take A and B in \(\mathcal{A}_\rho \). The Fréchet derivative \(d_{B}\,\Phi _\rho (A)\) of the potential \(\Phi _\rho (A)\) in the direction B equals \(\,\mathrm{Tr}\,\tau _A B\).

Proof

One can write

$$\begin{aligned} \Phi _\rho (A+B)= & {} \Phi _\rho (A)+\log \frac{\,\mathrm{Tr}\,\exp (\log \rho +A+B)}{\,\mathrm{Tr}\,\exp (\log \rho +A)}. \end{aligned}$$

Use now (13) to obtain

$$\begin{aligned} \Phi _\rho (A+B)= & {} \Phi _\rho (A)+\log \,\mathrm{Tr}\,\exp (\log \tau _A+B)\\= & {} \Phi _\rho (A)+\log \left( 1+\,\mathrm{Tr}\,[B]^{{\tiny K}}_{\tau _A}+ \text{ o } (||B||)\right) \\= & {} \Phi _\rho (A)+\,\mathrm{Tr}\,\tau _A B+ \text{ o } (||B||). \end{aligned}$$

Proposition 7.3

Choose \(\rho \) in \({\mathbb M}\) and A and B in \(\mathcal{A}_\rho \). The Fréchet derivative \(d_{B}\,\tau _A\) of \(\tau _A\) in the direction B equals

$$\begin{aligned} d_{B}\,\tau _A=\left[ B-\,\mathrm{Tr}\,\tau _A\,B\right] ^{{\tiny K}}_{\tau _A}. \end{aligned}$$
(14)

Proof

One has

$$\begin{aligned} B= & {} c_\rho (\tau _{A+B})-c_\rho (\tau _{A})\\= & {} \log \tau _{A+B}-\log \tau _A+D(\rho ||\tau _{A+B})-D(\rho ||\tau _A)\\= & {} \log \tau _{A+B}-\log \tau _A-\,\mathrm{Tr}\,\rho (\log \tau _{A+B}-\log \tau _A). \end{aligned}$$

This can be written in first-order approximation as

$$\begin{aligned} \tau _{A+B}= & {} \exp \left( \log \tau _A+B+\,\mathrm{Tr}\,\rho (\log \tau _{A+B}-\log \tau _A)\right) \\= & {} \tau _A +\left[ B- \,\mathrm{Tr}\,\tau _A B\right] ^{{\tiny K}}_{\tau _A}\\&+\left[ \,\mathrm{Tr}\,\tau _A\, B+\,\mathrm{Tr}\,\rho (\log \tau _{A+B}-\log \tau _A)\right] \,\tau _A + \text{ o } (||B||). \end{aligned}$$

Take the trace of this expression to see that the third term in the r.h.s. vanishes. Hence, one concludes (14).
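Propositions 7.2 and 7.3 can be compared with central finite differences. This is a rough numerical sketch under assumptions, not a substitute for the proofs: NumPy is assumed, all helper names are hypothetical, and the step size is an arbitrary choice.

```python
import numpy as np

def herm_fun(h, f):
    """Apply the scalar function f to the Hermitian matrix h."""
    w, v = np.linalg.eigh(h)
    return (v * f(w)) @ v.conj().T

def random_density(n, rng):
    """Random positive-definite matrix with unit trace (illustrative helper)."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    m = g @ g.conj().T + n * np.eye(n)
    return m / np.trace(m).real

def random_centered(rho, rng):
    """Random Hermitian matrix A with Tr rho A = 0, i.e. an element of A_rho."""
    n = rho.shape[0]
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    h = (g + g.conj().T) / 2
    return h - np.trace(rho @ h).real * np.eye(n)

def kubo(a, rho):
    """Kubo transform [A]^K_rho, computed in the eigenbasis of rho."""
    lam, v = np.linalg.eigh(rho)
    li, lj = lam[:, None], lam[None, :]
    close = np.isclose(li, lj, rtol=1e-8, atol=0.0)
    dlog = np.where(close, 1.0, np.log(li) - np.log(lj))
    weight = np.where(close, 0.5 * (li + lj), (li - lj) / dlog)
    return v @ (weight * (v.conj().T @ a @ v)) @ v.conj().T

def potential(rho, a):
    """Phi_rho(A) = log Tr exp(log rho + A), Definition 6.1."""
    return np.log(np.trace(herm_fun(herm_fun(rho, np.log) + a, np.exp)).real)

def tau_state(rho, a):
    """Density matrix tau_A obtained from eq. (13)."""
    e = herm_fun(herm_fun(rho, np.log) + a, np.exp)
    return e / np.trace(e).real

rng = np.random.default_rng(5)
rho = random_density(3, rng)
a, b = random_centered(rho, rng), random_centered(rho, rng)
tau_a = tau_state(rho, a)
h = 1e-5

# Directional derivatives estimated by central differences.
d_phi = (potential(rho, a + h * b) - potential(rho, a - h * b)) / (2 * h)
d_tau = (tau_state(rho, a + h * b) - tau_state(rho, a - h * b)) / (2 * h)

# Closed forms of Proposition 7.2 and of eq. (14).
shift = np.trace(tau_a @ b).real
closed_phi = shift
closed_tau = kubo(b - shift * np.eye(3), tau_a)
print(abs(d_phi - closed_phi), np.abs(d_tau - closed_tau).max())
```

Both printed discrepancies should be small compared with the size of the derivatives themselves.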

Proposition 7.4

Select \(\rho \) and \(\sigma \) in \({\mathbb M}\) and A in \(\mathcal{A}_{{\tiny sa}}^0\). One has

$$\begin{aligned} (Y(\rho ),Y(\sigma ))_{\tau _A}= & {} \,\mathrm{Tr}\,c_{\tau _A}(\sigma )\,d_{B}\,\tau _A \end{aligned}$$

with \(B=c_{\tau _A}(\rho )-\,\mathrm{Tr}\,c_{\tau _A}(\rho )\).

Proof

Use \(0=\,\mathrm{Tr}\,\tau _Ac_{\tau _A}(\rho )\) to find \(\,\mathrm{Tr}\,\tau _A B=-\,\mathrm{Tr}\,c_{\tau _A}(\rho )\) and hence \(c_{\tau _A}(\rho )=B-\,\mathrm{Tr}\,\tau _A B\). This is used in the now following calculation.

From Proposition 4.1, one obtains

$$\begin{aligned}(Y(\rho ),Y(\sigma ))_{\tau _A}= & {} \,\mathrm{Tr}\,c_{\tau _A}(\sigma )\,Y_{\tau _A}(\rho )\\= & {} \,\mathrm{Tr}\,c_{\tau _A}(\sigma )\,[c_{\tau _A}(\rho )]^{{\tiny K}}_{\tau _A}\\= & {} \,\mathrm{Tr}\,c_{\tau _A}(\sigma )\,[B-\,\mathrm{Tr}\,\tau _A B]^{{\tiny K}}_{\tau _A}\\= & {} \,\mathrm{Tr}\,c_{\tau _A}(\sigma )\,d_{B}\tau _A. \end{aligned}$$

8 Affine coordinates

The space \(\mathcal{A}_{{\tiny sa}}^0\) of traceless Hermitian matrices of dimension n-by-n is a Hilbert space for the Hilbert-Schmidt inner product

$$\begin{aligned} (A,B)_{{\tiny HS}}= & {} \,\mathrm{Tr}\,AB, \qquad A,B\in \mathcal{A}_{{\tiny sa}}^0. \end{aligned}$$

Hence one can construct an orthonormal set \((f_i)_i\) of basis vectors in \(\mathcal{A}_{{\tiny sa}}^0\). For any density matrix \(\sigma \) in \({\mathbb M}\), one can write

$$\begin{aligned} \log \sigma= & {} x^i(\sigma )\, f_i+\frac{1}{n}\,\mathrm{Tr}\,\log \sigma \quad \text{ with } \quad x^i(\sigma )=(\log \sigma ,f_i)_{{\tiny HS}}. \end{aligned}$$

The charts \(c_\rho \) have vanishing expectation value. Their expansion therefore reads

$$\begin{aligned} c_\rho (\sigma )= & {} [x^i(\sigma )-x^i(\rho )]\,(f_i-\,\mathrm{Tr}\,\rho \, f_i). \end{aligned}$$

Introduce a field of basis vectors \(e_i\) in the tangent bundle. It is defined by

$$\begin{aligned}{}[e_i]_\rho= & {} [f_i-\,\mathrm{Tr}\,\rho \,f_i]^{{\tiny K}}_\rho . \end{aligned}$$

The tangent vectors \(Y(\sigma )\) can then be written as follows

$$\begin{aligned} Y_\rho (\sigma ) = [c_\rho (\sigma )]^{{\tiny K}}_\rho = [x^i(\sigma )-x^i(\rho )]\,[e_i]_\rho . \end{aligned}$$

The metric tensor g is defined by

$$\begin{aligned} g_{ij}(\rho )= & {} (e_i,e_j)_\rho . \end{aligned}$$

One finds for any pair \(\sigma \), \(\tau \) in \({\mathbb M}\)

$$\begin{aligned} (Y(\sigma ),Y(\tau ))_\rho= & {} [x^i(\sigma )-x^i(\rho )]\,g_{ij}(\rho )\,[x^j(\tau )-x^j(\rho )]. \end{aligned}$$

Let us next consider the dual charts. Take B in \(\mathcal{A}_\rho \) and expand it as

$$\begin{aligned} B= & {} B^i (f_i-\,\mathrm{Tr}\,\rho \,f_i). \end{aligned}$$

From (14) one obtains for any A and B in \(\mathcal{A}_\rho \)

$$\begin{aligned} d_B\tau _A= & {} [B-\,\mathrm{Tr}\,\tau _A B]^{{\tiny K}}_{\tau _A}\\= & {} B^i[f_i-\,\mathrm{Tr}\,\tau _A f_i]^{{\tiny K}}_{\tau _A}\\= & {} B^i[e_i]_{\tau _A}. \end{aligned}$$

Hence, one has

$$\begin{aligned} ([e_i]_{\tau _A},d_B\tau _A)_{\tau _A}=B^j(e_i,e_j)_{\tau _A}=g_{ij}B^j. \end{aligned}$$
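The coordinate expressions of this section can be made concrete for \(n=2\). The sketch below is an illustration under assumptions: NumPy is assumed, the orthonormal basis \((f_i)_i\) is chosen as the Pauli matrices divided by \(\sqrt{2}\) (one possible choice among many), and the names `coords`, `metric` and the other helpers are hypothetical.

```python
import numpy as np

def herm_fun(h, f):
    """Apply the scalar function f to the Hermitian matrix h."""
    w, v = np.linalg.eigh(h)
    return (v * f(w)) @ v.conj().T

def random_density(n, rng):
    """Random positive-definite matrix with unit trace (illustrative helper)."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    m = g @ g.conj().T + n * np.eye(n)
    return m / np.trace(m).real

def rel_entropy(rho, sigma):
    """Umegaki relative entropy D(rho||sigma), eq. (2)."""
    return np.trace(rho @ (herm_fun(rho, np.log) - herm_fun(sigma, np.log))).real

def chart(rho, sigma):
    """c_rho(sigma) of eq. (3)."""
    d = herm_fun(sigma, np.log) - herm_fun(rho, np.log)
    return d + rel_entropy(rho, sigma) * np.eye(rho.shape[0])

def kubo(a, rho):
    """Kubo transform [A]^K_rho, computed in the eigenbasis of rho."""
    lam, v = np.linalg.eigh(rho)
    li, lj = lam[:, None], lam[None, :]
    close = np.isclose(li, lj, rtol=1e-8, atol=0.0)
    dlog = np.where(close, 1.0, np.log(li) - np.log(lj))
    weight = np.where(close, 0.5 * (li + lj), (li - lj) / dlog)
    return v @ (weight * (v.conj().T @ a @ v)) @ v.conj().T

# Orthonormal basis of traceless Hermitian 2-by-2 matrices: Tr f_i f_j = delta_ij.
PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
BASIS = [p / np.sqrt(2) for p in PAULI]

def coords(sigma):
    """Coordinates x^i(sigma) = (log sigma, f_i)_HS."""
    log_s = herm_fun(sigma, np.log)
    return np.array([np.trace(log_s @ f).real for f in BASIS])

def shifted_basis(rho):
    """The matrices f_i - Tr rho f_i, which belong to A_rho."""
    return [f - np.trace(rho @ f).real * np.eye(2) for f in BASIS]

def metric(rho):
    """g_ij(rho) = (e_i, e_j)_rho with [e_i]_rho = [f_i - Tr rho f_i]^K_rho."""
    s = shifted_basis(rho)
    return np.array([[np.trace(kubo(fi, rho) @ fj).real for fj in s] for fi in s])

rng = np.random.default_rng(6)
rho, sigma, tau = (random_density(2, rng) for _ in range(3))
dx_sigma = coords(sigma) - coords(rho)
dx_tau = coords(tau) - coords(rho)
g = metric(rho)

chart_from_coords = sum(x * f for x, f in zip(dx_sigma, shifted_basis(rho)))
inner_direct = np.trace(kubo(chart(rho, sigma), rho) @ chart(rho, tau)).real
inner_coords = dx_sigma @ g @ dx_tau
print(inner_direct, inner_coords)  # eq. (7) versus the coordinate expression
```

The two printed values should agree, and `chart_from_coords` should reproduce `chart(rho, sigma)`, which reflects the expansion of \(c_\rho (\sigma )\) given above.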

9 Summary and discussion

The manifold \({\mathbb M}\) of non-degenerate n-by-n density matrices is studied in a parameter-free way. The starting point is the notion of exponential arcs. The tangents to such arcs span at each point \(\rho \) of the manifold the space \(\mathcal{A}_{{\tiny sa}}^0\) of traceless Hermitian matrices. Affine charts are introduced for both the m- and the e-connection. The latter turn \({\mathbb M}\) into a Banach manifold by means of a global chart \(c_\rho \) centered at an arbitrary point \(\rho \) of the manifold.

Bogoliubov’s inner product is defined on any of the tangent planes \(T_\rho {\mathbb M}\). Parallel transport relates the different tangent planes. The covariant derivative corresponding with the dual parallel transport is derived. It defines the e-connection.

The divergence function is convex in its first argument. This enables the introduction of a convex potential function \(\Phi _\rho \) defined on the range \(\mathcal{A}_\rho \) of the chart \(c_\rho \). The derivative of \(\Phi _\rho \) defines the dual chart.

In a final section, affine coordinates are introduced. In this way, the link is made to more conventional approaches.

The identity (1) plays an essential role in controlling the effects of non-commutativity. The symbol \([A]^{{\tiny K}}_\rho \) denotes the Kubo transform of A in \(\mathcal{A}_\rho \). See Notation 2.3 in Sect. 2. It maps a Hermitian matrix A with vanishing expectation \(\,\mathrm{Tr}\,\rho A=0\) onto a tangent vector with vanishing trace \(\,\mathrm{Tr}\,[A]^{{\tiny K}}_\rho =0\). In combination with the chart \(c_\rho (\sigma )\), it allows one to express the exponential map as \(Y_\rho (\sigma )=[c_\rho (\sigma )]^{{\tiny K}}_\rho \mapsto \sigma \).

In the study of quantum exponential families, the e-connection can be easily derived by taking third-order derivatives of the divergence function [33]. They yield the connection coefficients \(\Gamma ^k_{\,ij}\). By use of dual coordinates, it then becomes straightforward to show that exponential arcs are geodesics of a flat geometry. In a coordinate-free approach, it is more transparent to start from parallel transport. The parallel transport of the m-connection is the identity map. Given the metric, one can then derive the dual transport and verify that the exponential arcs are geodesics for the dual connection. This way of working is adopted here because of its transparency.

Throughout this work, the distinction is made between the spaces \(\mathcal{A}_\rho \) of Hermitian matrices A with vanishing expectation \(\,\mathrm{Tr}\,\rho A=0\) and the space \(\mathcal{A}_{{\tiny sa}}^0\) of Hermitian matrices V with vanishing trace \(\,\mathrm{Tr}\,V=0\), although the relation between the two spaces is trivial. Doing so is clarifying. Given a Hermitian matrix A, the parallel transport from \(\rho _1\) in \({\mathbb M}\) to \(\rho _2\) in \({\mathbb M}\) by means of the e-connection maps \(A-\,\mathrm{Tr}\,\rho _1 A\) onto \(A-\,\mathrm{Tr}\,\rho _2 A\). The commutative analogue of this transport law has been emphasized for instance in [15].

In Amari’s work [1, 2], it is important that a potential function is available whose Hessian is the metric. It allows for an easy introduction of the Legendre duality. It is shown in Sect. 6 that for each \(\rho \) in \({\mathbb M}\) a potential \(\Phi _\rho \) can be defined on the space \(\mathcal{A}_\rho \) in such a way that its Fréchet derivative, which is the Legendre dual, is a chart affine for the m-connection.

The present paper describes the geometry of the manifold \({\mathbb M}\) of non-degenerate n-by-n density matrices from a specific point of view. Much more is known and the overall picture is clear. On the other hand, the generalization to infinite dimensions consists of separate studies such as those of [20, 21, 23, 24, 31]. Finite-dimensional matrices are replaced by possibly unbounded operators on Hilbert space. Density matrices are replaced by normalized positive functionals called states. The technicality of the subject increases, and many aspects concerning the geometry of the manifold of faithful states are still unclear.