1 Introduction

Apart from the linear Schrödinger equation written in standard notation

$$\begin{aligned} ( {\hat{H}} - E {\hat{I}} ) \ \Psi \ = \ 0 \end{aligned}$$
(1)

nonlinear equations are also considered in various areas of physics and chemistry. These can be grouped into two categories:

  1. (1)

    The eigenproblem is nonlinear in energy.

    1. (i)

      An example for such nonlinearity is

      $$\begin{aligned} ({\hat{A}}_0 - E {\hat{A}}_1 - E^2 {\hat{A}}_2 - E^3 {\hat{A}}_3 \ldots ) \Psi = 0 \end{aligned}$$

      where \({\hat{A}}_i\)-s are linear operators. Results on problems of this type in the mathematical literature are extensive, including the search of appropriate variational principles [1,2,3].

    2. (ii)

      Another example is the Feshbach-type [4] effective Hamiltonian appearing in Löwdin’s energy dependent partitioning technique [5, 6]:

      $$\begin{aligned} H^{eff} \ \Psi \ = \ \Big [ H_{AA} + H_{AB} \ \big (H_{BB} - E \big )^{-1} \ H_{BA} \Big ] \ \Psi \ = \ E \ \Psi , \end{aligned}$$

      where \(AA, AB, \ldots \) indicate blocks (projections) of the Hamiltonian.

    We shall not further consider this case in this paper.

  2. (2)

    The equation is nonlinear in the eigenvector \(\Psi \), which may occur when the Hamiltonian depends on the eigenvector:

    $$\begin{aligned} \Big ( {\hat{H}}(\Psi ) - E \Big ) \ \Psi \ = \ 0. \end{aligned}$$
    (2)

    This is the case we shall be concerned with here.

Examples for case (2) are

  1. (i)

    the nonlinear Schrödinger equation describing the interaction of a molecule with a polarizable medium (setting \({\hat{A}}\) and \({\hat{B}}\) appropriately) [7,8,9,10,11,12]:

    $$\begin{aligned} \Bigg ({\hat{H}}^0 - {\hat{A}} \ \frac{\langle \Psi |{\hat{B}}|\Psi \rangle }{\langle \Psi |\Psi \rangle } \Bigg ) \ \Psi \ = \ E \ \Psi . \end{aligned}$$
    (3)
  2. (ii)

    The Gross–Pitajevskij equation [13] (GPE), which is the bosonic analogue of the Hartree–Fock equations:

    $$\begin{aligned} ({\hat{T}} + {\hat{V}} + \lambda || \psi ||^2 ) \psi = E \psi . \end{aligned}$$

    Here \({\hat{T}}\) is the kinetic energy operator, \({\hat{V}}\) is an external potential, and \(\lambda \) is a coupling constant for inter-particle interactions. Effects of nonlinearity and variational principles for GPE and its generalizations have been widely studied [14,15,16,17,18].

  3. (iii)

    The Hartree or the Hartree–Fock equations for many-electron systems. The latter in canonical form are written as

    $$\begin{aligned} {\hat{F}} \psi _i = \varepsilon _i \psi _i \end{aligned}$$
    (4)

    where the nonlinearity arises from the dependence of the Fockian \({\hat{F}}\) on the density matrix P constructed with occupied one-electron orbitals \(\{ \psi _k \}\) in basis set \( \{ \mu \} \) as:

    $$\begin{aligned} P_{\mu \nu } = \sum _k^{\text{ o }cc} \langle \mu | \psi _k \rangle \langle \psi _k | \nu \rangle . \end{aligned}$$
    (5)
  4. (iv)

    Certain nonlinear differential equations for Hamiltonian systems have the form of nonlinear eigenvalue problems. These can originate from special variational principles [19,20,21].

  5. (v)

    The non-perturbative form of the Bloch equation [22,23,24]

    $$\begin{aligned} {\hat{H}} \ {\hat{\Omega }} \ = \ {\hat{\Omega }} \ {\hat{H}} \ {\hat{\Omega }} \end{aligned}$$
    (6)

    where the wave operator is either of form \({\hat{\Omega }} = |\Psi \rangle \langle \Psi | \) or \({\hat{\Omega }} = |\Psi \rangle \langle \varphi | \) where \( \varphi \) is a reference function, constitute another example for wave function nonlinearity. In fact, Eq. (6) was named as the nonlinear Schrödinger equation by Löwdin [23].

In this paper, we shall treat cases (i) and (iii) discussing variational principles.

When relating the ordinary linear Schrödinger equation (1) to the variation of the energy (preserving normalization of the eigenvector by a Lagrangian multiplier E), one proceeds as

$$\begin{aligned} \delta \ \left[ \langle \Psi | H | \Psi \rangle + E ( 1 - \langle \Psi |\Psi \rangle ) \right] \ = \ 0 \end{aligned}$$

resulting

$$\begin{aligned} \langle \delta \Psi | H - E | \Psi \rangle \ + \text {c.c.} \ = \ 0, \end{aligned}$$

which for arbitrary variations \(\langle \delta \Psi | \) is satisfied only when \( H | \Psi \rangle = E | \Psi \rangle \).

In this procedure we have utilized that \(\delta H \ = 0.\)

2 The nonlinear case

2.1 Variation of the energy expectation value

When treating the nonlinear Schrödinger equation (2), one must not assume that \(\delta H \ = 0.\) One should rather write the variation of the energy expectation value \(\mathcal{E}\) with the appropriate Lagrange multiplier E as

$$\begin{aligned} \delta \mathcal{E} = \delta \left( \langle \Psi | {\hat{H}} - E | \Psi \rangle \right) , \end{aligned}$$

where the constant term E whose variation is zero has been omitted. Carrying out the variation we have

$$\begin{aligned} \delta \mathcal{E } = \left( \langle \delta \Psi | {\hat{H}} - E | \Psi \rangle + c.c. \right) + \langle \Psi | \delta {\hat{H}} | \Psi \rangle . \end{aligned}$$
(7)

This shows that satisfying the Schrödinger equation is insufficient to make the variation of the energy expectation value zero. On the contrary, imposing the variational condition \(\delta \mathcal{E } = 0\), the Schrödinger equation will not be satisfied.

2.2 The generalized Hellmann–Feynman theorem

If \(\Psi \) is chosen to satisfy the Schrödinger equation, the first term in Eq. (7) as well as its complex conjugate vanish, and one is left with

$$\begin{aligned} \delta \mathcal{E} = \langle \Psi | \delta {\hat{H}} | \Psi \rangle . \end{aligned}$$
(8)

This equation gives us the variation of the energy expectation value when the wave function is an eigenfunction of the nonlinear Hamiltonian \(H(\Psi )\). The formula is in analogy with the Hellmann–Feynman equation [25], which gives the derivative of the energy with respect to an external parameter R of the potential:

$$\begin{aligned} \frac{\partial E}{\partial R} \ = \ \left\langle \Psi | \frac{\partial {\hat{H}}}{\partial R} | \Psi \right\rangle . \end{aligned}$$

It seems that the similar equation (8) is valid for the variation of the energy expectation value in the nonlinear Schrödinger equation, too. One may call Eq. (8) the generalized Hellmann–Feynman theorem.

2.3 The Hartree–Fock case

The general formulation of the previous sections can be applied to the variation of Fockian expectation values in a straightforward manner. We write the orbital energies using standard notation as

$$\begin{aligned} \varepsilon _i \ = \ \langle \psi _i | {\hat{F}} | \psi _i \rangle . \end{aligned}$$

The variation of \(\varepsilon _i\) with respect to orbital rotations is

$$\begin{aligned} \delta \varepsilon _i \ = \ \Big ( \langle \delta \psi _i | {\hat{F}} | \psi _i \rangle + c.c. \Big ) + \langle \psi _i | \delta {\hat{F}} | \psi _i \rangle . \end{aligned}$$
(9)

The orbital variations can be described by a unitary matrix (this will take care of preserving the orthonormality of orbitals):

where the exponential parametrization of a unitary matrix is used with antihermitean matrix \(\epsilon \). Linear variation of orbital \(\psi _i\) is, accordingly

$$\begin{aligned} \delta \psi _i \ = \ \sum _k \ \epsilon _{ik} \ \psi _k \ = \ \sum _k^{occ} \ \epsilon _{ik} \ \psi _k \ + \ \sum _k^{virt} \ \epsilon _{ik} \ \psi _k. \end{aligned}$$
(10)

To keep the variation orthogonal to the varied orbital, one may want to include the restriction \(i \ne k\) in the occupied sum. This is taken care by \(\epsilon _{ii} \) being 0 due to the antihermiticity of \(\epsilon \). It follows that we consider variations with the property

$$\begin{aligned} \langle \psi _i | \delta \psi _i \rangle = 0. \end{aligned}$$

This property ensures that quantities of type

$$\begin{aligned} \langle \delta \psi _i | {\hat{F}} | \psi _i \rangle = \langle \delta \psi _i | \psi _i \rangle \varepsilon _i = 0, \end{aligned}$$

since we take the variation at the canonical orbitals \(\psi _i\) which are eigenvectors of \({\hat{F}}\). As a consequence, the first term and its complex conjugate in Eq. (9) disappear and one is left with

$$\begin{aligned} \delta \varepsilon _i \ = \ \langle \psi _i | \delta {\hat{F}} | \psi _i \rangle , \end{aligned}$$
(11)

which is the manifestation of the generalized Hellmann–Feynman theorem discussed above in the Hartree–Fock case.

To evaluate \(\delta {\hat{F}}\), we write the diagonal elements of \({\hat{F}}\) as

$$\begin{aligned} \varepsilon _i \ = \ h_{ii} + \sum _k^{occ} \ \big ( 2 (\psi _i \psi _i | \psi _k \psi _k ) - (\psi _i \psi _k | \psi _k \psi _i ) \big ) \ = \ h_{ii} + 2 J_i - K_i \end{aligned}$$

with standard notation. When taking the variation of this, we utilize Eq. (11). This means that the terms involving the variations of \(\psi _i\) drop out, and one should consider only the variations with respect to \(\psi _k\) in the summation of the Coulomb and exchange operators. This will be expressed by the variation symbol \(\delta _i\). One gets:

$$\begin{aligned} \delta \varepsilon _i \ = \ 2 \ \delta _i J_i - \delta _i K_i \ = \ \sum _k^{occ} \ \Big ( 2 \ (\psi _i \ \psi _i | \delta \psi _k \ \psi _k ) \ - \ (\psi _i \ \psi _k | \delta \psi _k \ \psi _i ) \Big ) \ + \ c.c. \end{aligned}$$

The next step is to expand the orbital variations according to Eq. (10). Writing this down for the Coulomb term \(J_i\) first:

with the asterisk indicating complex conjugation. In the last-but-one step we applied an interchange of labels jk, and realized that the two terms involving summation over occupied orbitals \(\psi _k\) and \(\psi _j\) cancel due to the antihermiticity of \(\epsilon \). The two terms containing virtual summation survive, since there j and k are not equivalent indices.

The same manipulation can be repeated for the exchange terms to get

$$\begin{aligned} \delta _i K_i= & {} \ \sum _k^{occ} \sum _j^{virt} \ \Big ( \epsilon _{kj} (\psi _i \psi _k | \psi _j \psi _i ) \ + \ \epsilon _{kj}^* (\psi _i \psi _j | \psi _k \psi _i ) \Big ) \end{aligned}$$

The quantity \(2 \ \delta _i J_i \ - \ \delta _i K_i\) constitutes the internal variation of the Fockian, which determines orbital energy variations through (11).

The fact that orbital variations within the occupied space do not contribute to the variation of orbital energies is trivial, since \(\delta {\hat{F}}\) comes from the variation of the density matrix P which is invariant to any mixing within the occupied space.

2.4 Numerical illustrations

We include two simple illustrations. The first one is a two-state model for equations of type (3). We set \(A = B\) and aim to solve the nonlinear equation

$$\begin{aligned} H^0 \ \Psi \ - \ \lambda \ \frac{\langle \Psi |A|\Psi \rangle }{\langle \Psi | \Psi \rangle } \ A \ \Psi \ = \ E \ \Psi \end{aligned}$$
(12)

where the strength of nonlinearity is characterized by \(\lambda \). Working in the basis set of the eigenvectors of \(H^0\) in the two-state model, choosing the energy origin at \(H^0_{11}\) and the energy unit at \(H^0_{22}\), then

$$\begin{aligned} H^0 = \left( \begin{array}{cc} 0 &{}\quad 0 \\ 0 &{}\quad 1 \end{array} \right) . \end{aligned}$$

Further, assuming only off-diagonal perturbation in A,

$$\begin{aligned} A = \ \left( \begin{array}{cc} 0 &{}\quad 1 \\ 1 &{}\quad 0 \end{array} \right) . \end{aligned}$$

Finally, writing the eigenvector as

$$\begin{aligned} \Psi = \left( \begin{array}{c} 1 \\ x \end{array} \right) \end{aligned}$$

the normalization \(\langle \Psi | \Psi \rangle = 1+x^2\) resulted.

This simple model can be solved analytically by the substitution of these choices into Eq. (12) yielding, apart from a trivial solution \(x=0\), the self-consistent solutions:

$$\begin{aligned} x^2 \ = \ \frac{2\lambda - 1}{ 2 \lambda + 1} \end{aligned}$$

as a function of the perturbation strength \(\lambda \). Choosing \(\lambda =1\) we get the two solutions \(x=\pm 1/\sqrt{3}\). The energy expectation value as a function of x is plotted in Fig. 1. The non-interacting case, when the Schrödinger equation is linear, is \(\lambda =0\). Then, trivially, the expectation value has a stationary point (minimum) at \(x=0\), the zero order solution. The non-linear case is illustrated by choosing \(\lambda =1\). Then the expectation value is not stationary for the self-consistent wave function \(x=\pm 1/\sqrt{3}\), and when \(\mathcal{E}(x)\) is stationary (at around \(x=0.775\)), x does not correspond to any solution of the nonlinear problem.

Fig. 1
figure 1

Variation of the energy expectation value as a function of wave function parameter x (see text). The upper curve (purple in color online) shows \(\mathcal{E}(x)\) for the linear case \(\lambda =0\), while the non-linear situation for \(\lambda =1\) is shown by the lower curve (green in color online). The self-consistent values of x are indicated by the short vertical sticks at \(\pm 1 / \sqrt{3} \) printed in black, where \(\mathcal{E}(x) = E = -0.5\)

The second example we consider are Fockian expectation value variations for a model system consisting of one helium atom and two hydrogens. Pairwise orbital rotations are performed with the unitary matrix

$$\begin{aligned} \left( \begin{array}{cc} \cos \varphi &{}\quad - \sin \varphi \\ \sin \varphi &{}\quad \cos \varphi \\ \end{array} \right) . \end{aligned}$$

Figure 2. shows the trivial situation when only the two occupied orbitals are mixed. As a function of the mixing angle \(\varphi \), the energies of the HOMO (Fig. 2a.) and the next-HOMO (Fig 2b.) vary as expected, showing extremal values at \(\varphi =0\) corresponding to the canonical case, the former showing a maximum, the latter a minimum.

Figure 3. shows that when occupied and virtual orbitals are allowed to mix, the stationary points of orbital energies deviate from the canonical \(\varphi =0\) value. The deviations are not too large, however.

Fig. 2
figure 2

Variation of Fockian expectation values for the model system \(\hbox {HeH}_2\) in a DZ basis set with respect to occupied orbital mixing angle \(\varphi \). Both the HOMO (Fig. 2a.) and the next-HOMO (Fig. 2b.) energies are stationary for canonical orbitals \(\varphi =0\)

Fig. 3
figure 3

Variation of Fockian expectation values for the model system \(\hbox {HeH}_2\) with respect to HOMO-LUMO mixing described by rotation angle \(\varphi \) in a DZ basis set. The orbital energy curves for the HOMO (Fig. 3a.), the LUMO (Fig. 3b.) and the next-HOMO (Fig. 3c.) are drawn

3 A simple variational principle for nonlinear Schrödinger equations

We note first that for the linear, but time-dependent Schrödinger equation the original Dirac–Frenkel variational principle [26, 27] has been reformulated by McLachlan and Ball [28], and discussed thoroughly by Löwdin and Mukherjee [29]. The essence of the reformulation was the utilization of the second centralized moment of the Hamiltonian

$$\begin{aligned} m_2 \ = \ \left\langle \Psi | \Bigg ( {\hat{H}} - \langle {\hat{H}} \rangle \Bigg )^2 | \Psi \right\rangle \end{aligned}$$
(13)

with \(\langle \cdots \rangle \) indicating Rayleigh expectation value \(\langle \Psi | \ldots | \Psi \rangle / \langle \Psi | \Psi \rangle \).

In this section we emphasize that the second moment may also serve as a variational functional for the time-independent nonlinear Schrödinger equation.

An alternative form of \(m_2\) can be written as the square norm

$$\begin{aligned} m_2 \ = \ ||r||^2 \ = \ \langle r | r \rangle \end{aligned}$$

whith the residual vector

$$\begin{aligned} |r\rangle \ = \ \big ( {\hat{H}} - \langle {\hat{H}} \rangle \big ) | \Psi \rangle . \end{aligned}$$

It is evident that if the nonlinear Schrödinger equation (2) is satisfied, \(|r\rangle = 0\), thus \(m_2 = 0\), too. A formal proof that vanishing \(m_2\) is equivalent to the linear Schrödinger equation was given by Nakatsuji and Davidson [30]. We repeat their arguments here with the necessary modification requested by nonlinearity to show that the statement remains valid also in the nonlinear case.

All examples studied here assume that the nonlinear Hamiltonian entering Eq. (2) is Hermitean. Therefore at some wave function \(\Psi \) we may consider the spectral resolution of \(H(\Psi )\) as

$$\begin{aligned} {\hat{H}}(\Psi ) \ = \ \sum _k \ E_k(\Psi ) \ | \Psi _k \rangle \langle \Psi _k | \end{aligned}$$

leading to

$$\begin{aligned} \Big ( {\hat{H}} - \langle {\hat{H}} \rangle \Big )^2 \ = \ \sum _k \ \Big ( E_k - \langle {\hat{H}} \rangle \Big )^2 \ | \Psi _k \rangle \langle \Psi _k |. \end{aligned}$$

One must emphasize that the expansion vectors \(\Psi _k\) also depend on \(\Psi \), which is not indicated in the equations to avoid a cumbersome notation.

Wave function \(\Psi \) entering Eq. (13) can be expanded as

$$\begin{aligned} \Psi \ = \ \sum _k \ C_k \ |\Psi _k\rangle , \end{aligned}$$

leading to

$$\begin{aligned} m_2 \ = \ \sum _k \ |C_k|^2 \ \Big ( E_k - \langle {\hat{H}} \rangle \Big )^2. \end{aligned}$$

Since every term is nonnegative, setting \(m_2=0\) involves that each term in the summation over k must be zero individually. This involves (since \(E_k \ne \langle {\hat{H}} \rangle \) in general) that \(C_k = 0\) for most k. To avoid a trivial solution \(\Psi =0\), there must be at least one value of k, say i, for which

$$\begin{aligned} E_i = \langle {\hat{H}} \rangle \end{aligned}$$

with the associated coefficient \(C_i \ne 0\). This means that \(\Psi \) must be a pure state \(\Psi = \Psi _i\), which is then a self-consistent solution of the non-linear Schrödinger equation. In case of \(\langle {\hat{H}} \rangle \) being equal to a degenerate level, \(\Psi \) becomes a linear combination of the degenerate states.

Based on this conclusion, one may write down the variational principle

$$\begin{aligned} \delta \left\langle \Phi | \Bigg ( {\hat{H}} \ - \ \frac{\langle \Phi | {\hat{H}} |\Phi \rangle }{\langle \Phi |\Phi \rangle } \Bigg )^2 | \Phi \right\rangle \ = \ 0 \end{aligned}$$
(14)

where \(\Phi \) is a (class of) trial function(s). One may distinguish two cases:

  1. (a)

    \(\Phi \) is parametrized sufficiently well so that the family of functions it represents contains one or more true self-consistent eigenvector of the nonlinear Eq. (2). In this case the absolute minimum of the non-negative functional is 0, i.e., the second centralized moment vanishes and the optimal \(\Phi \) becomes an exact eigenstate.

  2. (b)

    \(\Phi \) is less flexible than in case (a), so that solving the variational problem (14) does not yield \(m_2=0\). In this case the variation of \(\Phi \) provides the best possible approximation within the permitted function class, resulting the smallest possible positive value for \(m_2\). The optimal \(\Phi \) serves as an approximate eigenfunction.

It is worth mentioning that the above procedure may be, in principle applied to any self-consistent eigenstate of a nonlinear Hamiltonian, though optimization for excited states may not be easy. Numerically stable procedures for determining excited states variationally, that make use of second centralized moments, were proposed recently [31, 32] for linear Hamiltonians.