Introduction

Optical phase estimation is ubiquitous in many fundamental and practical problems ranging from quantum state preparation1,2,3, sensing4, communications5,6,7,8,9,10,11,12,13,14, and information processing15. In a photonic metrology problem16,17,18,19, an optical probe interacts with a physical system to interrogate its properties. This interaction maps parameters of the system to the state of the optical probe, where an optimal readout can be performed15,17,18,19. When the physical property of the system is mapped onto the phase of the optical probe, the optimal quantum measurement is the canonical phase measurement20, which consists of projections onto phase eigenstates21. However, while theoretically the canonical phase measurement exists, the physical realization of projections onto phase eigenstates are not physically known22. Thus in practical estimation problems in quantum metrology one seeks to determine the limits in precision of physically realizable measurements, and the degree with which they approach to the fundamental quantum limit in sensitivity17,23,24,25,26,27.

Physically realizable measurements of the optical phase have been widely investigated20,21,28 for diverse metrological problems with quantum and classical fields29,30,31,32,33 including sensing small deviations from a known phase29,30,31,32,33,34 and estimation with repeated sampling35,36 and measurements30,37,38,39,40,41. Beyond these specific estimation problems, measurements of the phase of a single optical mode in a single-shot are central for quantum state preparation and detection42,43, waveform estimation and sensing44,45,46,47, and quantum information processing48,49,50. The standard measurement for optical phase estimation is the heterodyne measurement21, which samples both quadratures of the electromagnetic field simultaneously from which the phase can be estimated. However, the achievable sensitivity of the heterodyne measurement21 is far below the ultimate measurement sensitivity allowed by physics, given by the canonical phase measurement21,51. Adaptive measurement techniques based on homodyne detection, a Gaussian measurement, can be used to align the phase quadrature of the optical field with the measurement setting where the homodyne measurement provides maximum sensitivity32. Adaptive homodyne has been theoretically shown to surpass the heterodyne limit and get closer to the canonical phase measurement for optical phase estimation of coherent states20,21, providing the most sensitive Gaussian measurement of the optical phase so far21.

In a complementary measurement paradigm, quantum measurements of coherent states based on photon counting, displacement operations, and feedback2,6,8,9,10,11,12,13,14,52 have enabled state discrimination below the Gaussian sensitivity limits and approaching the Helstrom bound5. Recently, some of the authors proposed and demonstrated a non-Gaussian measurement strategy for single-shot phase estimation of coherent states, able to surpass the heterodyne limit and approach the sensitivity limit of a canonical phase measurement in the presence of loss and noise of real systems2. These measurement strategies are based on realizing coherent displacements of the input field and monitoring the output field with photon number resolving detection. The information from the detection outcomes is then used to implement real-time feedback of displacement operations optimized to maximize the measurement sensitivity of the phase of the input state. This estimation strategy is the most sensitive single-shot non-Gaussian measurement of a completely unknown phase encoded in optical coherent states so far2. In this strategy the optimization of the displacement operation is realized by maximizing either the information gain in subsequent adaptive steps of the measurement or the sharpness of the posterior phase distribution. While these cost functions are functionally different, both perform similarly and get close to the ultimate sensitivity allowed by physics, the Crámer-Rao lower bound (CRLB), in the limit of many photons and many adaptive measurements. While the work in ref. 2 demonstrated the potential of non-Gaussian measurements for single-shot phase estimation, the superiority over adaptive homodyne detection was not proven. A deeper understanding of the properties of convergence and ultimate limits of the estimators produced by non-Gaussian measurements is still missing. This is an open problem due to the complexity associated with the analysis of these adaptive non-Gaussian strategies.

In this article we investigate a family of adaptive non-Gaussian strategies based on photon counting for single-shot optical phase estimation, and assess their performance compared to Gaussian measurements and the canonical phase measurement. To analyze these non-Gaussian strategies, we use the mathematical framework of Bayesian optimal design of experiments, which provides a natural description of non-Gaussian adaptive strategies, allowing us to investigate their fundamental characteristics and determine the limits in sensitivity, which up to now has been a challenging problem. Our work provides a comprehensive statistical analysis of adaptive non-Gaussian measurements for parameter estimation, and the requirements to approach optimal bounds in the asymptotic limit. We show that strategies based on photon counting and feedback for single-shot phase estimation of coherent states provide superior sensitivity over the best known adaptive Gaussian strategies, having the same functional form as the canonical phase measurement in the asymptotic limit, differing only by a scaling factor. This work provides a deep insight into the potential of optimized non-Gaussian measurements for quantum communication, metrology, sensing, and information processing.

Results

Holevo variance of non-Gaussian estimation strategies

The non-Gaussian phase estimation strategies investigated here are based on photon counting, displacement operations, and feedback, and are optimized by maximizing a specific cost function. These strategies maximize either the estimation precision (by minimizing the Holevo variance51), or the information gain about the unknown parameter based on entropy measures, including mutual information, the Kullback-Lieber divergence, and the conditional entropy53,54. We note that these cost functions produce non-identifiable likelihood functions that do not allow to correctly estimate a cyclic parameter, such as the phase55. To address this problem, these non-Gaussian strategies use the Fisher information to optimize the displacement operations, which are the dynamical control variable in the strategy, to guarantee that these cost functions provide identifiable likelihood functions, and to enable optical phase estimation with near-optimal performance.

In the problem of single-shot phase estimation with coherent states, an electromagnetic field in a coherent state \({\rho }_{0}=\left|\alpha \right\rangle \left\langle \alpha \right|\) interacts with a physical system and experiences a unitary transformation \({e}^{{{{\rm{i}}}}\phi \hat{n}}\), where \(\hat{n}\) is the number operator. The phase ϕ induced in the coherent state carries information about the system, which can be extracted by a measurement of the output state \(\rho (\phi)={e}^{{{{\rm{i}}}}\phi \hat{n}}{\rho }_{0}{e}^{-{{{\rm{i}}}}\phi \hat{n}}=\left|{e}^{{{{\rm{i}}}}\phi }\alpha \right\rangle \left\langle {e}^{-{{{\rm{i}}}}\phi }\alpha \right|\,\).

Measurements onto ρ(ϕ), together with an estimator \(\widehat{\phi }\) on the measurement outcomes, provide an estimate of ϕ, and a measurement strategy aims to obtain the best estimation of the physical parameter. The efficiency of such a strategy is characterized by the estimator variance as a function of the number of photons in the coherent state. The most efficient strategy provides a variance with the highest convergence rate towards zero as the number of photons increase.

The standard measurement paradigm for phase estimation of Gaussian states is the heterodyne measurement (a Gaussian measurement), with an estimator variance of \({{{\rm{Var}}}}[{\widehat{\phi }}_{{{{\rm{Het}}}}}]=1/2| \alpha {| }^{2}\). Within the paradigm of Gaussian measurements, adaptive homodyne strategies optimized to minimize the Holevo variance have achieved the best performance among Gaussian measurements for single-shot phase estimation of coherent states20. The best adaptive Gaussian measurement reported to date, termed the Adaptive Mark-II (MKII), achieves a Holevo variance in the limit of large number of photons of:

$${{{\rm{Var}}}}\left[{\widehat{\phi }}_{{{{\rm{MKII}}}}}\right]\approx \frac{1}{4| \alpha {| }^{2}}+\frac{1}{8| \alpha {| }^{3}}.$$
(1)

While this optimized Gaussian strategy surpasses the heterodyne limit, it has an error of order 1/α3 above the Cramér-Rao Lower Bound (1/4α2), and does not reach the performance of the canonical phase measurement21:

$${{{\rm{Var}}}}\left[{\widehat{\phi }}_{{{{\rm{CPM}}}}}\right]\approx \frac{1}{4| \alpha {| }^{2}}+\frac{5}{32| \alpha {| }^{4}}.$$
(2)

In this work, we numerically show that non-Gaussian strategies for single-shot phase estimation based on photon counting, optimized displacement operations, and real-time feedback achieve an estimator variance smaller than Gaussian strategies with an asymptotic scaling in the limit of high mean photon numbers of:

$${{{{\rm{Var}}}}}_{{{{\rm{H}}}}}\left[\widehat{\phi }\right]\approx \frac{0.250\pm 0.001}{| \alpha {| }^{2}}+\frac{0.520\pm 0.010}{| \alpha {| }^{4}}.$$
(3)

Figure 1a summarizes the main result comparing the three asymptotic variances for the canonical phase measurement (solid blue), MKII (solid red), and non-Gaussian strategies (solid green and points). Figure 1b, shows the excess Holevo variance compared to the canonical phase measurement (\({{{\rm{Var}}}}[\cdot ]-{{{\rm{Var}}}}[{\widehat{\phi }}_{{{{\rm{CPM}}}}}]\)) for Heterodyne (purple), MKII (red), and non-Gaussian strategies (green).

Fig. 1: Asymptotic variances for different phase measurements.
figure 1

a Holevo variance in the limit of large mean photon number (MPN) α2 for the canonical phase measurement in Eq. (2), the most sensitive Gaussian measurement know to date (MK II)21 in Eq. (1), and the non-Gaussian strategies in Eq. (3). The points show numerically calculated values for the non-Gaussian strategies in a region where the analytical expression is not valid. Error bars represent a 1-σ standard deviation. b Excess phase variance (Var[  ] - \({{{\rm{Var}}}}[{\widehat{\phi }}_{{{{\rm{CPM}}}}}]\)) for Heterodyne, MKII, and non-Gaussian strategies.

These non-Gaussian strategies implement a series of adaptive steps with displacement operations optimized to maximize information gain, while ensuring efficient phase estimators in the asymptotic limit. These strategies surpass the best known Gaussian strategy in Eq. (1), and have the same functional form as the canonical phase measurement in the asymptotic limit, differing only by a scaling factor, thus providing the highest sensitivity among physically-realizable measurements for single-shot phase estimation of coherent states known to date.

Achieving this performance with non-Gaussian estimation strategies, however, requires a deep understanding of the measurement process. To gain this understanding, we use the mathematical framework of Bayesian optimal experimental design, which provides a natural description of adaptive non-Gaussian measurements. This allows us to optimize these strategies for single-shot phase estimation with a Holevo variance given by Eq. (3).

Bayesian optimal design of experiments

Phase estimation of coherent states based on photon counting with adaptive coherent displacement operations can be defined in the context of Bayesian optimal design of experiments. Optimal design of experiments allows for improving statistical inferences about quantities of interest by appropriately selecting the values of control variables known as designs54,56,57. In this framework, it is assumed that the experimental data y (the measurement outcomes) can be modeled as an element of the set

$${{{{\mathcal{P}}}}}_{{{\Phi }}}=\left\{p(y| {{{\bf{d}}}},{{{\boldsymbol{\phi }}}}),\quad {{{\bf{d}}}}\in {{{\mathcal{D}}}}\,\,{{{\boldsymbol{\phi }}}}\in {{\Phi }}\right\},$$
(4)

where d is a parameter called design chosen from some set \({{{\mathcal{D}}}}\) called design space, ϕ Φ is an unknown parameter to be estimated, and the data y comes from a sample space \({{{\mathcal{Y}}}}\subseteq {\mathbb{R}}\). In this paradigm, the experimenter has full control over the designs d and the ability to adjust them prior to making a measurement. This allows for optimizing such measurement for estimating the unknown parameter ϕ. Bayesian optimal design of experiments goes beyond standard methods for parameter estimation based on Bayesian statistical inference58,59,60,61,62, by providing the suitable mathematical framework to ensure optimal designs to find efficient estimators for a general parameter space63,64,65,66.

In the Bayesian approach of optimal design54,56, the initial lack of knowledge about ϕ is modeled as a prior probability distribution p(ϕ). The measurement aims to reduce the uncertainty of ϕ as much as possible by the use of Bayes’ theorem over the prior distribution. In an estimation problem, the optimal choice for the designs \({{{\bf{d}}}}\in {{{\mathcal{D}}}}\) maximize the expected value of a cost function U(d, ϕ, y) with respect to the possible outcomes of y and ϕ:

$$\begin{array}{rcl}{{{{\bf{d}}}}}^{{{{\rm{opt}}}}}&=&\arg \mathop{\max }\limits_{{{{\bf{d}}}}\in {{{\mathcal{D}}}}}{{{\rm{E}}}}\left[U({{{\bf{d}}}},{{{\boldsymbol{\phi }}}},y)\right]\\ &=&\arg \mathop{\max }\limits_{{{{\bf{d}}}}\in {{{\mathcal{D}}}}}{\int}_{{{{\mathcal{Y}}}}}{\int}_{{{\Phi }}}U({{{\bf{d}}}},{{{\boldsymbol{\phi }}}},y)p({{{\boldsymbol{\phi }}}}| {{{\bf{d}}}},y)d{{{\boldsymbol{\phi }}}}p(y| {{{\bf{d}}}})dy\\ &=&\arg \mathop{\max }\limits_{{{{\bf{d}}}}\in {{{\mathcal{D}}}}}{\int}_{{{{\mathcal{Y}}}}}{\int}_{{{\Phi }}}U({{{\bf{d}}}},{{{\boldsymbol{\phi }}}},y)p\left({{{\boldsymbol{\phi }}}},y| {{{\bf{d}}}}\right)d{{{\boldsymbol{\phi }}}}dy.\end{array}$$
(5)

A standard approach in optimal design of experiments considers choosing dopt at the beginning of the experiment an then sample data from p(ydopt, ϕ) for all subsequent trials. An alternative approach considers dynamically updating dopt on each trial, as more data is collected. The advantage of this approach is that adaptive estimation strategies are never less efficient than the non-adaptive ones67.

The implementation of Bayesian optimal design of experiments requires a cost function, such as the Kullback-Lieber divergence between the prior and posterior distributions68:

$$\begin{array}{lll}U({{{\bf{d}}}},y)&\!\!=\!\!&{D}_{{{{\rm{KL}}}}}\left[p({{{\boldsymbol{\phi }}}}| {{{\bf{d}}}},y)| | p({{{\boldsymbol{\phi }}}})\right]\\ &\!\!=\!\!&{\int}_{{{\Phi }}}p({{{\boldsymbol{\phi }}}}| {{{\bf{d}}}},y)\log \left[\frac{p({{{\boldsymbol{\phi }}}}| {{{\bf{d}}}},y)}{p({{{\boldsymbol{\phi }}}})}\right]d{{{\boldsymbol{\phi }}}}.\end{array}$$
(6)

The Kullback-Lieber divergence provides a distance between probability distribution p(ϕd, y) and p(ϕ)53. If p(ϕd, y) is equal to p(ϕ) then U(d, y) = 0 and there is not any gain of information about ϕ by measuring with design d and outcome y.

Another cost function considered in optimal design is the conditional entropy between the plausible values of ϕ and y

$$\begin{array}{l}U({{{\bf{d}}}})=-H({{{\boldsymbol{\phi }}}}| Y)\\\qquad\;\, =-\mathop{\sum}\limits_{y\in {{{\mathcal{Y}}}}}p(y){\int}_{{{\Phi }}}p({{{\boldsymbol{\phi }}}}| {{{\bf{d}}}},y)\log \left[p({{{\boldsymbol{\phi }}}}| {{{\bf{d}}}},y)\right]d{{{\boldsymbol{\phi }}}},\end{array}$$
(7)

which is a measure of how much information is needed to describe the outcomes of the random variable ϕ given that the value of another random variable Y is known53. However, we note that cost functions Eq. (6) and Eq. (7) can be related to the mutual information53:

$$\begin{array}{rcl}U({{{\bf{d}}}})=I({{{\boldsymbol{\phi }}}}| Y)&=&{{{{\rm{E}}}}}_{y}\left[{D}_{{{{\rm{KL}}}}}\left[p({{{\boldsymbol{\phi }}}}| {{{\bf{d}}}},y)| | p({{{\boldsymbol{\phi }}}})\right]\right]\\ &=&H(p({{{\boldsymbol{\phi }}}}))-H({{{\boldsymbol{\phi }}}}| Y).\end{array}$$
(8)

As a result, designs d maximizing any of these cost functions in Eq. (6), Eq. (7), or Eq. (8) are equivalent54,56,69. Moreover, in the asymptotic limit, maximizing these cost functions is equivalent to minimizing the determinant of the covariance matrix (D-optimality design criteria), that is, in the asymptotic limit, optimizing these cost functions is equivalent to optimizing any member within the family of D-optimal designs54,68.

Our goal is to apply the theory of Bayesian optimal design of experiments to the problem of phase estimation of coherent states with photon counting and adaptive coherent displacement operations. The adaptive non-Gaussian estimation strategy consists of several parts: i) in the first adaptive step, it uses an specific cost function and the prior information to choose the design; ii) then, it performs a measurement; iii) based on the measurement outcome, it uses Bayes’ theorem to update the probability distribution; iv) and lastly, it uses a recursive approach, where this posterior probability distribution becomes the prior of the subsequent measurement step i). The estimation of ϕ at each adaptive step is obtained from the maximum posterior estimator (MAP) of the posterior probability distribution. This approach requires that the MAP converges to the true value of the parameter when the number of measurements increases.

In the adaptive mathematical framework of optimal experimental design, Paninski67 proved that under a set of regular modelling conditions and in the case when \(\phi \in {\mathbb{R}}\), cost functions based on mutual information can allow for designs that lead to asymptotically consistent and efficient MAP estimators with variance

$${\sigma }_{{{{\rm{INFO}}}}}^{2}={\left(\mathop{{{\arg\max}}}\limits_{C\in co\left(F(\phi ;{{{\bf{d}}}})\right)}\left|C\right|\right)}^{-1}.$$
(9)

Here \(co\left(F(\phi ;{{{\bf{d}}}})\right)\) denotes the convex closure of the set of Fisher information functions F(ϕ; d). In other words, the estimations produced by \(\widehat{\phi }\) converge to ϕ (consistency), and the distribution of \(\widehat{\phi }\) tends to a normal distribution with mean ϕ and variance given by Eq. (9) when the number of adaptive steps tends to infinity (efficiency).

Formally, the regularity conditions introduced in67 can be stated as follows:

  1. 1.

    The parameter space Φ is a compact metric space.

  2. 2.

    The log likelihood, \(\log \left(p(y| \phi ,{{{\bf{d}}}})\right)\) is uniformly Lipschitz in ϕ with respect to some dominating measure on \({{{\mathcal{Y}}}}\).

  3. 3.

    The likelihood function is identifiable for ϕ, that is, the likelihood function has a unique global maximum.

  4. 4.

    The prior distribution, assigns a positive probability to any neighborhood of the real value of ϕ.

  5. 5.

    The Fisher information functions F(ϕ; d) are well defined for any ϕ Φ and \({{{\bf{d}}}}\in {{{\mathcal{D}}}}\).

  6. 6.

    The maximum of the convex closure of the set of Fisher information functions \(\left\{F(\phi ,{{{\bf{d}}}})| {{{\bf{d}}}}\in {{{\mathcal{D}}}},\,\phi \in {{\Phi }}\right\}\) must be positive-definite, i.e., \(\mathop{\max }\nolimits_{C\in co\left(F(\phi ;{{{\bf{d}}}})\right)}\left|C\right| \,>\, 0\).

We note that in the case of estimation of a scalar parameter, the maximization of mutual information is equivalent to the minimization of the mean square error (MSE)67:

$$\begin{array}{l}\rm{MSE}(\widehat{\phi })=\rm{E}\left[{\left(\widehat{\phi }-\phi \right)}^{2}\right]\\\qquad\qquad\; =\,{{{\rm{Var}}}}\left(\widehat{\phi }\right)+{\left(\rm{E}\left[\widehat{\phi }\right]-\phi \right)}^{2}.\end{array}$$
(10)

This shows that the MSE is a trade-off between the estimator’s variance and its bias. As a result, since the phase is a scalar quantity, in the asymptotic limit both the mutual information Eq. (8) and the mean square error Eq. (10) are appropriate cost functions to find optimal estimation strategies. Even more, for unbiased estimators (such as the MAP estimator on the asymptotic limit), the MSE corresponds to the estimator variance, then the maximization of mutual information is equivalent to the minimization of estimator variance. In practice, however, an estimation strategy would use a cost function that can be calculated efficiently and with a high rate of convergence.

Phase estimation in coherent states

Optimal phase estimation of coherent states of light aims to obtain the best estimate, from the outcomes of a physical measurement, of an unknown phase ϕ [0, 2π) encoded in a coherent state \(\rho (\phi)=|{e}^{i\phi }\alpha \rangle \langle {e}^{-i\phi }\alpha |\). The most general description of a physical measurement is given by a POVM, a positive operator-valued measure. A measurement M with a discrete set of outcomes \(\left\{m\,| \,m\in S\subseteq {\mathbb{Z}}\right\}\), can be represented as a POVM \(M=\left\{M(m)\,| \,m\in S\right\}\), where the operators M(m) are positive bounded M(m) > 0 and resolve the identity ∑mM(m) = I, mS70. By the Born rule, the probability for m conditioned to ϕ is

$${{{\rm{Tr}}}}\left[M(m)\rho (\phi )\right]=p\left(m| \phi \right).$$
(11)

According to information theory, if an estimator \(\widehat{\phi }\) of ϕ is constructed using a sample from a POVM M, the limit in the accuracy of \(\widehat{\phi }\) is given by the Crámer-Rao Bound71,72

$${{{{\rm{Var}}}}}_{\phi }\left[\widehat{\phi }\right]\ge \frac{1}{{F}_{M}(\phi )}.$$
(12)

Here FM(ϕ) is the Fisher information of M about ϕ, which quantifies how much information about ϕ is carried in a sample from M:

$${F}_{M}(\phi )={{{{\rm{E}}}}}_{\phi }\left[{\left(\frac{\partial }{\partial \phi }\left[\log \left(p\left(m| \phi \right)\right)\right]\right)}^{2}\right].$$
(13)

Since the Fisher information in Eq. (13) depends on the POVM M, the maximization of the Fisher information over all POVMs provides the lowest possible Cramér-Rao bound. This maximum Fisher information over all POVMs is known as the quantum Fisher information FQ (QFI), and the lowest possible bound in the accuracy of \(\widehat{\phi }\) is known as the quantum Cramér-Rao bound (QCRB)5,72,73. In the case of phase estimation of coherent states FQ = 4α2, and the QCRB is:

$${{{{\rm{Var}}}}}_{\phi }\left[\widehat{\phi }\right]\ge \frac{1}{4| \alpha {| }^{2}}.$$
(14)

In the limit of large photon number α 1, the QCRB is saturated by the canonical phase measurement (the optimal phase measurement), which is described by the POVM51:

$$M(\widehat{\phi })=\frac{1}{2\pi }\mathop{\sum }\limits_{n,m=0}^{\infty }\left|n\right\rangle \left\langle m\right|{e}^{{{{\rm{i}}}}\left(n-m\right)\widehat{\phi }},$$
(15)

where \(\left|n\right\rangle\) is an eigenstate of the number operator \(\hat{n}\). The operator \(M(\widehat{\phi })\) is an element of the canonical phase measurement whose outcome is a number \(\widehat{\phi }\in \left[0,2\pi \right)\), which can be used as an estimation for ϕ.

The optimality of the canonical phase measurement indicates that its Holevo variance

$${V}_{{{{\rm{CPM}}}}}={\left|{e}^{-{\alpha }^{2}}\mathop{\sum }\limits_{n = 0}^{\infty }\frac{{\alpha }^{2n+1}}{n!\sqrt{n+1}}\right|}^{2}-1,$$
(16)

is the fundamental bound of estimation precision20,51. Although there are proposals that attempt to implement this POVM, they have not been able to reach the fundamental bound Eq. (16), or Eq. (2). For instance, in74 it was possible to obtain the canonical measurement distribution as the marginal of a joint measurement in phase space producing a worse performance in the context of phase estimation. Moreover, the canonical phase measurement was implemented for the case of one-photon wave packet using quantum feedback22. However, for the case of higher dimensional states, such as coherent states, this problem remains open.

While there is not a satisfactory known method to implement the canonical phase measurement, Gaussian strategies serve as a standard of physically realizable measurement techniques for phase estimation of coherent states. The natural benchmark in the Gaussian strategies is the heterodyne detection, whose variance is lower bounded by \({{{\rm{Var}}}}[{\widehat{\phi }}_{{{{\rm{Het}}}}}]=1/2| \alpha {| }^{2}\)21. Several adaptive Gaussian schemes have been shown to exceed the lower bound for heterodyne detection. The most efficient Gaussian measurement reported to date, termed the Adaptive Mark-II (MKII) strategy20, has a variance in the limit of α 1 given by Eq. (1). Nonetheless, these adaptive Gaussian strategies cannot reach the performance for the canonical phase measurement in Eq. (2)21.

The proposed non-Gaussian strategies for single-shot phase estimation of coherent states are based on optimized adaptive measurements with photon number resolving detection. These non-Gaussian strategies are able to outperform the best known Gaussian strategies and closely follow the performance of the canonical phase measurement in the limit of large photon number.

Adaptive non-Gaussian phase estimation

The proposed non-Gaussian adaptive estimation strategies based on adaptive photon counting2 aim to estimate the phase ϕ Φ = [0, 2π) of a coherent state \(\rho (\phi )=\left|{e}^{{{{\rm{i}}}}\phi }\alpha \right\rangle \left\langle {e}^{-{{{\rm{i}}}}\phi }\alpha \right|\,\) with mean photon number \({{{\rm{E}}}}\left[\hat{n}\right]=| \alpha {| }^{2}\) using a finite number of adaptive measurement steps, and based on the prior information p(ϕ) about ϕ. In every adaptive step l = 1, 2,  , L, the input coherent state with energy α2/L interferes with a local oscillator field, which implements a displacement operation \(\hat{D}\left(\beta \right)\left|\alpha \right\rangle =\left|\alpha +\beta \right\rangle ,\,\beta \in {\mathbb{C}},\) with phase and amplitude chosen by some rule, in general depending on previous measurement outcomes. This is followed by a photon number detection measurement with a given photon number resolution (PNR) m of the detectors6, \(m\in {\mathbb{N}}\). In practice, since the energy in each adaptive step is α2/L, the strategy will only require moderate PNR resolution (m < 10) as L increases.

In the first adaptive measurement l = 12, the strategy makes a random guess hypothesis \({\beta }_{0}\in {\mathbb{C}}\) about the optimal β, and applies the POVM

$$\begin{array}{c}{\left\{\hat{D}({\beta }_{0})\left|n\right\rangle \left\langle n\right|{\hat{D}}^{{\dagger} }({\beta }_{0})\right\}}_{n = 0}^{m-1}\cup \\ \left\{{\mathbb{I}}-\mathop{\sum }\limits_{n=0}^{m-1}\hat{D}({\beta }_{0})\left|n\right\rangle \left\langle n\right|{\hat{D}}^{{\dagger} }({\beta }_{0})\right\}\end{array}$$
(17)

over the state \(\left|{e}^{{{{\rm{i}}}}\phi }\alpha /\sqrt{L}\right\rangle\). In the POVM in Eq. (17), the sum over Fock states \(\left|n\right\rangle \left\langle n\right|\) models the photon detection on the displaced state with a detector with PNR(m)6. The corresponding Wigner function describing a photon-number resolving detector shows non-Gaussian features with negative values. For this reason, these adaptive estimation techniques are called “non-Gaussian”, despite that the estimator produced is asymptotically normal (this result will be proved in the remainder of this section).

Given the detection outcome n1 in l = 1, the posterior probability distribution becomes

$$p\left(\phi | {n}_{1};{\beta }_{0}\right)\propto {{{\mathcal{L}}}}(\phi | {n}_{1};{\beta }_{0})p(\phi ),$$
(18)

where \({{{\mathcal{L}}}}\left(\phi | {n}_{1};{\beta }_{0}\right)\) is the likelihood function given by

$$\begin{array}{rcl}{{{\mathcal{L}}}}\left(\phi | {n}_{1};{\beta }_{0}\right)&=&p\left({n}_{1}| \phi ;{\beta }_{0}\right)\\ &=&{{\mathrm{Tr}}}\,\left[\hat{D}({\beta }_{0})\left|{n}_{1}\right\rangle \left\langle {n}_{1}\right|{\hat{D}}^{{\dagger} }({\beta }_{0})\rho (\phi )\right].\end{array}$$
(19)

The phase estimate ϕ1 in this adaptive step corresponds to the MAP estimator \({\widehat{\phi }}_{{{{\rm{MAP}}}}}\), \({\phi }_{1}={\widehat{\phi }}_{{{{\rm{MAP}}}}}({n}_{1})\), with the posterior probability distribution in Eq. (18). Using the posterior phase distribution in Eq. (18) as the prior for the next adaptive step l = 2, the strategy optimizes a cost function U(β) to obtain the next value of β denoted as β1, and implements the POVM in Eq. (17) with β1. The Bayesian updating procedure is repeated at each step l ≥ 2. After l adaptive measurements the posterior probability distribution becomes

$$\begin{array}{rcl}p(\phi | {{{\bf{n}}}},{{{\boldsymbol{\beta }}}})&=&p(\phi | {n}_{l},\ldots ,{n}_{1},{\beta }_{l-1},\ldots ,{\beta }_{0})\\ &\propto &\mathop{\prod }\limits_{i=1}^{l}p({n}_{i}| \phi ,{\beta }_{i-1})p(\phi ).\end{array}$$
(20)

Here ni is the observed photon detection at step i. Using the MAP on this phase distribution, we obtain the lth estimation \({\widehat{\phi }}_{l}\). The procedure is repeated until the last measurement step L.

This parameter estimation strategy is a particular case of Bayesian optimal design of experiments, where the parameters \(\beta \in {\mathbb{C}}\) are the designs, and which are optimized to estimate a phase ϕ [0, 2π). Since the optimal value for β in each adaptive step depends on all previous detection results, the cost function to be optimized is a function of the posterior distribution in Eq. (20). A suitable choice of cost function, such as the mutual information or the estimator variance, can provide a sequence of estimations \({\widehat{\phi }}_{n}\) that approaches the true value of ϕ2.

In the case of estimation of cyclic parameters, such as phase estimation, the posterior distribution in Eq. (20) is 2π periodic, and the moments of \(\widehat{\phi }\) cannot be calculated as in the linear case51. In such situations, the first moment of the cyclic random variable X is defined as \(\rm{E}\left[{e}^{{{{\rm{i}}}}X}\right]\), and the dispersion of an estimator \(\widehat{\phi }\) is calculated using the Holevo Variance51:

$${{{{\rm{Var}}}}}_{{{{\rm{H}}}}}\left[\widehat{\phi }\right]=\frac{1}{{\left|\rm{E}\left[{e}^{{{{\rm{i}}}}\widehat{\phi }}\right]\right|}^{2}}-1,$$
(21)

which is the analogous to the mean square error. The minimization of the uncertainty about ϕ (positive square root of Eq. (21)), requires maximization of \(S(\beta ,m)=|\rm{E}[{e}^{{{{\rm{i}}}}\widehat{\phi }}]|\), known as the sharpness of the distribution. Then, the suitable cost function for the adaptive protocol is the average sharpness:

$$\bar{S}(\beta ,m)=\mathop{\sum }\limits_{n=0}^{m}p(n)\left|{\int}_{{{\Phi }}}{e}^{{{{\rm{i}}}}\phi }p\left(\phi | n,\beta \right)d\phi \right|.$$
(22)

Identifiability of likelihood

To guarantee a consistent asymptotic estimator the optimized estimation strategies require to satisfy the regularity conditions 1-6 described in section “Bayesian optimal design of experiments”. For phase estimation, the regularity conditions 1 and 2 are satisfied given that ϕ is an interior point of Φ = [0, 2π). Moreover, given that the probability

$$\begin{array}{lll}p(n| \phi ;\beta )&=&{{{\rm{Tr}}}}\left[\left|\frac{\alpha {e}^{{{{\rm{i}}}}\phi }}{\sqrt{L}}\right\rangle \left\langle \frac{\alpha {e}^{{{{\rm{i}}}}\phi }}{\sqrt{L}}\right|\hat{D}(\beta )\left|n\right\rangle \left\langle n\right|{\hat{D}}^{{\dagger} }(\beta )\right]\\ &=&\frac{\exp \left(-{\left|\frac{\alpha {e}^{{{{\rm{i}}}}\phi }}{\sqrt{L}}-\beta \right|}^{2}\right){\left|\frac{\alpha {e}^{{{{\rm{i}}}}\phi }}{\sqrt{L}}-\beta \right|}^{2n}}{n!},\alpha \in {{\mathbb{R}}}_{+},\beta \in {\mathbb{C}}\end{array}$$
(23)

is well defined and two times differentiable, the conditions 4, 5, and 6 are directly satisfied. However, the condition 3 is not satisfied in general. If one chooses β as the value that maximizes the mutual information (8) or the average sharpness (22), the resulting likelihood function can have in general two maxima, that is, a non-identifiable likelihood function30. In that case it is not possible to guarantee the existence of a consistent estimator.

To address the challenge of working with non-identifiable likelihood functions, we use designs with a fixed relation between the phase of β and the amplitude β, given by β = f(θ)eiθ, with θ [0, 2π) and f(θ) a real function. These experimental designs result in a cost function U that is a function of θ.

To see how this method solves the problem of non-identifiability, we observe that the probability p(nϕ; β) in Eq. (23) is Poisson distributed,

$$p(n| \phi ;\beta =| \beta | {e}^{{{{\rm{i}}}}\theta })=\frac{{e}^{-\lambda }\cdot {\lambda }^{n}}{n!},$$
(24)

with \(\lambda =| \alpha {| }^{2}/L+| \beta {| }^{2}-2| \alpha | | \beta | \cos \left(\phi -\theta \right)/\sqrt{L}\). Then, for L adaptive steps with results n = (n1, …, nL), the likelihood function is the product of L probability functions of the form of Eq. (24):

$${{{\mathcal{L}}}}({{{\bf{n}}}},\phi )=\mathop{\prod }\limits_{i=1}^{L}p({n}_{i}| \phi ;{\beta }_{i}=| \beta | {e}^{{{{\rm{i}}}}{\theta }_{i}})=\mathop{\prod }\limits_{i=1}^{L}\frac{{e}^{-{\lambda }_{i}}\cdot {\lambda }_{i}^{{n}_{i}}}{{n}_{i}!}.$$
(25)

Here, each θi depends on the cost function and previous POVM outcomes. The choice of the experimental designs with β = f(θ) can force the adaptive strategy to change θi in each step. In this case, the likelihood function in Eq. (25) becomes identifiable because the product of probability functions with different θi produces a likelihood with a global maximum. To see this, note that if θi = θ is fixed, even if the parameter β is different in each adaptive step of the protocol, the likelihood function from a sequence of independent random variables with probability function given by Eq. (24) has two maxima over Φ. One of the maxima will always be around ϕ and the other will depend on the value of θ. On the other hand, if the strategy allows θ to change at each adaptive step, the second maximum is suppressed, since the functions whose product constitutes the likelihood Eq. (25) will each have a second maxima at different positions θi ≠ θj (i ≠ j). As a result, with the experimental designs \(\beta =f(\theta )\exp ({{{\rm{i}}}}\theta )\), the likelihood functions satisfy all the regularity conditions described in section “Bayesian optimal design of experiments”.

Bayesian optimal design of |β|

Any viable estimation strategy should aim to achieve the QCRB (14). Therefore, natural choice for β is the one that maximizes the Fisher information. For a discrete random variable, the Fisher information is given by72,75:

$$F(\phi )=\mathop{\sum }\limits_{n=0}^{\infty }p(n| \phi ){\left(\frac{\partial }{\partial \phi }\log \left(p(n| \phi )\right)\right)}^{2}.$$
(26)

The Fisher information for a particular design β = βeiθ for Poisson distributions Eq. (24) results in

$$F(\phi ;\beta )=\frac{4{\left|\alpha \right|}^{2}{\left|\beta {\sin }^{2}(\phi -\theta )\right|}^{2}}{{\left|\alpha \right|}^{2}+L{\left|\beta \right|}^{2}-2| \alpha | | \beta | \cos (\phi -\theta )\sqrt{L}}.$$
(27)

Optimizing over β, the Fisher information becomes:

$$F(\phi ,{\beta }_{{{{\rm{opt}}}}})=4{\left|\alpha \right|}^{2}/L,$$
(28)

where

$${\beta }_{{{{\rm{opt}}}}}=\frac{| \alpha | }{\sqrt{L}\cos (\phi -\theta )}$$
(29)

is the value of β that maximizes the Fisher information. Unfortunately, since βopt depends on ϕ, its implementation is not practical, because it would be required to already know a priori the very same parameter that one wants to estimate. To address this problem, the optimal Bayesian design βopt can be estimated as:

$${\widehat{\beta }}_{{{{\rm{opt}}}}}=\frac{| \alpha | }{\sqrt{L}\cos (\widehat{\phi }-\theta )},$$
(30)

where \(\widehat{\phi }\) is the MAP estimator. As a result, the non-Gaussian estimation strategy has now only one design, the phase θ. With \(\beta ={\widehat{\beta }}_{{{{\rm{opt}}}}}\), the Fisher information becomes:

$$F({{\Delta }},{\widehat{\beta }}_{{{{\rm{opt}}}}})=\frac{4\,{\sin }^{2}({{\Delta }}){\alpha }^{2}}{L({\cos }^{2}\left(\delta +{{\Delta }}\right)-2\,\cos ({{\Delta }})\,\cos \left(\delta +{{\Delta }}\right)+1)},$$
(31)

where \(\delta =\widehat{\phi }-\phi\) and Δ = ϕ − θ.

Note that \(F({{\Delta }},{\widehat{\beta }}_{{{{\rm{opt}}}}})\) becomes a random variable with outcomes depending on \(\widehat{\phi }\) through \({\widehat{\beta }}_{{{{\rm{opt}}}}}\). Moreover, for a random initial design θ1 and a cost function given by the expected sharpness or the mutual information with \({\widehat{\beta }}_{{{{\rm{opt}}}}}\) in Eq. (30), the likelihood function in Eq. (25) becomes identifiable. As a result, the non-Gaussian strategy then leads to consistent and efficient MAP estimators67.

Asymptotic behavior

In general \(\delta =\widehat{\phi }-\phi \ne 0\), and the Fisher information has a strong dependence on θ. For example, if θ → ϕ then \(F({{\Delta }}=0,{\widehat{\beta }}_{{{{\rm{opt}}}}})\to 0\), and negligible information is gained when the system is measured. Therefore, the optimal value of θ should result in the value of Δ that maximizes the expected value of \(F({{\Delta }},{\widehat{\beta }}_{{{{\rm{opt}}}}})\). By observing that the expected value E[δ2n+1] = 0, with \(n\in {\mathbb{N}}\), we see that the optimal value of Δ is π/2. As a result, an efficient adaptive strategy would make ΔL = ϕ − θL tend to π/2 as L increases. However, in the limit of Δ → π/2 and δ → 0, \(| {\widehat{\beta }}_{{{{\rm{opt}}}}}| \to \infty\) (see Eq. (30)), and we expect that when the strategy is implemented Δ < π/2. These findings are consistent with our numerical simulations of the non-Gaussian adaptive strategy, where we observe that for L 1, Δ → π/2 − ϵ, with ϵ a small positive real number, and β stays finite around a value that the PNR detector in the strategy can resolve. Note that \({\hat{\beta }}_{{{{\rm{opt}}}}}\), which maximizes the Fisher information given θ, does not necessarily maximize the cost function U(β) for \(\beta \in {\mathbb{C}}\). However restricting β to the set that maximizes the Fisher information makes the phase of β change in each step forcing the likelihood to be identifiable. Moreover, when \(\hat{\phi }\to \phi\), \({\hat{\beta }}_{{{{\rm{opt}}}}}\) tends to the value that maximizes U2.

Given that the Fisher Information is additive, for any L measurements made using the optimal design, βopt in Eq. (29), the total Fisher information for this design corresponds to the quantum Fisher information 4α2. However, since βopt is not known, it is not possible to choose a design β for which its Fisher information equals the quantum Fisher information. This is already implied by the fact that the canonical phase measurement does not saturate the Cramér-Rao bound. To show this, we observe that for and estimator very close to the true value \(\delta =\widehat{\phi }-\phi \ll 1\) for L adaptive measurements, the Fisher information (Eq. (31)) is

$$F(\phi ,{\widehat{\beta }}_{{{{\rm{opt}}}}})\approx 4{\left|\alpha \right|}^{2}(1-{\delta }^{2}).$$
(32)

On the other hand, the best possible estimator of δ in each step satisfies \(E\left[{\delta }^{2}\right]\ge 1/4| \alpha {| }^{2}\), so that

$$F(\phi ,{\widehat{\beta }}_{{{{\rm{opt}}}}})\lesssim 4{\left|\alpha \right|}^{2}-1,$$
(33)

independently of L. We conclude that the adaptive non-Gaussian strategies do not saturate the Cramér-Rao bound for finite α. Nevertheless, these adaptive estimation schemes outperform the most sensitive Gaussian strategy known to date, and show a similar asymptotic scaling as the canonical phase measurement.

Performance of non-Gaussian adaptive strategies

We numerically investigate the performance of the estimator produced by non-Gaussian adaptive strategies for phase estimation for different numbers of adaptive steps L, photon number resolution PNR(m), and average photon number α2. To assess the performance of these strategies, we compare our results with the best-known Gaussian measurement for phase estimation, termed Mark II (MKII) strategy21, and with the canonical phase measurement. As discussed in section “Bayesian optimal design of experiments”, the performance of non-Gaussian adaptive strategies using cost functions including the Kullback-Lieber divergence Eq. (6), conditional entropy Eq. (7), mutual information Eq. (8), and expected sharpness Eq. (22) are equivalent in the asymptotic limit. In our numerical simulations, however, we use the expected sharpness in Eq. (22) as the cost function for the optimization of the strategy, because it substantially reduces the number of operations for this optimization and the computational overhead.

In our numerical analysis, we use Monte Carlo simulations and generate sufficient numerical data samples to reduce statistical uncertainties. Figure 2a shows the Holevo variance for the non-Gaussian adaptive strategy, as a function of the number of adaptive steps L for α2 = 1, for different PNR: m = 1, 3, 6. We observe that the adaptive non-Gaussian scheme with PNR(1) and L≥100 (green dots with error bars) outperforms the most sensitive Gaussian strategy known to date, the MKII, (red dashed line). Moreover, strategies with PNR(3) (yellow dots with error bars) and PNR(6) (light blue dots with error bars) outperform the MKII strategy with only L ≈ 30 and L ≈ 20, respectively, achieving a smaller Holevo variance with fewer adaptive measurements compared to PNR(1). We have observed similar behavior for non-Gaussian strategies optimizing different cost functions, such as the mutual information.

Fig. 2: Asymptotic limit for the estimator of Holevo variance and optimal design.
figure 2

a Holevo variance for α2 = 1 and PNR m = 1, 3, 6 as a function of adaptive steps L. Note that the non-Gaussian strategy surpasses the MKII strategy (red dashed line at y = 0.767) with L > 100, L > 30 and L > 20 for PNR(1), PNR(3) and PNR(6), respectively. The lines are obtained fitting the exponential model Eq. (34). Estimates for these strategies result in (A, B, C, RSE) = (0.11 ± 0.02, 0.059 ± 0.014, 0.7145 ± 0.0030, 0.0005) for PNR(6), (A, B, C, RSE) = (0.22 ± 0.04, 0.082 ± 0.0157, 0.719 ± 0.0040, 0.0007) for PNR(3), (A, B, C, RSE) = (0.41 ± 0.05, B = 0.117 ± 0.013, 0.7517 ± 0.0027, 0.0005) for PNR(1). The shaded regions represent 1-σ standard deviation. b Optimal design as a function of L. As L increases the optimal design tends to π/2. Numerical data are obtained with 10000 Monte Carlo sequences. Error bars represent a 1-σ standard deviation.

To investigate the asymptotic behavior of the adaptive non-Gaussian strategy, we assume an exponential dependence for the Holevo variance with L (solid lines in Fig. 2a):

$$y(L,\alpha )={{{\rm{A}}}}{e}^{-{{{\rm{B}}}}\cdot L}+{{{\rm{C}}}}.$$
(34)

The exponential model is a few parameter model that allows us to quantify asymptotic trends when the datasets have rapidly decaying tails. This is a widely used model for studying the asymptotic scaling of estimators as a function of resources in diverse metrological problems64,76,77.

We fit the numerical data from Monte Carlo simulations to Eq. (34) to estimate the constants A, B, and C. Our results show that the asymptotic constant C = 0.751 ± 0.002 for PNR(1), C = 0.719 ± 0.004 for PNR(3), and C = 0.714 ± 0.003 for PNR(6). We observe that these values are smaller than the asymptote of the MKII Gaussian strategy, but larger than the one for the canonical phase measurement (0.673, blue dashed line). We note that while C for PNR(3) is larger than for PNR(6), they are statistically equal due to their uncertainties. This prevents us from drawing any conclusions for larger values of m.

Figure 2b shows the design \({{\Delta }}=\widehat{\phi }-\theta\), the phase of the displacement field, as a function of L. We observe that for non-Gaussian adaptive strategies with increasing PNR, Δ tends to π/2 for large L (L = 200). This observation is consistent with the theoretical framework of optimal design of experiments (see section Bayesian optimal design of β), which states that Δoptimal = π/2 for L → . Moreover, as PNR increases, the strategies show a faster convergence to the asymptotic value of the Holevo variance, which translates in a smaller error in the estimation (see Fig. 2a.) These observations further support our theoretical model of non-Gaussian strategies for phase estimation.

We investigate the asymptotic performance of the estimator variance produced by the non-Gaussian strategy for large α2. Figure 3 shows the Holevo variance for the non-Gaussian adaptive strategy as a function of α2 for different L normalized to the QCRB. We observe that for large α2 (with L = 100), the adaptive non-Gaussian strategy tends to the CRLB.

Fig. 3: Estimator variance produced by the non-Gaussian strategy.
figure 3

Holevo variance of adaptive Non-Gaussian strategies as a function of α2, normalized to QCRB, for different adaptive measurement steps (solid lines), together with the canonical phase measurement (blue dashed line). For L = 100 the Holevo variance differs from the QCRB by 3% exemplifying the tendency to the ultimate precision limit in the regime of large number of photons in the asymptotic limit. Simulations consist of 10000 Monte Carlo simulation sequences with PNR m = 3.

To build a model for the performance of this strategy for large α2, we propose three candidate models for the Holevo variance for L 1:

$${\widehat{y}}_{1}=\frac{{A}_{1}}{{\left|\alpha \right|}^{2}}+\frac{{A}_{2}}{{\left|\alpha \right|}^{3}}+\frac{{A}_{3}}{{\left|\alpha \right|}^{4}},$$
(35a)
$${\widehat{y}}_{2}=\frac{{A}_{1}}{{\left|\alpha \right|}^{2}}+\frac{{A}_{2}}{{\left|\alpha \right|}^{3}},$$
(35b)
$${\widehat{y}}_{3}=\frac{{A}_{1}}{{\left|\alpha \right|}^{2}}+\frac{{A}_{3}}{{\left|\alpha \right|}^{4}},$$
(35c)

to describe our numerical observations in Fig. 3 based on the Monte Carlo simulations. The model that best describes our observations allows us to determine, with a certain degree of confidence, the performance of non-Gaussian adaptive strategies, and compare them with the best Gaussian strategy, Eq. (1), and the canonical phase measurement, Eq. (2).

We discriminate among plausible candidate models using the technique of backward elimination78. We start with the candidate \({\widehat{y}}_{1}\), Eq. (35a), and test the deletion of each variable Ai using the p-value of a hypothesis testing procedure (H0: Ai = 0, HA: Ai ≠ 0). Given that p > 0.1 for A2 and p < 0.001 for A1 and A3, we conclude, with confidence larger than 99%, that the model reflecting the behavior of the data presented in Fig. 3 is \({\widehat{y}}_{3}\), Eq. (35c).

To find A1 and A3 in the limit L →  in model \({\widehat{y}}_{3}\), we fit the Holevo variance as a function of α2 for α2 > 5 to the model \({\widehat{y}}_{3}\) for different values of L. Given a number of adaptive steps L, each fitting provides a set of coefficients \(\left\{{A}_{1},{A}_{3}\right\}\). Hence, to obtain the trend of each coefficient A1 and A2 as L increases, we fit them to an exponential model of the form \({A}_{i}={D}_{i}\exp \left(-{E}_{i}L\right)+{F}_{i}\). Figure 4 shows the coefficients A1 and A3 as a function of L. Adjusting this exponential model and observing that in the limit L 1 Ai → Fi, we obtain A1 = 0.250 ± 0.001 and A3 = 0.52 ± 0.01. Therefore, we conclude with a 99% confidence level that the Holevo variance for the adaptive non-Gaussian strategy in the limit L →  for large α is:

$${{{{\rm{Var}}}}}_{{{{\rm{H}}}}}\left[\widehat{\phi }\right]\approx \frac{0.250\pm 0.001}{{\left|\alpha \right|}^{2}}+\frac{0.520\pm 0.010}{| \alpha {| }^{4}}.$$
(36)
Fig. 4: Coefficients of model \({\widehat{y}}_{3}\) in Eq. (35c) for the Holevo Variance \({{{{\rm{Var}}}}}_{{{{\rm{H}}}}}[\widehat{\phi }]\), as a function of L.
figure 4

Note that in the limit of L →  the coefficient A1 in panel a for the term 1/α2 tends to 0.25, and A3 in panel b of term 1/α4 tends to 0.52. Curves are fits to an exponential model \({A}_{i}={D}_{i}\exp (-{E}_{i}L)+{F}_{i}\) with coefficients (D, E, F, RSE) = (0.065 ± 0.013, 0.060 ± 0.010, 0.250 ± 0.001, 1.87 × 10−7) for panel a, and (D, E, F, RSE) = (0.095 ± 0.015, 0.279 ± 0.119, 0.522 ± 0.010, 1.82 × 10−5) for panel b. The shaded regions represent a 1-σ standard deviation.

This equation is the main result of this work, also shown in Eq. (3). We observe that the Holevo variance for the adaptive non-Gaussian strategy has a similar dependance with α as the canonical phase measurement, differing only in the scaling of the correction term of order 1/α4, see Eq. (2). Moreover, we note that the best Gaussian strategy known to date, the MKII adaptive homodyne, has a correction term of order 1/α320, see Eq. (1). Then, for large α, the non-Gaussian estimation strategies show a much better scaling in the Holevo variance than the best known Gaussian strategy, and closely follows the canonical phase measurement. Furthermore, these non-Gaussian phase estimation strategies can be implemented with current technologies2, and our work demonstrates their superior performance over all the physically realizable strategies for single-shot phase estimation of coherent states reported to date.

The optimized non-Gaussian adaptive strategies based on photon counting analyzed in this work are not the only possible strategies for single-shot phase estimation, and there could be other strategies based on photon counting with better performances. In this work, we studied estimation strategies using cost functions that are consistent with D-optimal designs. However, there may be other cost functions that could provide a further improvements to non-Gaussian strategies. Moreover, we note that the relation in Eq. (30) between the phase and the magnitude of the displacement field was used to obtain identifiable likelihood functions and ensure efficient estimators in the asymptotic limit. While we chose the relation in Eq. (30) because it maximizes the Fisher information, there is no mathematical proof that this choice is optimal, or that other choices for this relation would not provide higher sensitivities. Finally, in the presented adaptive non-Gaussian strategies for phase estimation, the displacement operations were optimized in every adaptive step at a time, i. e. using local optimizations in the adaptive steps. We note, however, that local optimal strategies do not necessarily ensure global optimality13. Strategies with global optimizations, where all adaptive steps are optimized simultaneously, could probably lead to better performances. However, the computational overhead required for performing global optimizations beyond L = 10 adaptive steps prevents us from being able to investigate global optimized strategies. To overcome these limitations, future investigations could make use of machine learning methods, such as neural networks and reinforcement learning79, to lower the complexity of these calculations.

Discussion

Non-Gaussian measurement strategies for phase estimation approaching the quantum limits in sensitivity, as set by the canonical phase measurement, can become a tool for enhancing the performance of diverse protocols in quantum sensing, communications, and information processing. Optical phase estimation approaching the quantum limit can be used to: prepare highly squeezed atomic spin states based on measurement backaction42 for quantum sensing and metrology80; enhance phase contrast imaging of biological samples with optical probes at the few photon level to avoid photodamage and ensure the integrity of the sample; and enhance fidelities in quantum communications with phase coherent states that require phase tracking close to the quantum level between receiver and transmitter with few-photon pulses81,82, while allowing for quantum receivers to decode information encoded in coherent states below the quantum noise limit13,52,83.

As a direct application for quantum information processing, non-Gaussian measurements based on photon counting and displacement operators can be used for full reconstruction of quantum states with on-off detectors84 and PNR detectors85 in a multi-copy state setting by sampling phase space with non-Gaussian projections. The theoretical methods to assess the performance of adaptive non-Gaussian measures for phase estimation described in this work could be used to study methods for adaptive quantum tomography86,87 based on photon counting for high dimensional quantum states, and investigate their asymptotic advantages over adaptive homodyne tomography88.

From the practical point of view, our work provides insight into the design of practical, highly efficient measurement strategies for phase estimation based on photon counting. It shows that non-Gaussian strategies optimizing any cost function within the family of D-optimal designs are equally efficient, and demonstrates the advantages of higher photon number resolution in the strategies to reduce estimation errors. This understanding allows the experimenter to chose cost functions that can be efficiently calculated and optimized to achieve higher convergence rates, while selecting a PNR given the desired/target error budget in the estimation problem for specific applications. This knowledge will be critical for the design and development of future sensors based on photon counting for diverse applications in communication, phase-contrast imaging, metrology, and information processing.

In conclusion, we investigate a family of non-Gaussian strategies for single-shot phase estimation of optical coherent states. These strategies are based on adaptive photon counting measurements with a finite number of adaptive steps implementing coherent displacement operations, optimized to maximize information gain as the measurement progresses2. We develop a comprehensive statistical analysis based on Bayesian optimal design of experiments that provides a natural description of adaptive non-Gaussian strategies. This theoretical framework gives a fundamental understanding on how to optimize these strategies to enable efficient estimators with a high degree of convergence towards the ultimate limit, the Cramér Rao lower bound.

We use numerical simulations to show that optimized adaptive non-Gaussian strategies producing an asymptotically efficient normal estimator achieve a much higher sensitivity than the best Gaussian strategy known to date, which is based on adaptive homodyne20. Moreover, we show that the Holevo variance of the estimator for the adaptive non-Gaussian strategy has a similar dependance as the canonical phase measurement in the asymptotic limit of large photon numbers, differing only by a scaling factor in the second-order correction term. Our work complements the work in single-shot phase measurements for single-photon wave packets in two dimensions22 using quantum feedback with Gaussian measurements, and paves the way for the realization of optimized feedback measurements approaching the canonical phase measurement for higher dimensional states based on non-Gaussian operations.