1 Introduction

The discovery of the accelerated expansion of the Universe (Perlmutter et al. 1998, 1999; Riess et al. 1998) has been one of the major breakthroughs in modern cosmology, and in physics in general. The general framework established in the previous century, where the entire evolution of the Universe was thought to be dominated by matter and radiation, had to be readjusted to make space for a new form of energy with negative pressure capable of driving this acceleration (named dark energy), or, alternatively, to account for some breakdown of general relativity at very large scales. Driven by these pioneering results, in the subsequent decades the efforts of the scientific community were dedicated to the study of methods to measure and characterize this accelerated expansion, and to the development of large facilities providing massive datasets to be analyzed. In this process, a few of these methods, also referred to as cosmological probes, have become standard approaches in cosmological analyses, given the large efforts spent in measurements, theoretical analyses, systematics characterization, and also investments.

A comprehensive review of these methods is provided in Huterer and Shafer (2018). Here we just recall that most of these approaches are based on the determination of some standard property of astrophysical objects that can be used to calibrate observations and measure the expansion history of the Universe. In particular, it was discovered that the peculiar physical characteristics of some objects allow us to infer a priori their absolute luminosity, making them standard candles (or standardizable candles) with which it became possible to measure their luminosity distance. Locally, it was found that some stars have a variable luminosity (Cepheids, RR Lyrae) whose period of variability can be used to determine precisely their absolute luminosity; detached eclipsing binaries have also been used as local distance indicators to determine the distance to the Large Magellanic Cloud (LMC) to <1% (Pietrzyński et al. 2019). At larger distances, it was discovered that stars at the Tip of the Red Giant Branch (TRGB), easily identifiable in the upper part of the Hertzsprung–Russell diagram, can also be used as standard candles, having an almost constant I-band magnitude (Lee et al. 1993). Finally, at cosmological distances, Type Ia Supernovae (SNe) have been found to be ideal standardizable candles, since their peak luminosity strictly correlates with the decline rate of their light curve, allowing a proper calibration of their absolute luminosity (Phillips 1993) and making it possible to probe the Universe with precise distance indicators up to \(z\sim 1.5\). Similarly, the analysis of large-scale structure in the Universe highlighted, among other features, the presence of correlated over-densities in the matter distribution at a specific separation of \(r\sim 100\) Mpc/h. This effect, known as Baryon Acoustic Oscillations (BAO), was clearly seen both as wiggles in the power spectrum of galaxies and as a peak in the two-point correlation function (Percival et al. 2001; Cole et al. 2005; Eisenstein et al. 2005), and can be interpreted as the imprint of the sound horizon in the original fluctuations of the photon-baryon fluid present in the very early Universe. These oscillations have in particular been used as a standard ruler to study the expansion history of the Universe. While BAO are the most direct probe of the expansion history from large-scale structure, the massive galaxy and quasar surveys that have enabled the BAO success also contain additional signals of great cosmological interest. These analyses are also well established, and, while they might not have reached their full potential yet, they do not qualify as “emerging”.

As a parallel effort, the observation and study of the first light emitted in the Universe, the Cosmic Microwave Background (CMB) radiation, carried out with several ground- and space-based missions (Smoot et al. 1992; Bennett et al. 2003; Planck Collaboration et al. 2014b; Swetz et al. 2011; Carlstrom et al. 2011), gave us a privileged view of the early Universe, providing fundamental insights into the process of formation and into the main components at those early times. In addition, other cosmological probes have been widely used in the past decades to constrain the expansion of the Universe and the evolution of the matter within it. Amongst the most important ones, here we just mention weak gravitational lensing (see, e.g., Bartelmann and Schneider 2001) and the properties of massive clusters of galaxies, in particular cluster counts (see, e.g., Allen et al. 2011). Weak gravitational lensing, while being a younger field than CMB or galaxy surveys, has matured tremendously in the past two decades; efforts in this direction have culminated recently with the DES analysis (Dark Energy Survey Collaboration et al. 2016; Abbott et al. 2022), and weak lensing is one of the science drivers of future surveys such as the Legacy Survey of Space and Time (LSST) at the Vera Rubin Observatory.

While CMB, BAO, SNe, and the other previously quoted probes have increasingly gained interest in the cosmological community over the years, it also soon became clear that a single probe is not sufficient to constrain accurately and precisely the properties of the components of the Universe. Ultimately, each probe has its own strengths and weaknesses, being sensitive to specific combinations of cosmological parameters, to specific physical processes and ranges of cosmic time, and being affected by a specific set of systematics. In the end, the only road to advance our knowledge of the Universe is found to reside in the combination of complementary cosmological probes, allowing us to break degeneracies between parameter estimates and to keep systematic effects under control (see, e.g., Scolnic et al. 2018). This point was first clearly highlighted in the Dark Energy Task Force report (Albrecht et al. 2006), and since then the effort of the scientific community has proceeded in that direction, also with space missions specifically designed to take advantage of the synergy between different probes.

With the development of these cosmological probes, the era of precision cosmology soon began, where advances in instrumental technology, supported by a more mature assessment and reduction of systematic uncertainties and by an increasing volume of data, led to percent and sub-percent measurements of cosmological parameters. However, instead of eventually settling all the questions related to the nature of the accelerated expansion of our Universe and of its constituents, this newly achieved accuracy actually opened Pandora’s box even further. One of the most pressing issues is that the Hubble constant \(H_0\) as determined from early-Universe probes (CMB) appears to be in significant disagreement with the estimates provided by late-Universe probes (Cepheids, TRGB, masers, ...). Many analyses have addressed whether this might be due to some systematics hidden in either measurement, but, as of the current status, this seems disfavored (Riess et al. 2011, 2016; Bernal et al. 2016; Di Valentino et al. 2016; Efstathiou 2020; Riess et al. 2020; Di Valentino et al. 2021; Efstathiou 2021; Riess et al. 2021; Dainotti et al. 2021; Riess et al. 2022). At the same time, smaller and less statistically significant differences started arising also in other cosmological parameters as estimated from early- and late-Universe probes, such as the tension in the estimates of the matter energy density \(\varOmega _{\mathrm{m}}\) and of \(\sigma _8\), the matter power spectrum normalization at 8\(\, h^{-1} \, \mathrm{Mpc}\), often summarized in the quantity \(S_8\equiv \sigma _8\sqrt{\varOmega _{\mathrm{m}}/0.3}\) (Heymans et al. 2013; MacCrann et al. 2015; Joudaki et al. 2017; Hildebrandt et al. 2017; Asgari et al. 2020; Park and Rozo 2020; Joudaki et al. 2020; Tröster et al. 2021; Asgari et al. 2021; Heymans et al. 2021; Amon et al. 2022; Secco et al. 2022; Abbott et al. 2022). The most significant of these discrepancies are of the order of 4–5\(\sigma \), and if (once confirmed) they are not attributable to some problem with the data, they may open the road to new physics with which to explain such differences in the measurement of the same quantity when probing different cosmic times.

Now that the precision of many standard cosmological probes is close to reaching its maximum, given current analyses or those planned for the near future, a way to take a step forward in our understanding of the Universe is to look for new independent cosmological probes (as also highlighted by Verde et al. 2019; Di Valentino et al. 2021), which could either confirm the discrepancies found, pointing us toward the need for new models, or refute them, helping us to better understand possible systematics, or unknown unknowns. Moreover, the synergy and complementarity between different probes can also help to reduce the uncertainty on cosmological parameters when different probes are combined. In general, the diversity of methods will not only enrich the panorama of ways to look at and study our Universe, but may also open new observational and theoretical windows, as happened in the past with the study of the CMB, SNe, and BAO.

This is an exciting time for cosmology, and here we aim to provide a state-of-the-art review of the new emerging cosmological probes, discussing how to apply them, the systematics involved, the measurements obtained, and forecasts of how they could contribute to understanding the evolution of the Universe. In particular, we will review cosmic chronometers, quasars, gamma-ray bursts, gravitational waves as standard sirens, time-delay cosmography, cluster strong lensing, cosmic voids, neutral hydrogen intensity mapping, surface brightness fluctuations, stellar ages, secular redshift drift, and clustering of standard candles. In Sect. 2 we will provide a general overview of the basic notation and fundamental equations assumed in the review, in Sect. 3 we will discuss each emerging cosmological probe separately, in Sect. 4 we will discuss the synergy and complementarity between the various described cosmological probes, and in Sect. 5 we will draw our conclusions.

2 Notations and fundamental equations

One of the main assumptions in modern cosmology is the cosmological principle, which describes our Universe at very large scales based on two main premises: homogeneity (the Universe is the same at every position) and isotropy (there is no preferential spatial direction). Under this principle, the space-time metric can be described by the Friedmann–Lemaître–Robertson–Walker (FLRW) metric:

$$\begin{aligned} ds^2=-c^2 dt^2+a(t)^2\left( \frac{dr^2}{1-kr^2}+r^2d\theta ^2+r^2\sin ^2\theta d\phi ^2\right) , \end{aligned}$$
(1)

where a(t) is the scale factor, which describes how the universe is expanding by relating physical and comoving distances as \(R(t)=a(t) r\), c is the speed of light, \(\theta \) and \(\phi \) are the angles describing the spherical coordinates, and k is the parameter describing the curvature of space; in particular, \(k=0\) corresponds to a flat universe described by Euclidean geometry, a positive \(k>0\) to a closed universe with spherical geometry, and a negative \(k<0\) to an open universe with hyperbolic geometry. Within a FLRW metric, it is also possible to relate the scale factor to the redshift z, having:

$$\begin{aligned} a(t)=\frac{1}{1+z} . \end{aligned}$$
(2)

If we define the expansion rate of the universe H(t) as the rate with which the scale factor evolves with time, \(H(t)\equiv \left( \frac{{\dot{a}}}{a}\right) \), we can describe how it evolves with cosmic time t through the Friedmann equations:

$$\begin{aligned} \left( \frac{{\dot{a}}}{a}\right) ^2&= \frac{8\pi G\rho }{3}-\frac{k}{a^2}+\frac{\varLambda }{3} , \end{aligned}$$
(3)
$$\begin{aligned} \frac{\ddot{a}}{a}&= -\frac{4\pi G}{3}(\rho +3p)+\frac{\varLambda }{3} , \end{aligned}$$
(4)

where G is the gravitational constant, \(\rho \) and p are the total energy density and pressure, \(\varLambda \) is the cosmological constant, and the dot indicates a derivative with respect to time. Historically, a critical value of the density producing a flat universe has been defined by imposing a flat geometry (\(k=0\)) in Eq. (3) in the absence of a \(\varLambda \) term, obtaining \(\rho _\mathrm{crit}=\frac{3H^2}{8\pi G}\). This quantity has proven extremely useful to define dimensionless density parameters for the various constituents of the universe as \(\varOmega _i=\frac{\rho _i}{\rho _\mathrm{crit}}\). This allows us to write the total energy density of the universe as the sum of the contributions of the various components, namely matter and radiation; analogously, considering the terms on the right-hand side of Eq. (3), we can define an energy density for the curvature \(\varOmega _k\equiv -\frac{k}{a^2H^2}\) and for dark energy (in the case of a cosmological constant) \(\varOmega _{\varLambda }\equiv \frac{\varLambda }{3H^2}\). In this way, we have:

$$\begin{aligned} 1=\sum _i\varOmega _i(z)=\varOmega _{\mathrm{m}}(z)+\varOmega _\mathrm{r}(z)+\varOmega _{k}(z)+\varOmega _{\varLambda }(z) , \end{aligned}$$
(5)

where the density parameters are here defined at any given time, i.e., as a function of redshift z. In this context, it is also useful to define the equation of state (EoS) parameter of a generic component as the value w relating its pressure and density, \(w=p/\rho \). In general, we can express the evolution of the energy density as:

$$\begin{aligned} \rho _{\mathrm{i}}(z)=\rho _{\mathrm{i,0}}\exp \left\{ \int _0^z \frac{3[1+w_i(z')]}{1+z'}dz'\right\} . \end{aligned}$$
(6)

While the EoS could depend on time, we recall here that the different components have different EoS parameters, namely \(w=1/3\) for radiation, \(w=0\) for matter, and \(w=-1\) for the term we referred to as dark energy (in the case it is a cosmological constant). If we consider Eq. (6) in the case of a constant \(w_i\), it simplifies to:

$$\begin{aligned} \rho _{\mathrm{i}}(z)=\rho _{\mathrm{i,0}}(1+z)^{3(1+w_i)}. \end{aligned}$$
(7)

Combining Friedmann equations (3) and (4) with Eqs. (5) and (7), it is possible to express the expansion rate of the universe as a function of the evolution with redshift of its main components:

$$\begin{aligned} H(z) = H_0 \left[ \varOmega _{\mathrm{r}}(1+z)^4+\varOmega _{\mathrm{m}}(1+z)^3 + \varOmega _k(1+z)^2 + \varOmega _{\mathrm{de}}(1+z)^{3(1+w)}\right] ^{1/2} , \end{aligned}$$
(8)

where each component evolves with a different power of \((1+z)\) due to its different EoS parameter; here, we implicitly assumed the density parameters to be constants, referring to today’s values \(\varOmega _{i,0}\). We will assume this convention throughout the review, unless otherwise specified. In Eq. (8) we also introduced the dark energy density as \(\varOmega _{\mathrm{de}}\), since in this case its EoS parameter is allowed to have a generic value w. While, in principle, one could also take into account the contribution of radiation \(\varOmega _{\mathrm{r}}\), which scales as \((1+z)^4\), this is typically not considered given the current constraint \(\varOmega _\mathrm{r}\sim 2.47\times 10^{-5}h^{-2}\) (Fixsen 2009), and in the following we will neglect its contribution.
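As a concrete illustration, Eq. (8) is straightforward to evaluate numerically. The following minimal sketch (in Python; the function name and the default parameter values are ours, chosen for illustration only) computes H(z) given today’s density parameters, fixing \(\varOmega _{\mathrm{de}}\) through the closure relation of Eq. (5):

```python
import numpy as np

def hubble(z, H0=70.0, Om=0.3, Ok=0.0, Or=0.0, w=-1.0):
    """H(z) in km/s/Mpc from Eq. (8); density parameters are today's values."""
    Ode = 1.0 - Om - Ok - Or          # closure relation, Eq. (5)
    E2 = (Or * (1 + z)**4 + Om * (1 + z)**3
          + Ok * (1 + z)**2 + Ode * (1 + z)**(3 * (1 + w)))
    return H0 * np.sqrt(E2)

print(hubble(1.0))            # flat LambdaCDM (w = -1)
print(hubble(1.0, w=-0.9))    # flat wCDM
```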

So far, we have considered dark energy as having a constant EoS parameter \(w=-1\); however, to be more general, we can allow it to vary with cosmic time, as different cosmological models would actually suggest. The most widely used way to parameterize this evolution is the Chevallier–Polarski–Linder (CPL) parameterization (Chevallier and Polarski 2001; Linder 2003), where:

$$\begin{aligned} w(z)=w_0+w_a\left( \frac{z}{1+z}\right) . \end{aligned}$$
(9)

Considering Eqs. (6) and (9), we can therefore generalize Eq. (8) as follows:

$$\begin{aligned} H(z) = H_0 \left[ \varOmega _{\mathrm{m}}(1+z)^3 + \varOmega _k(1+z)^2 + \varOmega _{\mathrm{de}}(1+z)^{3(1+w_0+w_a)}e^{-3w_a z/(1+z)}\right] ^{1/2} . \end{aligned}$$
(10)

From this more general formulation, where most of the cosmological parameters are left free to vary (which we will refer to as the open \(w_{0}w_{a}\)CDM model, o\(w_{0}w_{a}\)CDM), it is possible to derive more specific cases. If we fix the curvature of the universe to be flat (\(\varOmega _k=0\)), we have a flat \(w_{0}w_{a}\)CDM model (f\(w_{0}w_{a}\)CDM); if we also fix the time evolution of the dark energy EoS to be null (\(w_{a}=0\)), we have a flat wCDM model (fwCDM); finally, if we assume the dark energy EoS to be constant and equal to \(w=-1\), we obtain the standard \(\varLambda \)CDM model. In this context, it is also useful to define the normalized Hubble parameter as:

$$\begin{aligned} E(z) = H(z)/H_0 . \end{aligned}$$
(11)
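For reference, the generalization of Eq. (10), together with the definition of Eq. (11), can be sketched in a few lines (Python; the illustrative parameter values are ours):

```python
import numpy as np

def E_cpl(z, Om=0.3, Ok=0.0, w0=-1.0, wa=0.0):
    """E(z) = H(z)/H0 of Eq. (11) with the CPL EoS of Eq. (9),
    i.e., the bracket of Eq. (10); radiation is neglected."""
    Ode = 1.0 - Om - Ok
    de = (1 + z)**(3 * (1 + w0 + wa)) * np.exp(-3.0 * wa * z / (1 + z))
    return np.sqrt(Om * (1 + z)**3 + Ok * (1 + z)**2 + Ode * de)

print(E_cpl(0.5))                   # fLambdaCDM: (w0, wa) = (-1, 0)
print(E_cpl(0.5, w0=-0.9, wa=0.2))  # fw0waCDM
```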

The previously discussed equations describe how the cosmological background evolves. From these, we can introduce several additional quantities that will be extremely relevant in describing astrophysical phenomena, namely distances and times. Following Huterer and Shafer (2018), the comoving distance can be defined as:

$$\begin{aligned} D(z)=\frac{c}{H_{0}\sqrt{|\varOmega _k|}}S{\left[ \sqrt{|\varOmega _k|}\int _0^z\frac{H_{0}dz'}{H(z')}\right] } \quad \mathrm{where} \quad S(x)= {\left\{ \begin{array}{ll} \sinh (x),\ &{} \varOmega _k>0\\ x,\ &{} \varOmega _k=0\\ \sin (x),\ &{} \varOmega _k<0\\ \end{array}\right. } . \end{aligned}$$
(12)

It is interesting to notice that in the case of a standard flat \(\Lambda \)CDM cosmology, this equation can be significantly simplified to:

$$\begin{aligned} D(z)=c\int _0^z\frac{dz'}{H(z')} . \end{aligned}$$
(13)

From this equation, we can define two fundamental quantities in astrophysics, namely the luminosity distance \(D_{\mathrm{L}}(z)\) and the angular diameter distance \(D_{\mathrm{A}}(z)\) as:

$$\begin{aligned} D_{\mathrm{L}}(z)=(1+z)D(z) \qquad ; \qquad D_\mathrm{A}(z)=\frac{1}{(1+z)}D(z) , \end{aligned}$$
(14)

where we have assumed that the Etherington relation holds, and therefore we rely on assumptions such as a metric theory and photon number conservation. Similarly, considering the previous definition of H(t) together with Eq. (2), we can write:

$$\begin{aligned} H(z)=\frac{{\dot{a}}}{a}=-\frac{1}{(1+z)}\frac{dz}{dt} , \end{aligned}$$
(15)

and by integrating it we obtain the expression for the age of the universe as a function of redshift:

$$\begin{aligned} t(z)=\int _z^{\infty } \frac{dz'}{H(z')(1+z')} . \end{aligned}$$
(16)
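To make Eqs. (12)–(16) concrete, the sketch below numerically integrates the comoving distance and the age of the universe for a matter + \(\varLambda \) model with optional curvature (Python with scipy; parameter values are illustrative):

```python
import numpy as np
from scipy.integrate import quad

C = 299792.458   # speed of light [km/s]
HGYR = 977.79    # converts 1/H0 (H0 in km/s/Mpc) to Gyr

def E(z, Om=0.3, Ok=0.0):
    return np.sqrt(Om * (1 + z)**3 + Ok * (1 + z)**2 + (1.0 - Om - Ok))

def comoving_distance(z, H0=70.0, Om=0.3, Ok=0.0):
    """D(z) in Mpc, Eq. (12); reduces to Eq. (13) for Ok = 0."""
    chi, _ = quad(lambda x: 1.0 / E(x, Om, Ok), 0.0, z)
    if Ok > 0:
        return C / (H0 * np.sqrt(Ok)) * np.sinh(np.sqrt(Ok) * chi)
    if Ok < 0:
        return C / (H0 * np.sqrt(-Ok)) * np.sin(np.sqrt(-Ok) * chi)
    return C / H0 * chi

def age(z, H0=70.0, Om=0.3, Ok=0.0):
    """Age of the universe at redshift z in Gyr, Eq. (16)."""
    t, _ = quad(lambda x: 1.0 / ((1 + x) * E(x, Om, Ok)), z, np.inf)
    return HGYR / H0 * t

z = 1.0
D = comoving_distance(z)
print(D * (1 + z), D / (1 + z), age(z))  # D_L, D_A [Mpc] (Eq. 14), t(z) [Gyr]
```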

3 Cosmology with emerging cosmological probes

All the new emerging cosmological probes are presented following a common scheme: introducing at the beginning of each section the basic idea of the method and its main equations, describing how to optimally select each probe, discussing how it can be (and has been) applied, reviewing the current state of the art of the measurements, and providing forecasts on how the method is expected to improve its performance in the near future. A fundamental part is dedicated, in particular, to the presentation of the systematics involved in each probe, discussing how they impact the measurements and possible strategies to handle and mitigate them.

3.1 Cosmic chronometers

The age of the Universe has been an important (derived) cosmological parameter, being closely related to the Hubble constant and to the background parameters governing the Universe’s expansion history. Determinations of the age of the Universe today from the age of old cosmological objects at \(z\sim 0\) (see, e.g., the reviews by Catelan 2018; Soderblom 2010; Vandenberg et al. 1996 and recent determinations by O’Malley et al. 2017; Valcin et al. 2020, 2021) and of the look-back time at higher redshifts (Dunlop et al. 1996; Spinrad et al. 1997) have been very influential in the establishment of the (now) standard cosmological model.

The age of the Universe and the look-back time, being integrated quantities of H(z), have some limitations (both in terms of statistical power and in terms of susceptibility to systematics) that the cosmic chronometers approach attempts to overcome.

3.1.1 Basic idea and equations

The accurate determination of the expansion rate of the Universe, or Hubble parameter H(z), has become in recent years one of the main drivers of modern cosmology, since it can provide fundamental information about the energy content of the Universe and about the main physical mechanisms driving its current acceleration. Its measurement is, however, very challenging, and while many works have focused on the estimate of its local value at \(z=0\) (the Hubble constant \(H_0\), see Sect. 2), we have nowadays only a few determinations of H(z), mainly based on a few methods (e.g., on the detection of the BAO signal in the clustering of galaxies and quasars, or on the analysis of SN data, see Font-Ribera et al. 2014; Delubac et al. 2015; Alam et al. 2017; Riess et al. 2018; Scolnic et al. 2018; Bautista et al. 2021; Hou et al. 2021; Raichoor et al. 2021; Riess et al. 2021). These measurements, while having their own strengths, rely on the adoption of a cosmological scenario, such as the assumption of flatness, on early-physics assumptions (in the case of BAO), and on the calibration of the cosmic distance ladder (in the case of SNe); without these assumptions, these probes yield a determination of the normalized expansion rate E(z) instead of H(z).

In this context, it is very important to explore alternative ways to determine the Hubble parameter that can be compared, and eventually combined, with other determinations. The cosmic chronometer method is a novel cosmological probe able to provide a direct and cosmology-independent estimate of the expansion rate of the Universe. The main idea, introduced by Jimenez and Loeb (2002), is based on the fact that in a universe described by a FLRW metric the scale factor a(t) can be directly related to the redshift z as in Eq. (2). With this minimal assumption, it is therefore possible to directly express the Hubble parameter as a function of the differential time evolution of the universe dt in a given redshift interval dz, as provided by Eq. (15):

$$\begin{aligned} H(z)=-\frac{1}{(1+z)}\frac{dz}{dt}. \end{aligned}$$

Here dt/dz can be taken to be the look-back time differential change with redshift. Since redshift is a direct observable, the challenge is to find a reliable estimator for look-back time, or age, over a range of redshifts, i.e. to find cosmic chronometers (CC).
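In its simplest form, the estimator of Eq. (15) amounts to a finite difference between the ages of CC samples in adjacent redshift bins. A minimal sketch follows (Python; the redshifts, ages, and errors below are invented mock values, not real measurements):

```python
import numpy as np

HGYR = 977.79  # converts Gyr^-1 to km/s/Mpc

# mock inputs: effective redshifts of CC sub-samples, ages [Gyr], errors [Gyr]
z   = np.array([0.2, 0.3, 0.4, 0.5])
age = np.array([11.0, 10.1, 9.3, 8.6])
err = np.array([0.2, 0.2, 0.2, 0.2])

zm = 0.5 * (z[1:] + z[:-1])        # effective redshift of each pair of bins
dz = np.diff(z)
dt = np.diff(age)                  # negative: the universe is younger at higher z
H  = -dz / dt / (1 + zm) * HGYR    # Eq. (15), in km/s/Mpc
dH = np.abs(H) * np.hypot(err[1:], err[:-1]) / np.abs(dt)
print(np.c_[zm, H, dH])
```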

The novelty and added value of this method with respect to other cosmological probes is that it can provide a direct estimate of the Hubble parameter without any cosmological assumption (beyond that of an FLRW metric, see also Koksbang 2021). From this point of view, the strength of this method is its (cosmological) model independence: no assumption is made about the functional form of the expansion history or about spatial geometry; it only assumes homogeneity and isotropy, and a metric theory of gravity. Constraints obtained with this method, therefore, can be used under extremely varied cosmological models.

There are three main ingredients at the basis of the CC method:

1. the definition of a sample of optimal CC tracers. As highlighted in Eq. (15), a sample of objects able to trace, at each redshift, the differential age evolution of the Universe is needed. It is fundamental that this sample of cosmic chronometers is homogeneous as a function of cosmic time (i.e., the chronometers started ticking in a synchronized way independently of the redshift they are observed at), and optimized in order to minimize the contamination due to outliers. The optimal selection process will be described in detail in Sect. 3.1.2.

2. the determination of the differential age dt. The CC method is typically applied to tracers identified through spectroscopic analysis, where the redshift determination is extremely accurate (\(\delta z/(1+z)\lesssim 0.001\), see e.g. Moresco et al. 2012a). As a consequence, as can be seen from Eq. (15), the only remaining unknown is the differential age dt. Different techniques have been explored to obtain robust and reliable differential age estimates for CC and to estimate statistical and systematic uncertainties; they will be presented in Sect. 3.1.3.

3. the assessment of the systematic effects. As for any other cosmological probe, one of the fundamental issues to be assessed is the sensitivity of the method to effects that can systematically bias the measurement. The various systematic effects will be examined in Sect. 3.1.4.

3.1.2 Sample selection

Cosmic chronometers are objects that should allow us to trace robustly and precisely the differential age evolution of the Universe across a wide range of cosmic times. For this reason, the most useful astrophysical objects are galaxies: with current ground- and space-based facilities, these objects can be observed with reasonably high signal-to-noise ratios over a wide area and range of redshifts. Two different approaches have been explored.

Imagine selecting, in a given redshift range, a complete sample of galaxies, independently of their properties, and estimating their ages so as to homogeneously populate the age(z) plane. With enough statistics, it becomes possible to estimate the upper envelope (also called red envelope) of the age(z) distribution. Under the assumption that all galaxies formed at the same time independently of the observed redshift (which relies on the Copernican principle) and that the sample is complete, the envelope can be used to measure the differential age of the Universe. The advantage of this kind of approach is that the selection of the sample is very straightforward, at the cost of being significantly demanding: to determine robustly the “edge” of the distribution and its associated error, very high statistics are needed in order not to be biased by random fluctuations in the determination of the ages of the population (e.g., see Jimenez and Loeb 2002; Jimenez et al. 2003; Simon et al. 2005; Moresco et al. 2012a, where over 11000 massive and passive galaxies have been selected to apply this method).

A more practical solution, therefore, is to (pre-)select a homogeneous population representing at each redshift the oldest objects in the Universe. The best cosmic chronometers that have been identified are extremely massive (\(\log (M/M_{\odot })>\)10.5–11) and passively evolving galaxies (sometimes also, inappropriately, referred to as early-type galaxies). These objects represent the most extreme tails of the mass function (MF) and luminosity function (LF), from the local Universe (Baldry et al. 2004, 2006, 2008; Peng et al. 2010) up to high redshift (Pozzetti et al. 2010; Ilbert et al. 2013; Zucca et al. 2009; Davidzon et al. 2017). Many recent studies (e.g., Daddi et al. 2004; Fontana et al. 2006; Ilbert et al. 2006; Wiklind et al. 2008; Caputi et al. 2012; Castro-Rodríguez and López-Corredoira 2012; Muzzin et al. 2013; Stefanon et al. 2013; Nayyeri et al. 2014; Straatman et al. 2014; Wang et al. 2016; Mawatari et al. 2016; Deshmukh et al. 2018; Merlin et al. 2018, 2019; Girelli et al. 2019) have identified a population of massive quiescent galaxies at high redshift (\(z \gtrsim 2.5\)). There is a large literature supporting the scenario in which these systems have built up their mass very rapidly (\(\varDelta t<0.3\) Gyr, Thomas et al. 2010; McDermid et al. 2015; Citro et al. 2017; Carnall et al. 2018) and at high redshifts (\(z>2-3\), Daddi et al. 2005; Choi et al. 2014; McDermid et al. 2015; Pacifici et al. 2016; Carnall et al. 2018; Estrada-Carpenter et al. 2019; Carnall et al. 2019), having quickly exhausted their gas reservoir and having then evolved passively. For this reason, such objects constitute a very homogeneous population also in terms of metal content, having been found to have a solar to slightly oversolar metallicity from \(z\sim 0\) up to \(z\sim 2\) (Gallazzi et al. 2005; Onodera et al. 2012; Gallazzi et al. 2014; Conroy et al. 2014; Onodera et al. 2015; McDermid et al. 2015; Citro et al. 2016; Comparat et al. 2017; Saracco et al. 2019; Morishita et al. 2019; Estrada-Carpenter et al. 2019; Kriek et al. 2019). The mere existence of a population of passive and massive galaxies already at \(z\sim 2\) further supports this scenario (Franx et al. 2003; Cimatti et al. 2004; Onodera et al. 2015; Kriek et al. 2019; Belli et al. 2019). A clear pattern has also been found strictly connecting the mass, the star formation history (SFH), and the redshift of formation of these galaxies; within this scenario, referred to as mass downsizing, more massive galaxies are found to have formed earlier, to have experienced a more intense, even if shorter, episode of star formation, and to have a very homogeneous SFH (Heavens et al. 2004; Cimatti et al. 2004; Thomas et al. 2010). To summarize, these galaxies represent a population where the age difference dt between two suitably separated (and suitably narrow) redshift bins is significantly larger than their internal time-scale evolution, making them optimal chronometers. For a more detailed review on massive and passive galaxies, we refer to Renzini (2006).

Many different prescriptions have been suggested in the literature to select passive galaxies, based on rest-frame colors (Williams et al. 2009; Ilbert et al. 2010, 2013; Arnouts et al. 2013), the shape of the spectral energy distribution (SED) (Zucca et al. 2009; Ilbert et al. 2010), star formation rate (SFR) or specific SFR (sSFR) (see, e.g., Ilbert et al. 2010, 2013; Pozzetti et al. 2010), presence or absence of emission lines (see, e.g., Mignoli et al. 2009; Wang et al. 2018), and even morphology. The important question in this context is whether these different selection criteria are all equivalent for selecting CC. The short answer is no. In several papers (Franzetti et al. 2007; Moresco et al. 2013; Belli et al. 2017; Schreiber et al. 2018; Fang et al. 2018; Merlin et al. 2018; Leja et al. 2019; Díaz-García et al. 2019) it has been found that a simple criterion is not able per se to select a pure sample of passively evolving galaxies, and that, depending on the criterion, a conspicuous number of contaminants might remain. This is clearly shown in the left panel of Fig. 1, reproduced from Moresco et al. (2013). The reference and the figure highlight how passive galaxies selected with several different criteria still show evidence of emission lines, with a residual contamination by blue/star-forming objects that, depending on the criterion, can be as high as 30–50%. In the same work, as also reported in the figure, it was also shown that a cut in stellar mass helps to increase the purity of the sample, and that, at fixed criterion, the contamination is significantly smaller at high masses (decreasing by a factor 2–3 from \(\log (M/M_{\odot })<10.25\) to \(\log (M/M_{\odot })>10.75\)).

Fig. 1 Impact of selection criteria on the purity of CC samples. Left panel: stacked spectra of differently selected samples of passive galaxies from the zCOSMOS survey in two different mass bins (\(\log (M/M_{\odot })<10.25\) and \(\log (M/M_{\odot })>10.75\)), showing how, for many selection criteria, contamination by significant emission lines is still clearly evident, especially in the low-mass bin; in the high-mass bin emission lines are not visible, indicating a much reduced contamination of the sample. Right panel: NUVrJ diagram for galaxies from the LEGA-C survey. The points are colored by their H/K ratio, the dashed line shows the division between passive and star-forming objects (Ilbert et al. 2013), the shaded region identifies the green valley (Davidzon et al. 2017), and the points highlighted in black are the selected CC. Images reproduced with permission from [left] Moresco et al. (2013), copyright by ESO; and [right] Borghi et al. (2022b), copyright by the authors

Both in Moresco et al. (2013) and in Borghi et al. (2022b) it has been demonstrated that, in order to maximize the purity of the sample and to select the best possible sample of CC, different criteria should be combined (photometric, spectroscopic, stellar mass/velocity dispersion cut, potentially morphological). In Moresco et al. (2018), a detailed selection workflow has been proposed, which can be summarized in the following three criteria:

(i) a photometric criterion to select the reddest objects, based on the available photometric data. Among the best is the one based on the NUVrJ diagram (Ilbert et al. 2013), but alternatives are the UVJ diagram (Williams et al. 2009), the NUVrK diagram (Arnouts et al. 2013), or selections based on full SED modeling (e.g., see Ilbert et al. 2009; Zucca et al. 2009). It is important to underline, however, that information about the UV flux has proven to be very important to discard the contamination by a young (0.1–1 Gyr) population, and that the NUVrJ diagram has been demonstrated to be the most robust in distinguishing star-forming and passive populations.

(ii) a spectroscopic criterion, to check that no residual emission lines, which might trace the presence of on-going star formation, are present in the spectrum. Depending on the redshift and on the wavelength coverage of the data, the most important emission lines to be checked are [OII]\(\lambda \)3727, H\(\beta \) (\(\lambda =4861\)Å), [OIII]\(\lambda \)5007, and H\(\alpha \) (\(\lambda =6563\)Å); different kinds of cuts can be adopted, based on the equivalent width (EW) of the line (e.g., EW<5Å, Mignoli et al. 2009; Moresco et al. 2012a; Borghi et al. 2022b), on its signal-to-noise ratio (S/N, e.g., Moresco et al. 2016b; Wang et al. 2018), or on a combination of these. In general, it is important that the selected spectra do not show any sign of emission lines (as an example, see Fig. 3).

(iii) a cut in stellar mass or, equivalently, in stellar velocity dispersion \(\sigma _{\star }\). As discussed above, the more massive a system is, the older, more coeval, and less contaminated it is. Therefore, a cut around \(\log (M/M_{\odot })>\)10.6–11 is typically adopted. A minimal sketch implementing these combined cuts is provided after this list.
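The combined selection can be summarized in a few lines of code. A minimal sketch (Python; the column names and thresholds are illustrative — the EW and mass cuts follow the typical values quoted above, while the NUVrJ passive locus follows Ilbert et al. 2013):

```python
import numpy as np

def select_cc(cat):
    """Combined CC selection of Sect. 3.1.2; `cat` is a dict of
    per-galaxy arrays (hypothetical column names)."""
    nuv_r = cat["NUV"] - cat["r"]
    r_j   = cat["r"] - cat["J"]
    # (i) photometric cut: passive locus of the NUVrJ diagram
    photo = (nuv_r > 3.0 * r_j + 1.0) & (nuv_r > 3.1)
    # (ii) spectroscopic cut: no detectable emission lines (EW in Angstrom)
    spec = (cat["EW_OII"] < 5.0) & (cat["EW_Halpha"] < 5.0)
    # (iii) stellar mass cut to retain the most massive, most coeval systems
    mass = cat["logM"] > 10.6
    return photo & spec & mass
```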

Fig. 2 Selection workflow for CC (adapted from Moresco et al. 2018). The rounded boxes show the selected samples at the different steps, from the parent sample to the final CC sample, while the blue diamond boxes represent the incremental selection criteria adopted, with a green (red) arrow indicating when the criterion is met (or not) and the galaxy included in (or excluded from) the sample. Each criterion is fundamental to maximize the purity of the sample, removing star-forming contaminants of different typical young ages (see Moresco et al. 2018)

Any other, less stringent, selection criteria will yield a sample with a residual degree of contamination by star-forming objects, which we will address in Sect. 3.1.4. It is interesting to notice that recently some alternative estimators have been suggested that can help to track the purity of the sample. In Moresco et al. (2018), the ratio between the CaII H (\(\lambda =3969\)Å) and K (\(\lambda =3934\)Å) lines has been introduced as a novel way to trace the degree of contamination by a star-forming component. The reason is that, while for a passive population the ratio H/K is typically larger than one (the K line being deeper than the H line), the presence of a young component affects this quantity, being characterized by non-negligible Balmer absorption lines, and in particular by the presence of H\(\epsilon \) (\(\lambda =3970\)Å), which adds to the CaII H line, inverting the ratio. This new diagnostic has been demonstrated to be extremely powerful, since it correlates extremely well with almost all other indicators of ongoing star formation (NUV and optical colors, SFR, emission lines, see Borghi et al. 2022b), as shown in the right panel of Fig. 1, and can be a useful independent indicator of the presence of residual ongoing star formation. The workflow for the selection criteria is summarized in Fig. 2.

3.1.3 Measurements

Measuring the age of a stellar population presents several challenges. One of the main issues is the existence of degeneracies between the physical parameters, so that the spectral energy distribution (SED) of a galaxy can be approximately reproduced with quite different combinations of age and other parameters. The most well-known one is the age-metallicity degeneracy (Worthey 1994; Ferreras et al. 1999), connected to the fact that both an older age and a higher metallicity produce a reddening of galaxy spectra; in particular, it has been found from synthetic stellar population models that the optical colors of early-type galaxies obtained by changing their ages and metallicities while keeping the ratio \(\varDelta {\mathrm{age}}/\varDelta [Z/H]\sim 3/2\) are almost the same. The degeneracy between the age of a galaxy and its star formation history (SFH) (Gavazzi et al. 2002) or its dust content should also be mentioned (even though we note that the latter is typically negligible for accurately selected passive galaxies, due to their low dust contamination, see Pozzetti and Mannucci 2000). Therefore, while age estimates for galaxies obtained from multi-band SED-fitting are quite common in the literature, they are not suitable for this purpose.

With the advent of high-resolution spectroscopy over a wide wavelength range and for large galaxy samples, and of more accurate stellar models and fitting methods, it has become possible to lift these degeneracies and estimate the ages of the stellar populations of galaxies much more accurately and precisely. Moreover, the main strength of the CC method is that it is a differential approach, where the quantity to be measured is the differential age dt, and not the absolute age t. The advantage is that any systematic effect that might be introduced by any method in the estimate of t is significantly minimized in the measurement of dt; any systematic offset in the absolute age estimation will not impact the determination of dt. This is confirmed also by independent analyses (e.g., see Marín-Franch et al. 2009), demonstrating that the accuracy reached in the determination of relative ages is much higher than that of absolute ages.

Different methods have been proposed in the literature to obtain a robust estimate of dt from galaxy spectra. These can be roughly classified into two “philosophies”: using the full spectral information versus selecting only specific features sensitive to the age and well localized in wavelength. Using the full spectral information extracts the maximal amount of information possible (minimizing statistical errors) but is more sensitive to systematics, i.e., physical processes other than age that leave their imprint on the spectrum, and exhibits some dependence of the age estimate on the evolutionary stellar population synthesis models. Using localized features attempts to mitigate that, at the expense of possibly larger statistical errors. To keep systematic errors well below the statistical ones, the preferred methodology might change depending on the statistical power of the datasets available. With very large, high-statistics datasets becoming available, the focus has shifted from full spectral fitting to using only specific features.

The main methods to measure dt from galaxy spectra can be summarized as follows.

3.1.3.1 Full-spectrum fitting

The most straightforward approach is to take advantage of the full spectroscopic information available by fitting the entire spectrum with theoretical models. Different components, obtained from stellar population synthesis models, are typically combined with a mixture of different physical properties (age, metal content, mass), and properly weighted to reproduce the observed spectrum in a given wavelength window (usually within the optical range). The strength of this approach is therefore the ability to reconstruct, together with the age and metallicity of the population, also its star formation history, either in a parametric or non-parametric way. Currently, several codes have been developed and are publicly available to perform full-spectrum fitting, differing slightly in the models implemented, in how the SFH is reconstructed, and in the statistical methods. The first such method, which started the field, is the MOPED algorithm (Heavens et al. 2000, 2004); after that, amongst the most used we can find STARLIGHT (Cid Fernandes et al. 2005), VESPA (Tojeiro et al. 2007), ULySS (Koleva et al. 2009), BEAGLE (Chevallard and Charlot 2016), FIREFLY (Wilkinson et al. 2017), pPXF (Cappellari 2017), and BAGPIPES (Carnall et al. 2018). In Fig. 3 we show as an example the typical spectrum of a passively evolving population obtained by stacking roughly 100,000 spectra extracted from the Sloan Digital Sky Survey Data Release 12 (SDSS-DR12). The figure also highlights the locations of relevant spectral features.

3.1.3.2 Absorption features (Lick indices) analysis

Another approach is to analyze, instead of the full spectrum, only some specific regions characterized by well understood absorption features, also known as Lick indices. These indices, originally introduced by Worthey (1994) and Worthey and Ottaviani (1997), are characterized by a strength that can be directly linked to variations of the properties of the stellar population; some indices are more useful to trace the age of the population (typically Balmer lines), others the stellar metallicity (typically Fe lines), and others the alpha-enhancement (e.g., Mg lines). Also in this case, public codes exist to measure Lick indices (see, e.g., indexf, Cardiel 2010, and pyLick, Borghi et al. 2022b). The specific dependence of each index (shown in Fig. 3) on the physical properties was first assessed in Worthey (1994). A significant step forward in their use to quantitatively determine the age of a stellar population was made by Thomas et al. (2011); it consists in constructing stellar population models specifically suited for modeling Lick indices, including variable element abundance ratios, that can be compared with the data (e.g., with a Bayesian approach). This step is fundamental since it overcomes a limitation of full-spectrum fitting, also allowing the possibility to determine, together with the age and metallicity, the alpha-enhancement of a stellar population. It is worth noting that more recently other models with variable element ratios that could be used for this purpose have been proposed by Conroy and van Dokkum (2012) and Vazdekis et al. (2015).
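For illustration, the standard definition of an atomic Lick index (the equivalent width within a central band, measured relative to a pseudo-continuum drawn through flanking side-bands) can be written compactly. A minimal sketch follows (Python; the H\(\beta \) band limits quoted in the comment are the commonly tabulated ones and should be checked against the adopted index system; in practice dedicated codes such as pyLick should be preferred):

```python
import numpy as np

def lick_index(wave, flux, blue, center, red):
    """Atomic Lick index [Angstrom]: EW within `center` relative to a
    linear pseudo-continuum through the mean fluxes in `blue` and `red`."""
    def band(lo, hi):
        m = (wave >= lo) & (wave <= hi)
        return wave[m].mean(), flux[m].mean()
    (lb, fb), (lr, fr) = band(*blue), band(*red)
    m = (wave >= center[0]) & (wave <= center[1])
    cont = fb + (fr - fb) * (wave[m] - lb) / (lr - lb)  # pseudo-continuum
    return np.trapz(1.0 - flux[m] / cont, wave[m])

# e.g., Hbeta with its standard band definitions [Angstrom]:
# hbeta = lick_index(wave, flux, (4827.875, 4847.875),
#                    (4847.875, 4876.625), (4876.625, 4891.625))
```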

Fig. 3 Stacked spectrum of \(\sim \)100,000 massive and passive CC selected from SDSS DR12. It is clearly characterized by a red continuum, several absorption lines (identified by the black boxes), and the absence of significant emission lines (whose positions are highlighted by the red boxes)

3.1.3.3 Calibration of specific spectroscopic features

Finally, one of the most commonly adopted approaches in CC works is to focus on a single spectroscopic feature found to have a tight correlation with the age of the population. This approach was introduced by Moresco et al. (2012a), who proposed to use the break in the spectrum at 4000 Å rest-frame (D4000, one of the main characteristics of the spectrum of a passive galaxy, as also shown in Fig. 3). The D4000 has been demonstrated to correlate extremely well with the stellar age (at fixed metallicity). Moreover, it has been shown that the dependence of the D4000 on the two quantities (age and metallicity Z) can be described by a simple (piece-wise) linear relation in the range of interest for the analysis:

$$\begin{aligned} D4000=A(Z, SFH)\times \mathrm{age}+B , \end{aligned}$$
(17)

where B is a constant and A(Z, SFH) is a parameter which, for a broad age range, depends only on the metallicity Z and on the SFH, and can be calibrated on stellar population synthesis (SPS) models. By differentiating Eq. (17), it is possible to derive the relation between the differential age evolution of the population, dt, and the differential evolution of the feature, dD4000, in the form \(dD4000=A(Z, SFH)\times dt\). This allows us to rewrite Eq. (15) as:

$$\begin{aligned} H(z)=-\frac{A(Z, SFH)}{1+z}\frac{dz}{dD4000} \end{aligned}$$
(18)

with the advantage of having decoupled the statistical effects (all included in the observationally measurable term dz/dD4000) from the systematic ones (captured by the coefficient A(Z, SFH)). We note here that different definitions have been proposed in the literature to measure the D4000, which is the ratio between the average flux \(F(\nu )\) in two windows adjacent to 4000 Å rest-frame, one assuming wider bands (\(D4000_w\), [3750–3950] Å and [4050–4250] Å, Bruzual A. 1983) and one narrower bands (\(D4000_n\), [3850–3950] Å and [4000–4100] Å, Balogh et al. 1999); in the following, we will consider \(D4000_n\), since it has been demonstrated to have a significantly smaller dependence on potential reddening effects (Balogh et al. 1999).
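Given its simple definition, the \(D4000_n\) can be measured with a few lines of code. A minimal sketch (Python), assuming a rest-frame spectrum in \(F(\lambda )\) units (the conversion \(F(\nu )\propto \lambda ^2 F(\lambda )\) is applied internally):

```python
import numpy as np

def d4000_n(wave, flux_lambda):
    """Narrow 4000 A break (Balogh et al. 1999): ratio of the mean F(nu)
    in [4000-4100] A to that in [3850-3950] A (rest-frame wavelengths)."""
    fnu = wave**2 * flux_lambda          # F(nu) ~ lambda^2 F(lambda)
    red  = (wave >= 4000.0) & (wave <= 4100.0)
    blue = (wave >= 3850.0) & (wave <= 3950.0)
    return fnu[red].mean() / fnu[blue].mean()
```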

Fig. 4 Application of the CC method. The left panel shows an example of averaged \(D4000-z\) relations (with uncertainties smaller than the symbol size, so that differences can be robustly computed) for a CC sample extracted from SDSS-DR12, in different velocity dispersion bins as indicated in the label. Each point has been estimated from a stacked spectrum of \(\sim 1000\) objects, and its uncertainty is the error on D4000 measured from the stacked spectrum. A downsizing pattern is clearly evident, in which more massive galaxies (with higher \(\sigma \)) also have larger D4000 values, corresponding to older ages; the expected decrease of D4000 with redshift is also visible. The brackets illustrate, for a pair of points, the calculation of dD4000 and dz. The right panel shows theoretical D4000–age relations obtained with the SPS models of Maraston and Strömbäck (2011), used to calibrate Eq. (18). From upper to lower, the lines show different stellar metallicities: twice solar, solar, and half solar. At fixed metallicity, different lines correspond to different SFHs, namely \(\tau =[0.05,0.1,0.2,0.3]\) Gyr (from left to right). The colored lines show, for one SFH per metallicity, the best fit obtained with a piece-wise linear relation. The arrows indicate how the different parameters affect the D4000–age relations; it is important to keep in mind that the calibration parameter A(Z, SFH) is the slope of the relation

To apply the improved CC method as described by Eq. (18), it is therefore necessary to measure the following quantities:

1. the differential \(\varDelta D4000\) of a sample of CC over a redshift interval \(\varDelta z\). Since this process involves the estimate of a derivative, to increase its accuracy and minimize the noise due to statistical fluctuations of the signal, it can be done either by averaging the D4000 of galaxies in redshift slices and then estimating dD4000, or by stacking multiple spectra of CC to increase the spectral S/N and measuring the D4000 on the stacked spectra, as shown in Fig. 4. Equation (18) disentangles observational errors from the systematic errors associated with the interpretation (such as the dependence on the SPS model, degeneracies with metallicity, etc.). The D4000 is a purely observational quantity, and thus, barring observational systematics such as wavelength calibration or instrument response, its measurement is affected only by statistical uncertainty, which can be reduced by increasing the number of objects with spectra and/or increasing the S/N per spectrum.

2. the metallicity Z and SFH of the selected sample. As a result of the strict selection criteria (see Sect. 3.1.2), the selected galaxies are characterized by a SFH with a very short duration: \(\tau <0.5\) Gyr (in many cases \(<0.2\) Gyr) when parameterized with an exponentially declining SFH, with \(\tau \) the formation time-scale (in Gyr). Nevertheless, the SFH should be taken into account and correctly propagated into the measurement, since, despite the selection, describing those systems as a single stellar population (SSP) would be over-simplistic. The methods to estimate the SFH are mostly based on SED-fitting, on full-spectrum fitting, or on a combination of the two (see, e.g., Tojeiro et al. 2007; Chevallard and Charlot 2016; Citro et al. 2016; Carnall et al. 2018, 2019). Despite the fact that by construction the CC population is very homogeneous also in metal content, and has been observed to have a solar to slightly over-solar metallicity over a very wide range of cosmic times (see Sect. 3.1.2), the stellar metallicity Z needs to be determined too. Also in this case, different approaches are viable, from considering a data-driven prior on it (Moresco et al. 2012a), to estimating it with full-spectrum fitting considering different codes and models (Moresco et al. 2016b), or measuring it from Lick index analysis (Gallazzi et al. 2005; Borghi et al. 2022b).

3. the calibration parameter A(Z, SFH) to connect variations in D4000 to variations in the age of the stellar population, assuming different SPS models. This involves generating several D4000–age relations, exploring different metallicities and SFHs, and adopting several different SPS models.

As already discussed, these relations can be well approximated as linear (or, better, piece-wise linear, as shown in Fig. 4), whose slopes give the parameter A(Z, SFH) in the regime of interest. At fixed metallicity and in a given D4000 regime, it is then possible to estimate the spread in the slopes obtained by varying the SFH within the observed ranges, and use this as the uncertainty associated with the calibration parameter, i.e. \(A(Z,SFH)=A(Z)\pm \sigma _A(SFH)\). These measurements, available from models at given metallicities (e.g. \(Z/Z_{\odot }=0.5,1,2\) for the example in Fig. 4), can afterwards be interpolated to obtain a value, with its error, for any given metallicity. The correct calibration parameter for each point will therefore be estimated from the measured (or assumed) metallicity, together with its error, yielding a global \(A\pm \sigma _A\) that takes into account both the uncertainty on the SFH and on the metallicity. We will explore the impact of the SPS model choice on the systematic error budget in Sect. 3.1.4.

All these quantities will be combined in Eq. (18) to obtain an estimate of the Hubble parameter H(z) and of its uncertainty.
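Putting the three ingredients together, a single H(z) point follows from Eq. (18) with a simple propagation of errors. A minimal sketch with invented numbers (Python; the D4000 values, their errors, and the calibration A(Z, SFH) below are purely illustrative, not real measurements):

```python
import numpy as np

HGYR = 977.79                       # converts Gyr^-1 to km/s/Mpc

# mock measurements in two redshift bins
z1, z2 = 0.35, 0.45
D1, eD1 = 1.86, 0.002               # D4000_n and its error
D2, eD2 = 1.84, 0.002
A, sigA = 0.024, 0.002              # calibration A(Z, SFH) [Gyr^-1] and its error

zm = 0.5 * (z1 + z2)
dz, dD = z2 - z1, D2 - D1           # dD < 0: D4000 decreases with z
H = -A / (1 + zm) * dz / dD * HGYR  # Eq. (18)

stat = abs(H) * np.hypot(eD1, eD2) / abs(dD)  # from dz/dD4000 (statistical)
syst = abs(H) * sigA / A                      # from A(Z, SFH) (systematic)
print(H, stat, syst)
```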

A final, yet important, point to keep in mind is that, in order to be cosmology-independent, the CC approach must rely on age estimates that do not assume any cosmological prior. This is a very important point since, in many (if not most) analyses, a cosmologically-motivated upper prior on the age is adopted in order to break or minimize the previously discussed degeneracies. For the CC method to be used as a test for cosmology, it is of paramount importance to obtain a robust age estimate without introducing any (prior) dependence on a cosmological model, in order to avoid circularity and, basically, retrieving the cosmological model used as a prior.

3.1.4 Systematic effects

In this section, we give an overview of the possible systematic effects that can affect the CC method, discussing approaches to minimize them and to propagate them to the total covariance matrix. We begin by discussing effects and assumptions that have a direct impact on the uncertainty on H(z), and conclude by presenting additional possible issues that might impact the measurement but turn out to be negligible.

The main systematic effects can be divided into four components, summarized below. Each of them contributes to the total systematic covariance matrix \({\mathrm{Cov}}_{ij}^\mathrm{sys}\).

3.1.4.1 Error in the CC metallicity estimate \({\mathrm{Cov}}_{ij}^{\mathrm{met}}\)

The metallicity estimate enters Eq. (18) by changing the calibration parameter A(Z, SFH). An error in its value, therefore, directly affects the H(z) measurement and its associated error budget. In Moresco et al. (2020), this issue has been addressed by performing a Monte Carlo simulation of SSP-generated galaxy spectra considering a variety of SPS models, with metallicities spanning different ranges (±10%, 5%, 1%), and estimating the Hubble parameter. In this way, it was estimated that the error induced on H(z) scales almost linearly with the uncertainty on the stellar metallicity, which is corroborated observationally by the analysis in Moresco et al. (2016b), where a 10% error on the metallicity was found to correspond to a 10% error on the Hubble parameter. Hence, the uncertainty on the stellar metallicity (if known and quantified correctly) can be quantitatively propagated to an error on H(z) following the procedure highlighted in Sect. 3.1.3. This contribution does not introduce off-diagonal terms in the covariance matrix, because it depends on the stellar metallicity of each spectrum (be it of an individual object or a co-add) and does not correlate different spectra.

3.1.4.2 Error in the CC SFH \({\mathrm{Cov}}_{ij}^{\mathrm{SFH}}\)

Even if CC have SFHs characterized by very short timescales, assuming that the entire SFH is concentrated in a single burst (SSP) introduces a systematic error which must be accounted for, as described in Sect. 3.1.3. This is typically a systematic contribution of the order of 2–3%; as an example, in Moresco et al. (2012a), where the estimated uncertainty on the SFH timescale was \(0<\tau <0.3\) Gyr, the contribution to the final error on H(z) was \(\sim \)2.5%. Also this contribution to the covariance matrix is taken to be purely diagonal.

3.1.4.3 Assumption of SPS model \({\mathrm{Cov}}_{ij}^\mathrm{model}\)

The major source of systematic uncertainty in the CC method, independently of the process adopted to estimate dt, is the assumption of the SPS model. This is also, by definition, a term that introduces non-diagonal elements in the total covariance matrix, as the errors are highly correlated across different spectra. The estimation of its impact on the H(z) error was assessed in Moresco et al. (2020). In this work, a wide combination of models was studied, including a variety of SPS models (BC03 and BC16, Bruzual and Charlot 2003; M11, Maraston and Strömbäck 2011; FSPS, Conroy et al. 2009; Conroy and Gunn 2010; and E-MILES, Vazdekis et al. 2016), initial mass functions (IMF, including Salpeter 1955, Kroupa 2001, and Chabrier 2003), and stellar libraries (STELIB, Le Borgne et al. 2003, and MILES, Sánchez-Blázquez et al. 2006). These models have then been used with a MC approach, simulating a measurement assuming one model and measuring the Hubble parameter with all the others, estimating in this way the contribution to the total covariance matrix due to the assumption of a specific SPS model, IMF, and stellar library. It was demonstrated that the error introduced on H(z) is, on average, smaller than 0.4% for the IMF contribution, and of the order of 4.5% for the SPS model contribution.

The component due to the stellar library is slightly higher; however, this estimate is overly conservative, as the effect is driven by the inclusion of a stellar library that has now been superseded. More importantly, it has been found that this uncertainty is also redshift dependent, and an explicit estimate of each component is provided as a function of z.

3.1.4.4 Rejuvenation effect \({\mathrm{Cov}}_{ij}^{\mathrm{young}}\)

Another possible bias to take into account arises if the selected CC present a residual contamination by a young component. We can divide this systematic effect into two cases. On the one hand, part of the selected CC population may be composed of star-forming or intermediate systems; this should be avoided, or maximally mitigated, by the accurate and combined selection process described in Sect. 3.1.2. On the other hand, despite the accurate selection, the stellar population of a single CC, even if dominated by an old component, may still have a minor contribution from a young underlying component of stars. This effect can bias the H(z) determination because it influences the overall shape of the spectrum due to the bluer color of younger stars, leading to the measurement of younger ages and hence a biased dt. This issue has been studied in detail in Moresco et al. (2018), where several indicators have been explored and proposed to trace the eventual presence of a residual young sub-population, from the UV flux (Kennicutt 1998) to the presence of emission lines (see, e.g., Magris C. et al. 2003) or of strong absorption in higher-order Balmer lines (like H\(\delta \), Le Borgne et al. 2006). In particular, by studying theoretical SPS models, the previously discussed CaII H/K indicator was proposed to quantitatively trace the percentage level of contamination, taking advantage of the fact that the H\(\epsilon \) line, characteristic of a young stellar component, directly affects the CaII H line, and therefore the ratio. It was then assessed, for a given degree of contamination, how much the D4000 would be decreased and, therefore, how much the estimate of H(z) would be impacted, providing in this way a direct recipe connecting the measured CaII H/K (or its upper limit due to non-detection) to an additional error on the Hubble parameter. A contamination by a star-forming young component at the level of 10% (1%) of the total light was found to propagate to an H(z) error of 5% (0.5%); in particular, for the CC samples analyzed so far (Moresco et al. 2012a; Moresco 2015; Moresco et al. 2016b; Borghi et al. 2022b), this contamination has been found to be below the detectable threshold, with an eventual additional error on \(H(z)<\)0.5%. Given the lack of detection and the stringent upper limit on a possible residual contamination, this contribution to the covariance is also taken to be diagonal.

Following Moresco et al. (2020), the total covariance matrix for CC is defined as the combination of the statistical and systematic part as:

$$\begin{aligned} \mathrm{Cov}_{ij}= \mathrm{Cov}_{ij}^{\mathrm{stat}}+ \mathrm{Cov}_{ij}^{\mathrm{syst}}, \end{aligned}$$
(19)

where \(\mathrm{Cov}_{ij}^{\mathrm{syst}}\), for simplicity and transparency, is decomposed into the several contributions discussed above:

$$\begin{aligned} \mathrm{Cov}_{ij}^{\mathrm{syst}}= \mathrm{Cov}_{ij}^{\mathrm{met}}+ \mathrm{Cov}_{ij}^{\mathrm{young}}+ \mathrm{Cov}_{ij}^{\mathrm{model}} , \end{aligned}$$
(20)

where the last component can be further decomposed as:

$$\begin{aligned} \mathrm{Cov}_{ij}^{\mathrm{model}}=\mathrm{Cov}_{ij}^{\mathrm{SFH}}+\mathrm{Cov}_{ij}^{\mathrm{IMF}}+\mathrm{Cov}_{ij}^{\mathrm{st. lib.}}+\mathrm{Cov}_{ij}^{\mathrm{SPS}}. \end{aligned}$$
(21)

As discussed above, \(\mathrm{Cov}_{ij}^{\mathrm{met}}\), \(\mathrm{Cov}_{ij}^\mathrm{SFH}\) and \(\mathrm{Cov}_{ij}^{\mathrm{young}}\) are purely diagonal terms, since they are related to the estimate of physical properties of a galaxy (the stellar metallicity, the SFH, and the possible contamination by a younger subdominant population) that are uncorrelated for objects at different redshifts. \(\mathrm{Cov}_{ij}^{\mathrm{model}}\), instead, has been conservatively estimated by assuming that the contributions from different redshifts are fully correlated. In the published analyses of currently available datasets, the contributions \(\mathrm{Cov}_{ij}^\mathrm{met}\), \(\mathrm{Cov}_{ij}^{\mathrm{SFH}}\) and \(\mathrm{Cov}_{ij}^{\mathrm{young}}\) are already included in the errors provided (and discussed later in Sect. 3.1.5 and Table 1); the other terms have instead to be included following these recipes.Footnote 3
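To make the recipe concrete, the following minimal Python sketch assembles the total covariance matrix of Eqs. (19)–(21). All numerical values (data points, diagonal errors, and percentage contributions of the model terms) are illustrative placeholders; in a real analysis the redshift-dependent percentage errors must be taken from the tables of Moresco et al. (2020). The fully correlated model term is built as an outer product of the percentage errors times H(z), which is one simple way of encoding full correlation across redshifts:

```python
import numpy as np

# Placeholder CC measurements; the published values are collected in Table 1.
z  = np.array([0.2, 0.4, 0.8, 1.3])         # redshifts
Hz = np.array([75.0, 83.0, 105.0, 140.0])   # H(z) in km/s/Mpc

# Diagonal part: statistical error plus the met, SFH and young terms,
# already combined here into one per-point uncertainty (placeholders).
sigma_diag = np.array([5.0, 6.0, 9.0, 14.0])   # km/s/Mpc
cov_diag = np.diag(sigma_diag**2)

def eta_model(z):
    """Percentage error of the model term as a function of redshift.
    Constants are used here for illustration only; Moresco et al. (2020)
    provide the actual redshift-dependent functions."""
    eta_imf   = 0.004 * np.ones_like(z)   # ~0.4% IMF contribution
    eta_stlib = 0.020 * np.ones_like(z)   # stellar-library contribution (placeholder)
    eta_sps   = 0.045 * np.ones_like(z)   # ~4.5% SPS model contribution
    return np.sqrt(eta_imf**2 + eta_stlib**2 + eta_sps**2)

# Fully correlated systematic term: Cov_ij = [eta(z_i) H_i] [eta(z_j) H_j].
e = eta_model(z) * Hz
cov_model = np.outer(e, e)

cov_total = cov_diag + cov_model   # Eq. (19)
```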

Other effects, which have been demonstrated to have a negligible impact on the measurement, but which should be mentioned, are the following:

  • progenitor bias. A common observational effect that can introduce biases in the analysis of early-type galaxies is the so-called progenitor bias (Franx and van Dokkum 1996; van Dokkum et al. 2000): a given selection criterion might be effectively more stringent when applied at high redshift than at low redshift. In particular, high redshift objects that pass the sample selection might be older and more massive than those selected at low redshift, effectively representing the progenitor population of the low redshift sample. This bias becomes increasingly relevant when comparing objects spanning a wide range of redshifts, and, if not properly taken into account, could significantly affect the CC approach, since by definition it flattens the \(age-z\) relation, changing its slope and hence producing a biased H(z). The differential approach at the basis of CC by definition acts to minimize this effect, since in all cases the galaxies being compared span a very small range of redshifts. A quantitative estimate of its impact on the CC approach has been obtained in Moresco et al. (2012a) with two different methods. On the one hand, the analysis has been repeated considering only the upper envelope of the \(age-z\) distribution, which, by definition, cannot be affected by the progenitor bias. The resulting H(z) is in perfect agreement with the baseline analysis, even if with larger error-bars due to the lower statistics on which the upper envelope approach is based (see Sect. 3.1.2). On the other hand, the expected change in slope of the \(age-z\) relation, assuming a very conservative change in formation times for the CCs considered, has also been estimated. In this conservative estimate, it was found that the error induced on the estimated H(z) is \(\sim \)1% on average, which is negligible compared to the rest of the error budget.

  • mass-dependence. A final effect to be further explored is whether the results present some mass-dependent bias. This effect has been explored thoroughly in many analyses (Moresco et al. 2012a, 2016b; Borghi et al. 2022a), and in all cases the H(z) values measured in different mass (or velocity dispersion) bins have been found to be mutually consistent, with no systematic trends. This is in agreement with expectations, since CC are selected to be already very massive galaxies (\(\log (M/M_{\odot })\gtrsim 11\)), comprising very homogeneous systems, as discussed in Sect. 3.1.2.

3.1.5 Main results

The first measurement with the CC method dates back to Simon et al. (2005), which analyzed a sample of passively evolving galaxies from the luminous red galaxy (LRG) sample of the SDSS early data release, combined with higher redshift data from the GDDS survey and archival data. The ages of these objects were estimated with full-spectrum fitting using SPEED models (Jimenez et al. 2004), determining the age of the oldest components while marginalizing over metallicity and SFH. Applying the CC approach, 8 H(z) measurements were obtained in the range \(0<z<1.75\).Footnote 4

Fig. 5 Hubble parameter measurements obtained with the CC method. Different colors refer to different methods adopted to estimate dt, as presented in Table 1. The dashed line shows the flat \(\Lambda \)CDM cosmological model from Planck Collaboration et al. (2020) as a purely illustrative reference

Similarly, Zhang et al. (2014) and Ratsimbazafy et al. (2017) also determined new values of the Hubble parameter by measuring dt with a full-spectrum fitting technique. They studied a sample of \(\sim \)17,000 LRGs from SDSS Data Release Seven (DR7) and of \(\sim \)13,000 LRGs from the 2dF-SDSS LRG and QSO catalog, respectively, both extracting differential age information for their samples using the UlySS code and BC03 models, and obtaining four additional estimates of H(z) at \(z<0.3\) and one at \(z\sim 0.47\), respectively.

The results by Moresco et al. (2012a), Moresco (2015), and Moresco et al. (2016b) are instead based on the analysis of the D4000 feature described in Sect. 3.1.3. The first paper examined a compilation of very massive and passively evolving galaxies extracted from the SDSS Data Release 6 Main Galaxy Sample and Data Release 7 LRG sample and from a combination of spectroscopic surveys at higher redshifts (zCOSMOS, K20, UDS), comprising in total \(\sim \)11,000 galaxies in the range \(0.15<z<1.3\). The second paper analyzed a significantly smaller sample (29 objects) of massive and passive galaxies available in the literature at very high redshifts \(z>1.4\). Finally, the last paper considers the SDSS BOSS Data Release 9, selecting a sample of more than 130,000 CC in the range \(0.3<z<0.55\). In total, 15 additional H(z) estimates are presented in the range \(0.18<z<2\).

Most recently, in Borghi et al. (2022b) a new approach was explored, using a Lick-indices-based analysis applied to CC extracted from the LEGA-C survey to derive information on the physical properties (age, metallicity and \(\alpha \)-enhancement) of the population, and in Borghi et al. (2022a) the resulting dt measurements were used to obtain a new estimate of the Hubble parameter.

The current, most updated compilation of H(z) measurements obtained with CC is shown in Fig. 5, and provided in Table 1. All these measurements have been obtained assuming a SPS model (BC03, Bruzual and Charlot 2003), except for the measurements of Moresco et al. (2012a), Moresco (2015), and Moresco et al. (2016b), which are also available with a different set of SPS models (M11, Maraston and Strömbäck 2011). Since, as discussed above, one of the main sources of systematic uncertainty is the assumed SPS model, for a coherent analysis the systematic off-diagonal component of the covariance has to be added following the recommendations of Sect. 3.1.4, and with the recipes presented in Moresco et al. (2020).

These data have been widely used in the literature in a variety of applications, which we proceed to present below.

Table 1 H(z) measurements (in units of [\(\mathrm {km\, s^{-1}\ Mpc^{-1}}\)]) obtained with the CC method and their associated errors
3.1.5.1 Independent estimates of the Hubble constant \(H_0\)

In the framework of the well-established tension between early- and late-Universe-based determinations of the Hubble constant (Verde et al. 2019; Di Valentino et al. 2021), obtaining independent estimates of \(H_0\) is of great importance as it can provide additional information to test or constrain the underlying cosmological models. By providing cosmology-independent estimates of H(z), whose calibration does not depend on early-time physics or on the traditional cosmic distance ladder, CCs are of value and, by extrapolating H(z) to \(z=0\), could inform the current debate over the Hubble tension.

This analysis can be done either by directly fitting CC data with a cosmological model (Moresco et al. 2011, 2012b, 2016a), or, to take full advantage of the cosmology-independent approach, by employing extrapolation techniques that do not rely on cosmological models, such as Gaussian Processes or Padé approximants (Verde et al. 2014; Montiel et al. 2014; Haridasu et al. 2018; Gómez-Valent and Amendola 2018; Capozziello and Ruchika 2019; Sun et al. 2021b; Bonilla et al. 2021; Colgáin and Sheikh-Jabbari 2021), or methods based on alternative diagnostics (e.g., see Sapone et al. 2014; Krishnan et al. 2021). For currently published analyses using CC alone, the size of the error-bars on \(H_0\) including systematic uncertainties is still too large to weigh in on the tension.
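As a purely illustrative example of the model-independent route, the sketch below reconstructs H(z) with a Gaussian Process and extrapolates it to \(z=0\) to read off \(H_0\). It uses the generic scikit-learn implementation with a squared-exponential kernel, rather than the dedicated codes adopted in the cited works, and the data values are placeholders standing in for the measurements of Table 1:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Placeholder CC data (redshift, H(z) in km/s/Mpc, 1-sigma errors).
z   = np.array([0.18, 0.40, 0.70, 1.00, 1.50, 2.00])
Hz  = np.array([75.0, 83.0, 95.0, 110.0, 140.0, 190.0])
sig = np.array([5.0, 6.0, 8.0, 11.0, 14.0, 23.0])

# Squared-exponential kernel; the per-point variances enter through `alpha`,
# which is added to the diagonal of the kernel matrix.
kernel = ConstantKernel(100.0**2) * RBF(length_scale=2.0)
gp = GaussianProcessRegressor(kernel=kernel, alpha=sig**2)
gp.fit(z.reshape(-1, 1), Hz)

# Extrapolate the reconstruction to z = 0 to estimate H0.
H0, H0_err = gp.predict(np.array([[0.0]]), return_std=True)
print(f"H0 = {H0[0]:.1f} +/- {H0_err[0]:.1f} km/s/Mpc")
```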

3.1.5.2 Comparison with independent probes

With respect to other probes, one of the strengths of the CC method is that it is a direct probe of the Hubble parameter H(z), instead of one of its integrals (see, e.g., Eqs. (14)). As a consequence, as highlighted in Jimenez and Loeb (2002), it is more sensitive to cosmological parameters that affect the evolution of the expansion history: a difference in luminosity distance of 5% corresponds to a difference in H(z) of 10%. In several works the performance of CC in constraining cosmological parameters has been compared with that of other probes. In Moresco et al. (2016b) constraints from CC have been compared with the ones from SNe Ia and BAO considering different cosmological models, finding that for a flat \(w_{0}w_{a}\)CDM model the accuracy on cosmological parameters that can be obtained from CC and BAO is comparable, and that in comparison with other probes CC are particularly useful to measure \(H_0\) and \(\varOmega _{\mathrm{m}}\). Similar conclusions are also found by Vagnozzi et al. (2021) and Gonzalez et al. (2021), where the results from CC are found to be in good agreement with the ones of BAO and SNe over a wide range of cosmological models. Lin et al. (2020) focused the comparison in particular on \(H_0\) and \(\varOmega _{\mathrm{m}}\), confirming a good consistency between CC and an even broader collection of cosmological probes, and also highlighting the crucial synergy between the various probes.

3.1.5.3 Constraints on cosmological parameters using CC alone and in combination with independent probes

CC are a very attractive probe to study non-standard cosmological models, since no cosmological assumption is made in the derivation of H(z). For this reason, several works have explored how they can be used to put constraints and provide evidence in favor of or against various cosmological models, from reconstructing the expansion history of the Universe with a cosmographic approach (Capozziello et al. 2018, 2019), to testing the consistency with concordance models (Seikel et al. 2012) or the spatial curvature of the Universe (Vagnozzi et al. 2021; Arjona and Nesseris 2021), to exploring more exotic cosmological models (such as interacting dark energy models, but not only, see e.g. Bilicki and Seikel 2012; Nunes et al. 2016; Colgáin and Yavartanoo 2019; von Marttens et al. 2019; Yang et al. 2019; Benetti and Capozziello 2019; Aljaf et al. 2021; Ayuso et al. 2021; Reyes and Escamilla-Rivera 2021; Benetti et al. 2021), to directly measuring cosmological parameters (see, e.g., Sect. 3.1.6). In particular, it has been found that CC are extremely useful in combination with other cosmological probes (SNe, BAO, CMB) to increase the accuracy on cosmological parameters (such as \(\varOmega _k\), \(\varOmega _{\mathrm{m}}\) and \(H_0\), see, e.g., Haridasu et al. 2018; Gómez-Valent and Amendola 2018; Lin et al. 2021), to determine the time evolution of the dark energy EoS (Moresco et al. 2016a; Zhao et al. 2017; Di Valentino et al. 2020; Colgáin et al. 2021), and also to provide tighter constraints on the number of relativistic species and on the sum of neutrino masses by breaking the existing degeneracies between parameters (Moresco et al. 2012b, 2016a). As suggested by Linder (2017), the measured H(z) data have also been used in combination with the growth rate of cosmic structures to construct a new diagram to disentangle cosmological models (Moresco and Marulli 2017; Basilakos and Nesseris 2017; Bessa et al. 2021). Finally, the CC data, in combination with BAO and SNe, have proven to be extremely useful also to test the distance-duality relation and measure the transparency (or equivalently, the opacity) of the Universe (Holanda et al. 2013; Santos-da-Costa et al. 2015; Chen et al. 2016b; Vavryčuk and Kroupa 2020; Bora and Desai 2021; Mukherjee and Mukherjee 2021; Renzi et al. 2021).

3.1.6 Forecasting the future impact of cosmic chronometers

Currently, there are two main limitations in the CC method: i) the error-bars are dominated by the uncertainties due to metallicity and the SPS model, and ii) there is no dedicated survey (as exists for SNe or BAO) to obtain a statistically significant sample of CC with high spectral S/N and resolution. For the first, as highlighted in Moresco et al. (2020), there is a clear path to make progress, which involves a meticulous and detailed analysis and comparison of the various models with high-resolution and high S/N observations of CC spectra and SEDs. This program appears to be feasible, enabled by current or forthcoming observational instruments and facilities (e.g., X-Shooter, MOONS), possibly combined with some dedicated observations.

On the other hand, large campaigns to detect massive and passive galaxies with spectra at high S/N and resolution are not directly foreseen at the moment, and for this science case one should rely on legacy data coming from other planned surveys. Nevertheless, future missions, either already planned (like Euclid, Laureijs et al. 2011), under study (ATLAS probe, Wang et al. 2019), or large data sets not yet fully exploited (SDSS BOSS Data Release 16, Ahumada et al. 2020), could provide significantly larger statistics of massive and passive galaxies, either in redshift ranges previously poorly mapped (\(1.5<z<2\)) or previously exploited with significantly lower statistics (\(0.2<z<0.8\)).

In the following, we therefore explore two different scenarios, constructing their corresponding simulations and extracting forecasts on the expected performance of CC with future data. In the first scenario, we assume that we will be able to exploit the available spectroscopic surveys at redshifts \(0.2<z<0.8\) (low-z, e.g. BOSS DR16), and to obtain a sample large enough to measure 10 H(z) points with a statistical error of 1%, including in the systematic error budget both the contribution of the IMF and of the SPS models (as suggested by Moresco et al. 2020); note that already in the analysis by Moresco et al. (2016b) the statistical error was of the order of 2–3%. In the second scenario, we perform a simulation of CC measurements as they will be enabled by future spectroscopic surveys at higher redshifts (high-z), producing 5 H(z) points with a statistical error of 5% at \(1.5<z<2.1\); as an example, Euclid is expected to provide, especially with its Deep Fields, up to a few thousand very massive and passive galaxies in this redshift range, increasing by 2 orders of magnitude the currently available statistics (Laureijs et al. 2011; Wang et al. 2019). As a final step, we will analyze the combined measurements, and also a more optimistic scenario where the systematic error component is assumed to be minimized following the recipes described in Sect. 3.1.4 (in particular, considering the uncertainty due to SPS models resolved, leaving only the covariance due to the IMF contribution).

The H(z) simulated data are generated with a given error (uncorrelated across data points) assuming cosmological parameters for the \(\Lambda \)CDM model from Planck Collaboration et al. (2020), and are shown, together with the current CC measurements, in the larger panel of Fig. 6. The associated covariance matrix is then calculated as presented in Sect. 3.1.4, considering the contributions previously discussed. To assess the capability of the CC method to constrain cosmological parameters, we explore the constraints current and future data can provide on an open wCDM cosmology, where both the spatial curvature density \(\varOmega _k\) and the dark energy EoS are left free to vary. We considered flat priors on [\(H_0\), \(\varOmega _{\mathrm{m}}\), \(\varOmega _{\mathrm{de}}\), \(w_0\)] (the free parameters in our fit), and analyzed the data in a Bayesian framework with a Markov Chain Monte Carlo (MCMC) approach using the public emcee (Foreman-Mackey et al. 2013) python code. The results are shown in Fig. 6 and in Table 2.
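The core of such an analysis can be sketched as follows. This is a minimal, self-contained Python example rather than the actual analysis pipeline: the simulated data points are placeholders, the covariance is taken as purely diagonal for brevity (1% errors at low-z, 5% at high-z, as in the two scenarios above), and the prior ranges are illustrative:

```python
import numpy as np
import emcee

def hubble_owcdm(z, H0, Om, Ode, w0):
    """H(z) in an open wCDM model, with Omega_k = 1 - Om - Ode."""
    Ok = 1.0 - Om - Ode
    E2 = Om*(1+z)**3 + Ok*(1+z)**2 + Ode*(1+z)**(3*(1+w0))
    return H0 * np.sqrt(E2)

def log_prob(theta, z, Hz, cov_inv):
    H0, Om, Ode, w0 = theta
    # Flat priors on all four free parameters (illustrative ranges).
    if not (50 < H0 < 100 and 0 < Om < 1 and 0 < Ode < 1.5 and -3 < w0 < 0):
        return -np.inf
    r = Hz - hubble_owcdm(z, H0, Om, Ode, w0)
    return -0.5 * r @ cov_inv @ r

# Simulated data (placeholders) and a diagonal covariance matrix.
z   = np.array([0.3, 0.5, 0.7, 1.6, 1.9])
Hz  = np.array([81.0, 91.0, 102.0, 158.0, 178.0])
sig = np.concatenate([0.01*Hz[:3], 0.05*Hz[3:]])   # 1% low-z, 5% high-z
cov_inv = np.linalg.inv(np.diag(sig**2))

ndim, nwalkers = 4, 32
p0 = np.array([70.0, 0.3, 0.7, -1.0]) + 1e-2*np.random.randn(nwalkers, ndim)
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, args=(z, Hz, cov_inv))
sampler.run_mcmc(p0, 5000)
samples = sampler.get_chain(discard=1000, flat=True)
```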

Table 2 Constraints with current and future CC measurements in an open wCDM (upper rows) and in a flat \(\Lambda \)CDM cosmology (lower rows)

As a first comment, we note that the reduced \(\chi ^2\) (\(\chi ^2_\mathrm{red}\)) of the analysis of the current dataset in the flat \(\Lambda \)CDM model is smaller than expected, with a value \(\sim \)0.5. This effect is driven by the fact that in some of the CC analyses (and hence for some of the H(z) points) some sources of error have been estimated somewhat too conservatively. This is true in particular for the diagonal part of the covariance matrix, where the error on the metallicity (which in most cases drives the total error) has been overestimated in some works, either because a large prior was assumed when an accurate measurement was not feasible (Moresco 2015), or because additional error contributions (e.g., the ones due to SPS modeling or to the SFH estimate, Moresco et al. 2016b; Borghi et al. 2022a) were propagated into that error and would in this way be counted twice once the full covariance matrix is considered. This is particularly evident since the effect disappears for the analyses where a full metallicity estimate was available and the error has not been overestimated, providing in those cases reasonable values of \(\chi ^2_{\mathrm{red}}\), like, e.g., in the Simon et al. (2005) dataset, where \(\chi ^2_\mathrm{red}\sim 1.1\), or in Moresco et al. (2012a), where \(\chi ^2_\mathrm{red}\sim 0.75\). This effect, however, does not have a significant impact on the results, since the points with larger errors (driving \(\chi ^2_{\mathrm{red}}\) to smaller values) are also the less relevant for the cosmological constraints, and the cosmological analyses of different CC subsamples provide compatible results.

As discussed in Moresco et al. (2016a), H(z) measurements at low redshift are crucial to better constrain the intercept of the Hubble parameter at \(z\sim 0\), while measurements at higher redshift become more and more important to determine the shape of the H(z) evolution, which depends critically on the dark energy and dark matter parameters. As expected, the simulated CC data at low\(-z\) significantly improve the current accuracy on the estimated Hubble constant, by a factor of \(\gtrsim 2\), by increasing the precision on the extrapolation of H(z) to \(z\sim 0\). On the other hand, the high\(-z\) simulated data become fundamental to determine the dark energy EoS, especially when combined with lower redshift data, improving the accuracy on w from 38 to 29% and on \(\varOmega _{\mathrm{m}}\) from 59 to 31%. When considering the optimistic scenario, CC data will enable an accuracy on \(H_0\) at the 3% level, and on \(\varOmega _{\mathrm{m}}\) and w at the \(\sim \)30% level.

Clearly, as the dimensionality of the problem decreases, the accuracy on the derived parameters increases. As a comparison, in Table 2 we also show, for the current dataset and the optimistic scenarios, the constraints on \(H_0\) and \(\varOmega _{\mathrm{m}}\) achievable in a flat \(\Lambda \)CDM model. In this regime, we observe a particular improvement in the accuracy on \(\varOmega _{\mathrm{m}}\), up to the 3% level.

Fig. 6 Forecast of CC measurements with future surveys. In the bottom left panel, current CC data are shown with white points, while blue and yellow points present forecasts on the expected accuracy with the CC approach at low redshift (with an accurate re-analysis of current surveys, e.g. SDSS) and from future surveys, like the ESA Euclid mission (Laureijs et al. 2011) or the ATLAS probe mission (Wang et al. 2019), respectively. For the blue points, the error-bars are smaller than the points. The outer plots show the constraints for an open wCDM cosmology that can be obtained with current data (gray contours), and with different combinations of the simulated datasets

3.2 Quasars

There have been numerous proposals in the literature for standardising the emission of quasars (e.g., Watson et al. 2011; La Franca et al. 2014; Solomon and Stojkovic 2022); in the following we will focus on the one presented by Risaliti and Lusso (2015). Quasars are the most luminous persistent objects in the Universe, with integrated luminosities of \(10^{44-48}\) erg s\(^{-1}\) over the ultra-violet (UV) to the X-ray energy range. The UV emission is interpreted as the radiation produced by the material flowing towards the supermassive black hole, located in the center of a galaxy, in the form of an accretion disc, and it makes up roughly 90% of the quasar bolometric budget (Shakura and Sunyaev 1973). The rest is released as X-rays, which are thought to originate in a hot plasma of relativistic electrons (Svensson and Zdziarski 1994), called corona by analogy with the Sun, that Compton up-scatters photons coming from the disk. The UV and X-ray emission have long been known to obey a non-linear relation between the UV (at the rest frame 2500 Å, \(L_{\mathrm{UV}}\)) and X-ray (at the rest frame 2 keV, \(L_{\mathrm{X}}\)) luminosities (e.g., Tananbaum et al. 1979; Zamorani et al. 1981; Avni and Tananbaum 1982, parameterized as \(L_{\mathrm{X}}\propto L_{\mathrm{UV}}^\gamma \), with \(\gamma \simeq 0.6\)), yet how the gravitational energy is partly transferred from the disc to the corona, preventing its fast cooling via the production of X-ray photons through the inverse Compton process, is still unknown.

3.2.1 Basic idea and equations

The technique that makes use of quasars as cosmological probes hinges on the non-linear relation mentioned above to provide an independent measurement of their distances, thus turning quasars into standardizable candles and extending the distance modulus-redshift relation (or the so-called Hubble–Lemaître diagram) of supernovae Ia to a redshift range that is still poorly explored (\(z>2\); Risaliti and Lusso 2015). The applicability of this methodology is based on two key points. Firstly, the understanding that most of the observed dispersion in the \(L_{\mathrm{X}}-L_{\mathrm{UV}}\) relation is not intrinsic to the relation itself but due to observational issues, such as gas absorption in the X-rays, dust extinction in the UV, calibration uncertainties in the X-rays (e.g. Lusso 2019), variability, and selection biases associated with the flux limits of the different samples. In fact, with an optimal selection of clean sources (i.e., where the intrinsic UV and X-ray quasar emission can be measured), the observed dispersion drops from 0.4 dex to \(\simeq \)0.2 dex (Lusso and Risaliti 2016, 2017). The interested reader should refer to Lusso and Risaliti (2016) and Lusso et al. (2020) for further details on the sample selection. Specifically, Lusso and Risaliti (2016) determined how both slope and dispersion vary depending upon a given selection criterion by also including censored data at X-ray energies (see their Table 3). They also discussed the additional effect of X-ray variability and measurement uncertainties on the determination of the slope and the dispersion. Secondly, the slope of the \(L_{\mathrm{X}}-L_{\mathrm{UV}}\) relation does not evolve with redshift up to \(z\simeq 4\) (i.e., the highest redshift where the source statistics is currently sufficient to verify any possible dependence of the slope on distance). This point has been recently discussed also by Sacchi et al. (2022), who demonstrated that a one-by-one spectral analysis of a sample of quasars at redshift higher than 2.5, with high-quality X-ray and UV observations, further reduces the dispersion from 0.2 dex (obtained by employing photometric data only) to 0.12 dex, whilst the observed slope of the relation is still around 0.6. Sacchi et al. (2022) also showed that the composite X-ray and UV spectra of these high-redshift quasars do not show any peculiar spectral feature or systematic difference with respect to the average spectra of quasars at lower redshifts. The absence of any spectral variance between high- and low-redshift quasars, combined with the tightness of the X-ray to UV relation, suggests that no evolutionary effects are present in the relation itself. A key consequence is that the \(L_{\mathrm{X}}-L_{\mathrm{UV}}\) relation must be the manifestation of a universal mechanism at work in the quasar engines.

To fit the Hubble diagram, the distance modulus for each object should be computed first. The method is based on the non-linear relation between \(L_{\mathrm{X}}\) and \(L_{\mathrm{UV}}\)

$$\begin{aligned} \log L_{\mathrm{X}}=\beta +\gamma \log L_{\mathrm{UV}}, \end{aligned}$$
(22)

from which the luminosity distance (e.g., see Risaliti and Lusso 2015, 2019) can be derived as:

$$\begin{aligned} \log D_{\mathrm{L}} = \frac{\left[ \log F_{\mathrm{X}} -\beta -\gamma (\log F_{\mathrm{UV}}+27.5) \right] }{2(\gamma -1)}-\frac{1}{2}\log (4\pi ) + 28.5, \end{aligned}$$
(23)

assuming that \(F=L/(4\pi D_{\mathrm{L}}^2)\), where \(F_{\mathrm{X}}\) and \(F_\mathrm{UV}\) represent the flux densities (in erg s\(^{-1}\) cm\(^{-2}\) Hz\(^{-1}\)) at X-ray and UV energies, respectively. \(F_{\mathrm{UV}}\) is normalized to the (logarithmic) value of 27.5 in the equation above, whilst \(D_{\mathrm{L}}\) is in units of cm and is normalized to 28.5 (in logarithm).Footnote 5 The slope of the \(F_{\mathrm{X}}-F_{\mathrm{UV}}\) relation, \(\gamma \), is a free parameter, and so is the intercept. The intercept \(\beta \) of the \(L_{\mathrm{X}}-L_{\mathrm{UV}}\) relation is related to the one of the \(F_{\mathrm{X}}-F_{\mathrm{UV}}\) relation, \({\hat{\beta }}\), as \({\hat{\beta }}(z)=2(\gamma -1)\log D_{\mathrm{L}}(z) + (\gamma -1)\log 4\pi + \beta \). The distance modulus, DM, is thus:

$$\begin{aligned} DM= 5 \log D_{\mathrm{L}} - 5 \log (10\,\mathrm{pc}) , \end{aligned}$$
(24)

and the uncertainty on DM, dDM, is:

$$\begin{aligned} dDM = \frac{5}{2(\gamma -1)} \left[ \left( d\log F_{\mathrm{X}}\right) ^2 + \left( \gamma \, d\log F_{\mathrm{UV}}\right) ^2 + \left( d\beta \right) ^2 + \left( \frac{d\gamma \left[ \beta +\log F_{\mathrm{UV}}+27.5-\log F_{\mathrm{X}}\right] }{\gamma -1} \right) ^2\right] ^{1/2} , \end{aligned}$$
(25)

where \(d\log F_{\mathrm{X}}\) and \(d\log F_{\mathrm{UV}}\) are the logarithmic uncertainties on \(F_{\mathrm{X}}\) and \(F_{\mathrm{UV}}\), respectively. Equation 25 assumes that all the parameters are independent, and also takes into account the uncertainties on \(\beta \) and \(\gamma \). The fitted likelihood function, \({\mathcal {L}}\), is then defined as:

$$\begin{aligned} \ln {\mathcal {L}} = - \frac{1}{2} \sum _i^N\left( \frac{(y_i-\psi _i)^2}{s_i^2} - \ln s^2_i\right) , \end{aligned}$$
(26)

where N is the number of sources, and \(s_i^2 = d y_i^2 +\gamma ^2 d x_i^2 + \exp (2\ln \delta )\) takes into account the uncertainties on both the \(x_i\) (\(\log F_{\mathrm{UV}}\)) and \(y_i\) (\(\log F_{\mathrm{X}}\)) variables of the fitted relation. The parameter \(\delta \) represents the scatter that is left in the relation once it has been marginalized over all the parameters, and thus it can be considered a proxy of the intrinsic dispersion, under the assumption that all the systematics have been taken into account.Footnote 6 The variable \(\psi \) is the modeled X-ray monochromatic flux (\(F_{\mathrm{X,\, \mathrm mod}}\)), defined as:

$$\begin{aligned} \psi = \log F_{\mathrm{X,\, \mathrm mod}} = \beta + \gamma (\log F_{\mathrm{UV}}+27.5) +2(\gamma -1)(\log D_{\mathrm{L,\, \mathrm mod}} -28.5) , \end{aligned}$$
(27)

and it is dependent upon the data, the redshift and the model (cosmological or parametric) assumed for the distances (e.g., \(\Lambda \)CDM, wCDM or a polynomial function).
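Putting Eqs. (23)–(27) together, a minimal Python sketch of the distance-modulus computation and of the likelihood term could read as follows. All function arguments are to be supplied by the user, the absolute value in the dDM prefactor is introduced here so that the uncertainty stays positive for \(\gamma <1\), and the sign convention inside the likelihood follows Eq. (26) as printed above:

```python
import numpy as np

def log_DL(logFX, logFUV, beta, gamma):
    """Eq. (23): log10 of the luminosity distance in cm (normalization 28.5)."""
    return ((logFX - beta - gamma*(logFUV + 27.5)) / (2.0*(gamma - 1.0))
            - 0.5*np.log10(4.0*np.pi) + 28.5)

def distance_modulus(logFX, logFUV, beta, gamma):
    """Eq. (24), with 10 pc = 3.086e19 cm since D_L is in cm."""
    return 5.0*log_DL(logFX, logFUV, beta, gamma) - 5.0*np.log10(3.086e19)

def dDM(logFX, logFUV, dlogFX, dlogFUV, beta, dbeta, gamma, dgamma):
    """Eq. (25): uncertainty on DM, assuming independent parameters."""
    term = (dgamma*(beta + logFUV + 27.5 - logFX)/(gamma - 1.0))**2
    return (5.0/(2.0*abs(gamma - 1.0))) * np.sqrt(
        dlogFX**2 + (gamma*dlogFUV)**2 + dbeta**2 + term)

def psi_model(logFUV, logDL_mod, beta, gamma):
    """Eq. (27): modeled X-ray monochromatic flux."""
    return beta + gamma*(logFUV + 27.5) + 2.0*(gamma - 1.0)*(logDL_mod - 28.5)

def ln_like(y, psi, dy, dx, gamma, delta):
    """Eq. (26); s_i^2 combines the x and y uncertainties with the
    intrinsic-dispersion proxy delta."""
    s2 = dy**2 + (gamma*dx)**2 + delta**2
    return -0.5*np.sum((y - psi)**2/s2 - np.log(s2))
```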

In the case of a parametric (cosmology-independent) approach, the data are fitted with a luminosity distance described by a fifth-order polynomial in \(\log (1+z)\), where the cosmographic function is:

$$\begin{aligned} D_{\mathrm{L,\,\mathrm mod}}(z)=k \ln (10)\frac{c}{H_0}\sum ^5_{i=1} a_i\log ^i(1+z) , \end{aligned}$$
(28)

where k and \(a_i\) (\(a_1\) is fixed to 1 to reproduce the local Hubble law) are free parameters. The polynomial order is chosen depending upon the range of redshift spanned by the quasars to ensure convergence (see Bargiacchi et al. 2021b).
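A direct transcription of Eq. (28), with \(a_1\) fixed to 1 and the remaining coefficients as free parameters, could look like the following sketch; the coefficient values in the example call are arbitrary placeholders:

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def DL_cosmographic(z, k, a2, a3, a4, a5, H0=70.0):
    """Eq. (28): fifth-order polynomial in log10(1+z), with a1 = 1 fixed
    to reproduce the local Hubble law; k is the cross-calibration parameter."""
    x = np.log10(1.0 + z)
    poly = x + a2*x**2 + a3*x**3 + a4*x**4 + a5*x**5
    return k * np.log(10.0) * (C_KMS/H0) * poly   # D_L in Mpc

# Example call with arbitrary placeholder coefficients.
dl = DL_cosmographic(2.0, k=1.0, a2=0.5, a3=0.1, a4=0.02, a5=0.003)
```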

For any analysis that involves a detailed test of cosmological models, the quasar distances should be cross-calibrated by making use of the distance ladder through supernovae Ia. In fact, the DM values of quasars are not absolute, thus a cross-calibration parameter (k) is needed. The parameter k should be fitted simultaneously for supernovae Ia and quasars (i.e., k is a rigid shift of the quasar Hubble diagram to match the one of supernovae).

The slope of the \(L_{\mathrm{X}}-L_{\mathrm{UV}}\) relation can be kept fixed in the procedure above. Yet, it is better to marginalize over \(\gamma \) to check whether any degeneracy of the slope with the other parameters is present, and whether the statistical significance of any deviation from a cosmological model can be affected by the assumption of a \(\gamma \) value that slightly deviates from the true one. The marginalization over \(\gamma \) is a more conservative procedure, as it reduces the significance of any observed deviation with respect to the same MCMC analysis with \(\gamma \) fixed. Therefore, if a statistical deviation persists with respect to a cosmological model even allowing for a variable \(\gamma \), its significance should be considered as an indicative lower limit with respect to the case where \(\gamma \) is fixed. Finally, it should be noted that the Hubble constant \(H_0\) in Eq. (28) is degenerate with the k parameter, so it can assume any arbitrary value. In the following, the Hubble constant is fixed to \(H_0\)=70 \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\) (see also Lusso et al. 2019a, 2020; Bargiacchi et al. 2021b).

3.2.2 Sample selection

To build a quasar sample that can be utilized for cosmological purposes, both X-ray and UV data are required to cover the rest-frame 2 keV and 2500 Å. The most up-to-date broad-line quasar sample considered for cosmological purposes has been assembled by combining seven different samples from both the literature and the public archives (Lusso et al. 2020). The former group includes the samples at \(z\simeq 3.0-3.3\) by Nardini et al. (2019), \(4<z<7\) by Salvestrini et al. (2019), \(z>6\) by Vito et al. (2019), the XMM-XXL North quasar sample published by Menzel et al. (2016), and one new optically-selected SDSS quasar at \(z=4.109\), J074711.14\(+\)273903.3, whose X-ray observation was obtained as part of a proposed large programme with XMM-Newton (cycle 18, proposal ID: 084497, PI: Lusso). This collection is complemented by including quasars from a cross-match of optical (i.e. the Sloan Digital Sky Survey) and X-ray public catalogs (i.e. XMM-Newton and Chandra), which will be labeled as SDSS-4XMM and SDSS-Chandra samples hereafter (Bisogni et al. 2021). A local subset of active galactic nuclei (AGN) with UV (i.e. International Ultraviolet Explorer) data and X-ray archival information was also added to improve the sampling at very low redshifts. The reader interested in the description of the different subsets should refer to Lusso et al. (2020). The main parent sample is composed of \(\sim \)19,000 objects, from the local Universe up to \(z=7.52\), where quasars with bright radio jets and broad absorption lines (BALs) have been removed. In fact, an excess of X-rays due to synchrotron emission is observed in bright radio quasars due to the presence of the jet, whilst the strong absorption features observed in BALs, usually attributed to winds/outflows, hamper a robust measurement of the quasar continuum in the UV.

Fig. 7 Distribution of luminosities at rest-frame 2500 Å as a function of redshift for the main (grey points, \(\simeq 19,000\) objects) and the selected (cleaned) samples (Lusso et al. 2020). Brown and yellow squares show the high-z sample (Salvestrini et al. 2019; Vito et al. 2019), cyan points the SDSS-4XMM one, brown triangles the XMM-XXL one (Menzel et al. 2016), orange pentagons the local AGN sample, red stars the \(z\simeq 3\) quasar sample (Nardini et al. 2019), the green star represents the new \(z\simeq 4\) quasar from a dedicated XMM programme (see text for details), and gold pentagons the SDSS-Chandra one (Bisogni et al. 2021). Image reproduced with permission from Lusso et al. (2020), copyright by ESO

To select a sub-sample with accurate estimates of \(F_{\mathrm{X}}\) and \(F_{\mathrm{UV}}\), systematic effects should be taken into account and low-quality measurements should be discarded. A minimum signal-to-noise ratio (S/N) of 1 on the soft and hard X-ray band fluxes should be considered, whilst no such filter is required in the UV, since the S/N at these wavelengths is typically significantly higher than 1. The main possible sources of contamination or systematic error that may affect the flux measurements are: dust reddening and host-galaxy contamination in the optical/UV, gas absorption in the X-rays, and the Eddington bias associated with the flux limit of the X-ray observations.

Regarding the latter, any flux-limited sample is biased towards brighter sources at high redshifts, and this is more relevant in the X-rays, since the observed flux range is relatively narrower than in the UV. Specifically, AGN with an average X-ray intensity close to the flux limit of the observation will be observed only in case of a positive fluctuation. This introduces a systematic, redshift-dependent bias towards high fluxes, known as Eddington bias, which has the effect of flattening the \(F_\mathrm{X}-F_{\mathrm{UV}}\) relation. Samples consisting of detections only might thus be affected by such a bias. One possibility is to include censored data in the analysis. Yet, the investigation of both the \(F_{\mathrm{X}}-F_{\mathrm{UV}}\) and the distance modulus-redshift relations is then far from trivial, since it strongly depends on the weights assumed in the fitting algorithm. Therefore, one needs to find an alternative method to obtain an (almost) unbiased sample.

To minimize this bias, one possible approach is to neglect all X-ray detections below a threshold defined as \(\kappa \) times the intrinsic dispersion of the \(F_{\mathrm{X}}-F_{\mathrm{UV}}\) relation (\(\delta \)) computed in narrow redshift intervals (Lusso and Risaliti 2016; Risaliti and Lusso 2019), specifically:

$$\begin{aligned} \log F_{2\,{\mathrm{keV,\,exp}}} - \log F_{\min } < \kappa \delta , \end{aligned}$$
(29)

where \(F_{2\,{\mathrm{keV,\,exp}}}\) is the monochromatic flux at 2 keV expected from the observed rest-frame quasar flux at 2500 Å with the assumption of a true \(\gamma \) of 0.6; it is calculated as follows:

$$\begin{aligned} \log F_{2\,{\mathrm{keV,\,exp}}} =(\gamma -1)\log (4\pi ) + (2\gamma -2)\log D_{\mathrm{L}} + \gamma \log F_{\mathrm{UV}} + \beta , \end{aligned}$$
(30)

where \(D_{\mathrm{L}}\) is the luminosity distance calculated for each redshift with a fixed cosmology, and the parameter \(\beta \) represents the pivot point of the non-linear relation in luminosities, \(\beta =26.5-30.5\gamma \simeq 8.2\).Footnote 7

Fig. 8 The \(L_{\mathrm{X}}-L_{\mathrm{UV}}\) relation for the \(\simeq 2400\) quasars published by Lusso et al. (2020). Symbol keys are the same as in Fig. 7. The red line represents the linear regression fit of the data obtained through the hierarchical Bayesian model linmix (Kelly 2007). The light black lines represent some random realisations of the \(L_{\mathrm{X}}-L_{\mathrm{UV}}\) relation. The resulting slope and intercept of the best-fit regression line are \(\gamma =0.667\pm 0.007\) and \(\beta =6.25\pm 0.23\). The observed dispersion along the \(L_{\mathrm{X}}-L_{\mathrm{UV}}\) relation is 0.24 dex. Luminosity values are computed by assuming a flat \(\Lambda \)CDM model with \(\varOmega _{\mathrm{M}}=0.3\)

The parameter \(F_{\min }\) in Eq. (29) represents the flux limit of a given observation or survey, whilst the product \(\kappa \delta \) is a value that should be estimated for all the sub-samples constructed from archives (e.g., SDSS-4XMM, SDSS-Chandra) or surveys (XXL). The Eddington bias is then reduced by including only X-ray detections for which the minimum detectable flux \(F_{\min }\) in that given observation is lower than the expected X-ray flux \(F_{2\,{\mathrm{keV,\,exp}}}\) by a factor that is proportional to the dispersion of the \(F_{\mathrm{X}}-F_{\mathrm{UV}}\) relation in narrow redshift bins (see Appendix A in Lusso and Risaliti 2016 and Risaliti and Lusso 2019).
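In code, the filter of Eqs. (29)–(30) amounts to a simple boolean mask over the sample. The sketch below assumes that \(\log D_{\mathrm{L}}\) has already been computed for each redshift from a fixed fiducial cosmology and that the product \(\kappa \delta \) has been calibrated for the given sub-sample, as described above; the example value is a placeholder:

```python
import numpy as np

def expected_logF2keV(logFUV, logDL, gamma=0.6):
    """Eq. (30), with beta = 26.5 - 30.5*gamma (~8.2 for gamma = 0.6)."""
    beta = 26.5 - 30.5*gamma
    return ((gamma - 1.0)*np.log10(4.0*np.pi) + (2.0*gamma - 2.0)*logDL
            + gamma*logFUV + beta)

def eddington_mask(logFUV, logDL, logFmin, kappa_delta):
    """Keep only sources for which Eq. (29) is NOT satisfied, i.e. the
    expected 2 keV flux sits at least kappa*delta above the flux limit."""
    logFexp = expected_logF2keV(logFUV, logDL)
    return (logFexp - logFmin) >= kappa_delta

# Example usage (arrays logFUV, logDL, logFmin for the parent sample):
# mask = eddington_mask(logFUV, logDL, logFmin, kappa_delta=0.9)  # placeholder
```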

A complete description and implementation of these filters to obtain the final best sample for a cosmological analysis is presented in Lusso et al. (2020, see their Section 5). The most up-to-date quasar sample is composed of 2,421 quasars spanning a redshift interval \(0.009\le z\le 7.52\), with a mean (median) redshift of 1.442 (1.295), and is shown in Fig. 7. Figure 8 presents the \(L_{\mathrm{X}}-L_{\mathrm{UV}}\) relation for this sample, where the best-fit regression line is obtained through the hierarchical Bayesian model linmix (Kelly 2007).

3.2.3 Measurements

Ideally, spectroscopy can deliver cleaner measurements of the relevant parameters (i.e. the X-ray and UV rest frame fluxes), but since a detailed spectroscopic UV and X-ray analysis can be carried out only for a relatively small number of sources, the currently published quasar sample still relies heavily on broadband photometry in both the UV and the X-rays to compute the monochromatic UV and X-ray fluxes, as well as the UV colors and X-ray slopes. These parameters are thus derived from the photometric AGN spectral energy distribution (SED).

To compile the quasar SEDs, multi-wavelength data from radio to UV should be considered, such as the FIRST survey in the radio (Becker et al. 1995), the Wide-field Infrared Survey Explorer (WISE, Wright et al. 2010) in the mid-infrared, the Two Micron All Sky Survey (2MASS, Cutri et al. 2003; Skrutskie et al. 2006) and the UKIRT Infrared Deep Sky Survey (UKIDSS, Lawrence et al. 2007) in the near-infrared, SDSS in the optical, and the Galaxy Evolution Explorer (GALEX, Martin et al. 2005) survey in the UV. Most of the relevant broadband information, as well as the spectroscopic redshifts, are compiled in the SDSS quasar catalogs. Galactic reddening must be taken into account by utilizing the selective attenuation of the stellar continuum \(k(\lambda )\) (e.g. Fitzpatrick 1999), along with the relative Galactic extinction (e.g. Schlegel et al. 1998) for each object. For each source, the observed flux and the corresponding frequency in all the available bands should be computed. The data used in the SED computation are then blue-shifted to the rest-frame (with no K-correction). All the rest-frame luminosities are then determined from a first-order polynomial between two adjacent points. At wavelengths bluer than about 1400 Å, significant absorption by the intergalactic medium (IGM) is expected in the continuum (\(\sim \)10% between the Ly\(\alpha \) and C iv emission lines, see Lusso et al. 2015, for details). Hence, when computing the relevant parameters, all the rest-frame data at \(\lambda <1500\) Å should be excluded from the SED (or corrected for such an absorption if possible).

By compiling a broad photometric coverage, the rest-frame luminosity at 2500 Å can be computed via interpolation for the majority of the quasars whenever the reference frequency is covered by the photometric SED. Otherwise, the value can be extrapolated by considering the slope between the luminosity values at the closest frequencies. Uncertainties on monochromatic luminosities (\(L_\nu \propto \nu ^{-\gamma }\)) from the interpolation (extrapolation) between two values \(L_1\) and \(L_2\) are computed as:

$$\begin{aligned} \delta L = \sqrt{\left( \frac{\partial L}{\partial L_1}\right) ^2 (\delta L_1)^2 + \left( \frac{\partial L}{\partial L_2}\right) ^2 (\delta L_2)^2} . \end{aligned}$$
(31)
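Assuming, as is common for power-law SEDs, that the first-order interpolation is performed in log–log space between the two points bracketing the reference frequency, Eq. (31) reduces to the following sketch; the function name and inputs are illustrative:

```python
import numpy as np

def interp_luminosity(nu_ref, nu1, L1, dL1, nu2, L2, dL2):
    """Log-log linear interpolation of L at nu_ref between (nu1, L1)
    and (nu2, L2), with the error propagation of Eq. (31)."""
    t = (np.log10(nu_ref) - np.log10(nu1)) / (np.log10(nu2) - np.log10(nu1))
    L = L1**(1.0 - t) * L2**t
    # Partial derivatives dL/dL1 and dL/dL2 entering Eq. (31).
    dLdL1 = (1.0 - t) * L / L1
    dLdL2 = t * L / L2
    return L, np.sqrt((dLdL1*dL1)**2 + (dLdL2*dL2)**2)
```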

To obtain the rest-frame luminosities at 2 keV, a detailed X-ray spectral analysis of all the quasars is impractical, given the overall large number of sources, while a photometric approach is a viable solution (Risaliti and Lusso 2019; Lusso et al. 2020). Briefly, for sources having an entry in the 4XMM-DR9 serendipitous source catalog,Footnote 8 the rest-frame 2 keV fluxes and the corresponding (photometric) photon indices, \(\varGamma _X\) (along with their 1\(\sigma \) uncertainties), can be derived from the tabulated 0.5–2 keV (soft, \(F_{\mathrm{S}}\)) and 2–12 keV (hard, \(F_{\mathrm{H}}\)) fluxes. These band-integrated fluxes are blue-shifted to the rest-frame by considering a pivot energy value of 1 keV (\(E_{\mathrm{S}}\)) and 3.45 keV (\(E_{\mathrm{H}}\)), respectively, and by assuming the same photon index used to derive the fluxes in the 4XMM catalog (i.e. \(\varGamma _X=1.42\), Webb et al. 2020). For the soft band, the monochromatic flux at \(E_{\mathrm{S}}\) is then:

$$\begin{aligned} F_E(E_S)=F_{\mathrm{S}}\frac{(2-\varGamma _X) E_{\mathrm{S}}^{1-\varGamma _X}}{(2\,{\mathrm{keV}})^{2-\varGamma _X}-(0.5\,\mathrm keV)^{2-\varGamma _X}} , \end{aligned}$$
(32)

in units of erg s\(^{-1}\) cm\(^{-2}\) keV\(^{-1}\). An equivalent expression holds for the hard band, with the obvious modifications. Flux values must be corrected for Galactic absorption. The photometric photon index is then estimated from the slope of the power-law connecting the two soft and hard monochromatic fluxes at the rest-frame energies corresponding to the observed pivot points. The rest-frame photometric 2 keV flux (and its uncertainty) is interpolated (or extrapolated) based on such a power-law. A similar approach can be adopted for any X-ray catalog (e.g., the Chandra source catalog,Footnote 9 see Bisogni et al. 2021).
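The band-to-monochromatic conversion of Eq. (32) and the photometric photon index can be sketched as follows. The flux values and the redshift in the example are placeholders, and the blue-shift to the rest frame is applied through the \((1+z)\) factor on the pivot energies, as described above:

```python
import numpy as np

def monochromatic_flux(F_band, E_lo, E_hi, E_pivot, Gamma=1.42):
    """Eq. (32): monochromatic flux (erg/s/cm^2/keV) at E_pivot from a
    band-integrated flux over [E_lo, E_hi], for a power law F_E ~ E^(1-Gamma)."""
    return (F_band * (2.0 - Gamma) * E_pivot**(1.0 - Gamma)
            / (E_hi**(2.0 - Gamma) - E_lo**(2.0 - Gamma)))

def photometric_photon_index(F1, F2, E1_rest, E2_rest):
    """Photon index from the slope of the power law connecting the two
    monochromatic fluxes at their rest-frame pivot energies."""
    slope = np.log10(F2/F1) / np.log10(E2_rest/E1_rest)   # = 1 - Gamma
    return 1.0 - slope

# Example with placeholder band fluxes (erg/s/cm^2) for a source at z = 1.5:
z = 1.5
FS = monochromatic_flux(3e-14, 0.5, 2.0, 1.0)     # soft band, pivot 1 keV
FH = monochromatic_flux(5e-14, 2.0, 12.0, 3.45)   # hard band, pivot 3.45 keV
Gamma_X = photometric_photon_index(FS, FH, 1.0*(1+z), 3.45*(1+z))
```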

3.2.4 Systematic effects

This method may still have several shortcomings, thus it is mandatory to demonstrate that the observed deviation from \(\Lambda \)CDM at a redshift \(>2\) is neither driven by systematics in the quasar sample selection nor by the procedure adopted to fit the quasar Hubble–Lemaître diagram. Potential convergence issues may arise from the use of the polynomial expansion (Eq. (28)) to fit the Hubble diagram when observational data go beyond \(z \simeq 1\) (see Bargiacchi et al. 2021b for an in-depth discussion). Moreover, the choice of these monochromatic luminosities is rather arbitrary, and mostly based on historical reasons. It is possible that the \(L_{\mathrm{X}}-L_{\mathrm{UV}}\) relation is tighter with a different choice of the indicators of UV and X-ray emission (see e.g. Young et al. 2010). A careful analysis of this issue may also provide new insights as to the physical process responsible for this relation. A small fraction of moderate/bright radio sources may still be present in the sample. Deep all-sky radio surveys and multi-wavelength approaches (Mingo et al. 2016) are necessary to better remove these sources from the clean samples.

One serious issue that could affect the precision of the flux estimates in the X-rays is gas absorption. Previous studies based on large AGN surveys show that about 25% of optically selected un-obscured AGN display some level of X-ray absorption (Merloni et al. 2014) in excess of the Galactic value. If not corrected for, this absorption leads to an underestimate of the X-ray flux, and an overestimate of the distance. As absorption mostly affects the low-energy part of the X-ray spectrum, this bias is expected to be more relevant at low redshift (\(z<1\)). Nonetheless, the global effect on the Hubble diagram will be a decrease of the ratio between high-redshift and low-redshift distances, i.e., qualitatively, this effect may lead to a discrepancy with the concordance model. In fact, including AGN with \(\varGamma _X<1.5\) produces a flattening of the Hubble diagram, as expected if absorbed sources start to contaminate the sample. A conservative threshold should thus be \(\varGamma _X>1.7\); therefore, sources with an X-ray photon index below that value are removed from the sample.

Work still needs to be done regarding the effect of X-ray and UV variability on the relation (Lusso and Risaliti 2016). Variations in the UV brightness are on the order of about 10% (i.e. 0.04 dex in logarithmic units) on time scales of months to years (e.g. Vanden Berk et al. 2001). The X-ray variability is on the order of 5% on long time scales at high luminosity, and somewhat larger at lower luminosity (e.g. Zheng et al. 2017), and it accounts for about 30% of the dispersion of the X-ray/UV relation overall (about 0.12 dex compared to the observed 0.24 dex, see Lusso and Risaliti 2016, for details). Moreover, it is well known that the UV and X-ray variability are not correlated on short timescales (e.g., NGC 5548, Edelson et al. 2015), so the intrinsic variance of the relation could be even lower than 0.1 dex. Yet, the increase of dispersion due to X-ray variability does not modify the slope of the relation (Lusso and Risaliti 2016), even when simultaneous datasets are used (Grupe et al. 2010; Wu et al. 2012; Lusso and Risaliti 2016). Although in the case of low fluxes X-ray and UV variability may bias the data towards brighter states, both X-ray and UV variability have the only effect of producing higher uncertainties on the final computation of the parameters, without introducing any major systematic.

Another key issue that could affect the analysis of the distance modulus-redshift relation is the correction for the Eddington bias, which flattens the \(F_{\mathrm{X}}-F_{\mathrm{UV}}\) relation and thus the Hubble diagram, especially at high redshifts. At present, such a correction comes at the expense of the sample statistics. Depending on the flux limit of the given observation/survey, the statistics of the parent sample may drop by more than 50%. Additionally, the assumption that the true slope of the \(F_{\mathrm{X}}-F_{\mathrm{UV}}\) relation is \(\gamma =0.6\) may leave some hidden trends in the residuals of the Hubble diagram as a function of redshift. Nonetheless, the analysis of the residuals of the Hubble diagram as a function of redshift and \(\gamma \) for different values of the threshold \(\kappa \delta \) does not show any obvious trend (see Sect. 9.1 by Lusso et al. 2020 and appendix A in Lusso and Risaliti 2016).

The presence of an additional contribution of dust reddening in the UV band should be considered amongst the possible residual (and redshift-dependent) observational systematics in the Hubble diagram. Going to higher redshifts, the rest-frame optical/UV spectra shift to higher frequencies (shorter wavelengths), where the dust absorption cross-section is higher. This might lead to underestimated \(F_\mathrm{UV}\) measurements, which would imply an intrinsically larger value of the luminosity distance (and thus of the distance modulus) than the measured one (see Section 9.4 by Lusso et al. 2020 for details).

The results on the quasar Hubble diagram crucially depend on the assumption of the non-evolution of the relation, which has been verified through the constancy of the slope across a wide redshift interval (see e.g. Figure 8 in Lusso et al. 2020) and through the agreement with supernovae (and, hence, with the flat \(\Lambda \)CDM model) up to \(z\sim 1.5\). Both findings suggest a non-evolution, as also implied by the analysis of Dainotti et al. (2022a, and references therein). The standardization of quasars through the \(L_{\mathrm{X}}-L_{\mathrm{UV}}\) relation is a rather young technique, and it is still subject to extensive tests by several independent groups (e.g. Dainotti et al. 2022a; Colgáin et al. 2022b, a, see also Khadka and Ratra 2020b, c, 2021).

Finally, we note that, in a Bayesian framework, handling data with uncertainties on both the x and y parameters is quite subtle, and can lead to biases unless a hierarchical model is adopted, in which the true value of a certain parameter is considered in the fitting procedure and then marginalized over. As the uncertainties on the X-ray fluxes are on average higher than those on the UV fluxes, one can fit the data by including uncertainties on the X-ray emission only. To alleviate possible issues, the priors should incorporate as much information as possible in the Bayesian formulation, in order to constrain the set of plausible solutions. Something along these lines was performed by Lusso and Risaliti (2016), who considered the hierarchical Bayesian model for linear regression by Kelly (2007, see also Fig. 8). They find results statistically consistent with the Bayesian analysis performed with emcee in the case of the fit of the \(L_\mathrm{X}-L_{\mathrm{UV}}\) relation, so the implementation of the latter algorithm does not seem to show any clear bias. Nonetheless, the use of hierarchical models for the analysis of the quasar Hubble diagram is surely a point that should be investigated further.

3.2.5 Main results and forecasts

Quasars have now been extensively used to determine cosmological constraints by fitting their Hubble diagram in combination with the one of supernovae Ia, as discussed in Sect. 3.2.1 (e.g. López-Corredoira et al. 2016; Bisogni et al. 2017; Lusso et al. 2019b; Melia 2019; Wei and Melia 2020; Demianski et al. 2020; Zhao and Xia 2021; Li et al. 2021a; Bargiacchi et al. 2021a; Leizerovich et al. 2021). Amongst the main results, it has been found that the expansion rate of the Universe based on the combined quasar and supernovae Ia Hubble diagram shows a deviation from the concordance model at high redshifts (\(z>1.4\)), with a statistical significance of \(\sim 3-4\sigma \). Figure 9 presents the Hubble diagram for the most up-to-date samples of quasars (Lusso et al. 2020) and Type Ia supernovae from the Pantheon survey (Scolnic et al. 2018). The best MCMC cosmographic fit (see Eq. (28)) is shown with a red line, whilst black points are the means (along with the uncertainty on the mean) of the distance modulus in narrow (logarithmic) redshift intervals, plotted for visualization purposes only. The residuals are displayed in the bottom panel with the same symbols, and do not reveal any apparent trend with redshift. The MCMC fit assumes uniform priors on the parameters (see Bargiacchi et al. 2021b, for more details on the cosmographic technique employed).

Fig. 9 Distance modulus-redshift relation (Hubble diagram) for the clean quasar sample and Type Ia supernovae (Pantheon, magenta points). Symbol keys are the same as in Fig. 7. The red line represents a fifth-order cosmographic fit of the data, whilst the black points are averages (along with their uncertainties) of the distance moduli in narrow (logarithmic) redshift intervals. The dashed black line shows a flat \(\Lambda \)CDM model fit with \(\varOmega _{\mathrm{m}}\) \(=0.3\). The bottom panel shows the residuals with respect to the cosmographic fit, and the black points are the averages of the residuals over the same redshift intervals. Image reproduced with permission from Lusso et al. (2020), copyright by ESO

Fig. 10 Left panel: marginalized posterior distributions (1, 2 and 3\(\sigma \)) of the (\(w_0\),\(w_{\mathrm{a}}\)) parameters for the combined quasar (Lusso et al. 2020) and supernovae Ia (Scolnic et al. 2018) samples (blue contours). The constraints from the combination of Planck TT,TE,EE+lowE+lowl + BAO are also shown (green contours, Planck Collaboration et al. 2020). The dashed lines mark the point corresponding to the \(\Lambda \)CDM model. The resulting (\(w_0\),\(w_{\mathrm{a}}\)) for the combined quasars + SNe are statistically consistent with the phantom regime (\(w<-1\)) and at variance with the \(\Lambda \)CDM model at more than the \(3\sigma \) statistical level. Right panel: marginalized posterior distributions (1, 2 and 3\(\sigma \)) of the (\(H_0\),w) parameters. The green contours are the same as in the left panel, whilst the red, orange and yellow contours represent the constraints from Planck TT,TE,EE+lowE+lowl only (i.e. excluding BAO). The dashed grey line marks the \(H_0\) value resulting from the baseline model for the Cepheid-supernovae Ia sample along with the 1\(\sigma \) and 2\(\sigma \) uncertainty (i.e. \(H_0\) \(=73.04\pm 1.04\) km s\(^{-1}\) Mpc\(^{-1}\); Riess et al. 2022). The dashed blue line marks the best-fit w value in a flat wCDM model for the combined quasars and supernovae Ia (i.e. \(w=-1.49\pm 0.14\); Bargiacchi et al. 2021a)

The constraints on \(w_0\) and \(w_a\) in a \(w_0w_a\)CDM cosmological model combining the latest quasar and supernovae samples are shown in Fig. 10. The constraints from the combination of Planck18 (Planck Collaboration et al. 2020) TT,TE,EE+lowE+lowl + BAO are also shown for reference. The dashed lines mark the point corresponding to the flat \(\Lambda \)CDM model, for \(w_0=-1\) and \(w_a=0\). The resulting (\(w_0\),\(w_a\)) for the combined quasars+SNe are statistically consistent with the phantom regime (\(w<-1\)) and at variance with the \(\Lambda \)CDM model at more than the \(3\sigma \) statistical level. A summary of the cosmological fits to the combined quasar and supernovae samples is presented in Table 3. The cosmological implications of this deviation and its statistical significance are discussed at length by Risaliti and Lusso (2019) and Lusso et al. (2019a). Figure 10 also presents the marginalized posterior distributions for the \(H_0\) and w parameters. The red, orange and yellow contours represent the 1, 2 and 3\(\sigma \) constraints from Planck TT,TE,EE+lowE+lowl only (i.e. the base w model excluding BAO). The dashed grey line marks the \(H_0\) value resulting from the baseline model for the Cepheid-supernovae Ia sample along with the 1\(\sigma \) and 2\(\sigma \) uncertainty (i.e. \(H_0\) \(=73.04\pm 1.04\) km s\(^{-1}\) Mpc\(^{-1}\); Riess et al. 2022), whilst the dashed blue line marks the best-fit w value in a flat wCDM model for the combined quasars and supernovae Ia (i.e. \(w=-1.49\pm 0.14\); see Table 2 by Bargiacchi et al. 2021a). As discussed in Sect. 3.2.1, the technique presented here is degenerate in \(H_0\), but it provides constraints on the w parameter. Notably, the CMB alone predicts high values of \(H_0\) (i.e. \(H_0\) \(=87^{+8}_{-11}\) km s\(^{-1}\) Mpc\(^{-1}\)) and w constraints in the phantom regime (i.e. \(w=-1.6^{+0.3}_{-0.2}\)), and it is only when BAO are included that Planck becomes consistent with the concordance model (we refer the interested reader to Section 7.4.1 by Planck Collaboration et al. 2020).

Concerning the deviation from the concordance model, Bargiacchi et al. (2021a) also presented a detailed analysis of BAO, SNe, and quasar data to understand their compatibility, as well as their implications for extensions of the standard cosmological model. Specifically, they considered a flat and a non-flat \(\Lambda \)CDM cosmology, a flat and a non-flat dark energy model with a constant dark energy equation of state parameter, and four flat dark energy models with variable w. They find that a joint analysis of quasars and SNe with BAO is only possible in the context of a flat Universe. BAO confirm the flatness condition when a curved geometry is assumed, whilst SNe+QSO show evidence of a closed space. They also find \(\varOmega _{\mathrm{M}}=0.3\) in all data sets assuming a flat \(\Lambda \)CDM model. Yet, all the other models show a statistically significant deviation at \(2-3 \sigma \) with the combined SNe+quasars+BAO data set. In the models where the dark energy density evolves with time, SNe+QSO+BAO data always prefer \(\varOmega _{\mathrm{M}}> 0.3\), \(w_0<-1\) and \(w_{\mathrm{a}}>0\). They finally argue that this phantom behaviour is mainly driven by SNe+QSO, while BAO are closer to the flat \(\Lambda \)CDM model. Recently, Solomon and Stojkovic (2022) have also presented a combined Type Ia SNe and quasar Hubble diagram in the redshift interval \(z\simeq 0.5-3.5\), making use of a variability-absolute magnitude relation in quasar light curves. Their analysis seems to show a similar discrepancy with \(\Lambda \)CDM at redshifts higher than 2. Type Ia SNe at redshift \(z=1-2\) also appear to show a similar trend (see Figure 6 in Bargiacchi et al. 2021b), although only \(\sim \)23 Type Ia SNe are currently observed in that redshift range. Future surveys targeting a higher number of Type Ia SNe at high redshift could provide compelling evidence that the discrepancy is indeed confirmed by a completely independent method.

Table 3 Summary of the cosmological constraints for the combined quasars (Lusso et al. 2020) and supernovae Ia (Scolnic et al. 2018) sample for three different cosmological models: flat \(\Lambda \)CDM, open \(\Lambda \)CDM (o\(\Lambda \)CDM) and flat \(w_0-w_a\)CDM (see Bargiacchi et al. 2021a, for more details)

With currently operating facilities, dedicated observations of well-selected high-z quasars will greatly improve the test of the cosmological model and the study of the dispersion of the \(L_\mathrm{X}-L_{\mathrm{UV}}\) relation, especially at \(z\simeq 4\) and beyond. The extended Roentgen Survey with an Imaging Telescope Array (eROSITA; Predehl 2012; Merloni et al. 2012), flagship instrument of the ongoing Russian Spektrum-Roentgen-Gamma (SRG) mission (launched from Baikonur on July 13, 2019), will represent a powerful and versatile X-ray observatory in the next decade. The eROSITA sky will be dominated by the AGN population, with \(\sim \)3 million AGN with a median redshift of \(z\sim 1\) expected by the end of the nominal 4-year all-sky survey at the sensitivity of \(F_{0.5{-}2\,{\mathrm{keV}}} \simeq 10^{-14}\, \mathrm {erg\, s^{-1}\, cm^{-2}}\), for which extensive multi-wavelength follow-ups are already planned. Concerning the constraints on the cosmological parameters (such as \(\varOmega _{\mathrm{m}}\), \(\varOmega _{\mathrm{de}}\), and w) through the Hubble diagram of quasars, the 4-year eROSITA all-sky survey alone, complemented by redshift and broadband photometric information, will supply the largest quasar sample at \(z<2\) (average redshift \(z\simeq 1\)). Nonetheless, a relatively small population should survive the Eddington bias cut at higher redshifts (see, e.g., Medvedev et al. 2020 for the highest-redshift radio-bright quasar), thus being available for cosmology, as eROSITA samples the brighter end of the X-ray luminosity function (Lusso 2020, but see also Sect. 6.2 in Comparat et al. 2020). The large number of eROSITA quasars at \(z\simeq 1\) will be essential for both a better cross-calibration of the quasar Hubble diagram with supernovae and a more robust determination of \(\varOmega _{\mathrm{de}}\), which is sensitive to the shape of the low-redshift part of the distance modulus-redshift relation (see Figure 2 in Lusso 2020). In the mid and long term, surveys from Euclid (planned launch in 2023) and LSST (first light in July 2023, with the start of operations at the beginning of 2024) in the optical and UV, and Athena (currently in phase B1 study) in the X-rays, will also provide statistical samples of millions of quasars. With these datasets, it will be possible to obtain constraints on the observed deviations from the standard cosmological model which will rival and complement those available from the other cosmological probes.

3.3 Gamma-ray bursts

Observations of SNe Ia obtained at the end of the 1990s by two different teams (Perlmutter et al. 1998, 1999; Riess et al. 1998; Schmidt et al. 1998) found that, starting from \(z\sim 0.5\), SNe Ia appear dimmer by \(\sim \)0.25 mag. Given the standard-candle nature of SNe Ia (Phillips 1993), this result suggested that we live in a Universe characterized by an accelerated expansion. In the following decades, other cosmological probes (e.g. CMB and BAO) provided further support to the existence of an unknown form of “dark energy” propelling the acceleration. By combining SNe data with the constraints from CMB measurements, several groups (e.g., Riess et al. 2004) found \(w_0 \sim -1\) and \(w_a \sim 0\). This result might identify the dark energy as originating from a genuine cosmological constant. In subsequent years, however, new SN surveys have shown that the accuracy of cosmological parameter measurements from the Hubble diagram has not kept pace with the growing number of SN discoveries (Fig. 11). This is likely due to the fact that SN observations are affected by numerous sources of systematic effects, such as different classes of progenitor systems and different explosion mechanisms, anomalous reddening laws, and contamination of the Hubble diagram by non-standard SNe Ia and/or bright SNe Ibc. Taking advantage of the existence of this “systematic wall”, some authors (e.g., Nielsen et al. 2016) have questioned, on a statistical basis, the evidence for cosmic acceleration from SNe Ia. In fact, SNe Ia detected in the Supernova Legacy Survey (e.g., Astier et al. 2006; Guy et al. 2010) confirm the acceleration, although their measurements suggest different values for the cosmological parameters. Finally, we note that the cosmological interpretation of SN Ia peaks dimmed by 0.25 mag rests on the assumption that their progenitors do not evolve.

Gamma-ray bursts (GRBs) are the brightest cosmological sources in the Universe, detectable up to the first few hundred million years after the Big Bang thanks to the enormous energy that they release in the X/gamma-rays (the isotropic radiated energy, \(E_{\mathrm{iso}}\), can reach \( \simeq 10^{54}\) erg, released typically in a few tens or hundreds of seconds). Their redshift distribution extends from 0.0085 (GRB 980425) up to \(\sim 9.4\) (GRB 090429B). In addition, they emit most of their radiation in the hard X-rays, so they do not suffer from dust absorption. These phenomena are not standard candles, given that their total radiated energies or peak luminosities span several orders of magnitude, but the discovery and intensive study of empirical correlations between distance-dependent quantities and rest-frame observables has opened the possibility of standardizing these sources as cosmological probes, and of extending the Hubble diagram into a previously unexplored range of redshift. The use of GRBs for cosmology through the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) (“Amati”) relation and other correlations involving the radiated energy or luminosity of the prompt and/or afterglow emission has been the subject of intensive investigation by many research groups worldwide for more than a decade. The main power of GRBs as cosmological probes lies in their huge brightness in the X- and soft gamma-ray domain, which makes them detectable up to redshift 10 or more, combined with the huge optical/NIR follow-up efforts that allow the redshift measurement. This makes it possible to extend the Hubble diagram substantially beyond the redshift range of Type-Ia SNe and even BAO, in a regime where only high-z AGNs can partly compete. As demonstrated by analyses and simulations, this redshift extension is fundamental for testing DE models and, more generally, cosmological scenarios alternative to the standard \(\Lambda \)CDM. In the following, we describe three methods, based on gamma-ray bursts, to measure \(\varOmega _{\mathrm{m}}\) independently of SNe Ia, and to constrain the dark energy EoS describing the expansion history of the Universe.

Fig. 11

Image reproduced with permission from Izzo et al. (2015), copyright by ESO

Residual distance modulus for different values of the density cosmological parameters up to \(z = 2.0\). The best fit is taken to be the standard \(\Lambda \)CDM model, with \(\varOmega _{\mathrm{m}}\)=0.27, \(\varOmega _{\varLambda }\)=0.73, and \(H_0=71\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\) (black line). Union2 SNe Ia data residuals are shown in grey. The spread in \(\mu \) is clearly larger at \(z = 1.5\) (more than 1 mag) than at \(z = 0.145\) (almost 0.2 mag); the two redshifts are marked by the vertical dashed lines.

3.3.1 Basic idea and equations

GRBs are very promising probes for investigating the history and evolution of the Universe, understanding the nature and evolution of dark energy, and testing alternative cosmological models. For recent general reviews on the GRB phenomenon, we refer to Mészáros (2002); Zhang (2014); Kumar and Zhang (2015); Pe’er (2015). Although GRBs are not standard candles, as their peak luminosity and radiated energy span several orders of magnitude, some empirical correlations between distance-dependent quantities and rest-frame observables have opened up the possibility of using GRBs as distance indicators (see, for instance, Amati et al. 2008; Amati and Della Valle 2013; Lin et al. 2015, 2016a, b; Wei and Wu 2017; Si et al. 2018; Fana Dirirsa et al. 2019; Khadka and Ratra 2020a; Zhao et al. 2020a; Cao et al. 2021; Khadka et al. 2021; Cao and Ratra 2022). From a phenomenological point of view, GRBs show a prompt emission, consisting of high-energy photons in the \(\gamma \)-rays and hard X-rays, and an afterglow emission, a long-lasting multi-wavelength emission (from the X-rays to the infrared, and sometimes the radio) that follows the prompt emission and shows a typical power-law decay (e.g., Gehrels et al. 2009). In addition, GRBs can be generally classified into short (with duration \(T_{90}<2\,{\mathrm{s}}\), SGRBs) and long (with \(T_{90}>2\,\mathrm{s}\), LGRBs; Kouveliotou et al. 1993), where \(T_{90}\) is the time interval in which \( 90\%\) of the burst fluence is accumulated, starting from the time at which \(5\%\) of the total fluence was detected. This classification is very important for standardizing GRBs, since most of these correlations hold for long GRBs only. In Table 4 we list some of the correlations widely investigated in the literature, based on both prompt and afterglow emission properties (see references above for the definitions of the parameters mentioned in the Table).

Throughout this section, we mostly focus on the \(E_\mathrm{p,i}\)–\(E_{\mathrm{iso}}\) correlation for measuring cosmological parameters and investigating dark energy properties and evolution. In addition, as an example of the potential of combining prompt and afterglow emission properties, we will also discuss the perspectives for cosmology of the so-called Combo-relation (Izzo et al. 2015), obtained by combining the \(E_{\gamma ,\mathrm{iso}}\)–\(E_{X,\mathrm{iso}}\)–\(E_{\mathrm{p,i}}\) and \(E_{\mathrm{p,i}}\)–\(E_\gamma \) correlations with the analytical formulation of the X-ray afterglow component given in Ruffini et al. (2014).

Table 4 List of the most investigated GRB correlations
3.3.1.1 The \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) (“Amati”) correlation

GRBs show non-thermal spectra which can be empirically modeled with the Band function (Band et al. 1993), a smoothly broken power law with parameters \(\alpha \) (the low-energy spectral index), \(\beta \) (the high-energy spectral index), and the roll-over energy \(E_0\):

$$N(E)=\left\{ \begin{array}{ll} A \left(\frac{E}{100\,\mathrm{keV}}\right)^{\alpha} \exp\left(-\frac{E}{E_0}\right) & \quad E \le \left(\alpha-\beta\right)E_0 \,,\\ A \left[\frac{\left(\alpha-\beta\right)E_0}{100\,\mathrm{keV}}\right]^{\alpha-\beta} \exp\left(\beta-\alpha\right) \left(\frac{E}{100\,\mathrm{keV}}\right)^{\beta} & \quad E \ge \left(\alpha-\beta\right)E_0 \,.\\ \end{array} \right. $$

Given that \(\beta \) is almost always found to be \(<-2\), GRB \(\nu F_{\nu }\) spectra show a peak corresponding to a photon energy \(E_{\mathrm{p}} = E_0 (2 + \alpha )\) (Fig. 12), typically ranging from \(\sim \)5–10 up to 1000–5000 keV (see, e.g., Zhang 2014). For those GRBs with a well-measured prompt emission spectrum and redshift, it is possible to evaluate the “intrinsic” (i.e., in the cosmological rest-frame) peak energy, \(E_{\mathrm{p,i}} = E_{\mathrm{p}} (1 + z)\), and the isotropic-equivalent radiated energy, defined as:

$$\begin{aligned} E_{\mathrm{iso}}= 4 \pi D_{\mathrm{L}}^2(z,{\mathrm \theta }) \left( 1+z\right) ^{-1}\int ^{10^4/(1+z)}_{1/(1+z)} E N(E) dE , \end{aligned}$$
(33)

or equivalently

$$\begin{aligned} E_{\mathrm{iso}}= 4 \pi D_{\mathrm{L}}^2(z,{\mathrm \theta }) \left( 1+z\right) ^{-1}S_{\mathrm{bolo}}, \end{aligned}$$
(34)

where \(S_{\mathrm{bolo}}\) is the bolometric fluence.
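To make these definitions concrete, the following minimal Python sketch computes \(E_{\mathrm{p,i}}\) and \(E_{\mathrm{iso}}\) from an observed peak energy, bolometric fluence, and redshift, assuming a fiducial flat \(\Lambda \)CDM cosmology through astropy; the GRB numbers are purely illustrative placeholders, not real measurements.

```python
# Minimal sketch of Eq. (34), assuming a fiducial flat LambdaCDM cosmology.
import numpy as np
import astropy.units as u
from astropy.cosmology import FlatLambdaCDM

cosmo = FlatLambdaCDM(H0=70, Om0=0.3)  # fiducial cosmology (assumption)

def e_iso(z, s_bolo):
    """Isotropic-equivalent energy in erg, from the redshift and the
    bolometric fluence in erg/cm^2 (Eq. 34)."""
    d_l = cosmo.luminosity_distance(z).to(u.cm).value
    return 4.0 * np.pi * d_l**2 * s_bolo / (1.0 + z)

z, s_bolo, e_p_obs = 1.0, 1.0e-5, 200.0           # illustrative GRB values
print(f"E_p,i = {e_p_obs * (1.0 + z):.0f} keV")   # E_p,i = E_p (1 + z)
print(f"E_iso = {e_iso(z, s_bolo):.2e} erg")
```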

Fig. 12

A typical \(\nu F_{\nu }\) spectrum of a GRB

The quantity \(E_{\mathrm{iso}} \) spans several orders of magnitude, typically ranging from \(10^{50}\) to \(10^{54}\) erg. It is important to note that, while there is observational and theoretical evidence suggesting that the GRB emission is collimated within a few tens of degrees or less, we still lack a firm and reliable method for estimating the jet opening angle of single GRBs. This is why, conservatively, \(E_{\mathrm{iso}}\), or the isotropic-equivalent peak luminosity, \(L_{\mathrm{iso}}\), are still used as indicators of the GRB “brightness”.

The existence of a strong correlation between \(E_{\mathrm{p,i}} \) and \(E_{\mathrm{iso}}\) of long GRBs was inferred more than 20 years ago from the systematic analysis of GRB spectra and fluences (Lloyd et al. 2000), and was actually discovered in 2002 (Amati et al. 2002) based on the first sample of BeppoSAX GRBs with measured redshift. The \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) (“Amati”) correlation was then confirmed by later measurements from several different GRB detectors, and can be modeled as a linear relation between the logarithms of the two quantities:

$$\begin{aligned} \log \left[ \frac{E_{\mathrm {p, i}}}{\mathrm {keV}}\right] =b+a \log \left[ \frac{E_{\mathrm{iso}}}{10^{52}\,\mathrm {erg}}\right] . \end{aligned}$$
(35)

The \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation (see Fig. 13) is characterized by an intrinsic extra-Poissonian scatter, \(\sigma _{\mathrm{int}}\), around the best-fit line, which has to be determined together with (a, b) by the fitting procedure. A commonly used method is the maximization of the likelihood implemented by Reichart et al. (2001). In this method the data \(\left( x_i, y_i \right) \) are correlated by a linear function \(y = a x + b \) with the addition of an intrinsic scatter \(\sigma _{\mathrm{int}}\), and the best-fit values of the parameters \(\left( a, b, \sigma _{\mathrm{int}} \right) \) are obtained by minimizing the \(-\log \)(likelihood) function, in which the uncertainties \(\sigma _{x,i}\) and \(\sigma _{y,i}\) on both \(x_i\) and \(y_i\) are taken into account. The general log(likelihood) is:

$$\begin{aligned} \displaystyle \log \; {\mathcal {L}}_{\mathrm{Reichart}}(a, b, \sigma _{\mathrm{int}})= & {} \frac{1}{2}\,\sum _{i=1}^N \left[ \log {\left( \frac{1+a^2}{2 \pi (\sigma _y^2 + a^2\,\sigma _{x}^2 + \sigma _{y,i}^2 + a^2\,\sigma _{x,i}^2)}\right) }\right. \nonumber \\&\left. -\frac{(y_i - a\,x_i - b)^2}{\sigma _y^2 + a^2\,\sigma _{x}^2 + \sigma _{y,i}^2 + a^2\,\sigma _{x,i}^2}\right] , \end{aligned}$$
(36)

where \(x=\log (E_{\mathrm{p,i}})\) or \(x=\log (E_{\mathrm{iso}})\) (depending on whether one wants to investigate the correlation in the form \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) or \(E_{\mathrm{iso}}\)–\(E_{\mathrm{p,i}}\)), \(\sigma _x = 0\) and \(\sigma _y =\sigma _{\mathrm{int}}\). Here the sum runs over the N objects in the sample. We note that this maximization can actually be performed in the two-parameter space \((a, \sigma _{\mathrm{int}})\) only, since b can be calculated analytically by solving the equation \(\displaystyle {\frac{\partial }{\partial b}\log {\mathcal {L}}(a, b, \sigma _{\mathrm{int}})=0}\):

$$\begin{aligned} b = \left[ \sum {\frac{y_i - a x_i}{\sigma _{int}^2 + \sigma _{y_i}^2 + a^2 \sigma _{x_i}^2}} \right] \left[ \sum {\frac{1}{\sigma _{int}^2 + \sigma _{{y_i}}^2 + a^2 \sigma _{x_i}^2}} \right] ^{-1} . \end{aligned}$$
(37)

The values of the normalization, slope, and intrinsic dispersion of the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation in the logarithmic form expressed above are found to be \(\sim \)2, \(\sim \)0.5, and \(\sim \)0.2 dex, respectively, with slight variations depending on the sub-sample considered (e.g., Amati et al. 2002; Ghirlanda et al. 2004; Amati 2006; Amati et al. 2008; Amati and Della Valle 2013; Demianski et al. 2017).
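As an illustration of this fitting scheme, the sketch below maximizes the Reichart likelihood of Eq. (36) over \((a, \sigma _{\mathrm{int}})\), solving for b analytically via Eq. (37). It assumes the data are already in logarithmic form (x, y) with Gaussian uncertainties; the input arrays are synthetic placeholders, not the real GRB sample.

```python
# Sketch of fitting the Amati relation with the Reichart likelihood.
import numpy as np
from scipy.optimize import minimize

def b_analytic(a, sig_int, x, y, sx, sy):
    """Eq. (37): analytic solution for the intercept b."""
    w = 1.0 / (sig_int**2 + sy**2 + a**2 * sx**2)
    return np.sum(w * (y - a * x)) / np.sum(w)

def neg_log_like(theta, x, y, sx, sy):
    """Negative Reichart log-likelihood, Eq. (36)."""
    a, log_sig = theta
    sig_int = 10.0**log_sig          # fit log10(sigma_int) to keep it positive
    b = b_analytic(a, sig_int, x, y, sx, sy)
    var = sig_int**2 + sy**2 + a**2 * sx**2
    ll = 0.5 * np.sum(np.log((1.0 + a**2) / (2.0 * np.pi * var))
                      - (y - a * x - b)**2 / var)
    return -ll

# synthetic placeholder data standing in for (log E_iso, log E_p,i)
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 100)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.2, 100)
sx = np.full_like(x, 0.1)
sy = np.full_like(y, 0.1)

res = minimize(neg_log_like, x0=[0.5, -1.0], args=(x, y, sx, sy))
a_fit, sig_fit = res.x[0], 10.0**res.x[1]
print(a_fit, b_analytic(a_fit, sig_fit, x, y, sx, sy), sig_fit)
```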

Fig. 13

The \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation for long GRBs based on the updated sample of 208 events used for this review. Blue points indicate GRBs detected and localized by the Swift satellite

The existence and properties of this correlation have been widely investigated by many research groups over the last twenty years, because of its key role in understanding the GRB prompt emission physics, the jet structure, geometry, and viewing-angle effects, as well as in the identification and characterization of different sub-classes of these events, such as short vs. long, X-Ray Flashes and under-luminous GRBs, ultra-long GRBs, etc. (see, e.g., Zhang and Mészáros 2002; Amati 2006; Zhang 2014; Kumar and Zhang 2015; Pe’er 2015).

3.3.1.2 Independent measurements of cosmological parameters through the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation of GRBs

The “Amati” relation becomes a distance indicator through the measurement of \(E_{\mathrm{iso}}\), which is derived from the observed fluence and in turn depends on the geometry and expansion rate of our Universe through the so-called luminosity distance. Unlike historical “standardized” candles such as SNe Ia, which can be calibrated via Cepheids (e.g., Riess et al. 2021), we do not have a statistically significant sample of GRBs at low redshift allowing us to determine the parameters of the correlation in a cosmology-independent way. This means that the existence and properties of the correlation were established by assuming a fiducial cosmological model; thus, if we wish to use it for measuring cosmological parameters, we are obviously affected by a circularity problem. The most straightforward way to get rid of it is to simultaneously constrain the calibration parameters \((a, b, \sigma _{\mathrm{int}})\) and the set of cosmological parameters by considering a chosen likelihood function. In practice, this task consists in determining the multi-dimensional probability distribution function (PDF) of the parameters \(\{ a,b,\sigma _{\mathrm{int}}, {{\mathbf {p}}} \} \), where \({{\mathbf {p}}}\) is the N-dimensional vector of the cosmological parameters.

This is the method adopted by Amati et al. (2008) in the first work aimed at verifying whether the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation could actually be used for cosmology. By assuming a flat \(\Lambda \)CDM cosmology, it was found that the goodness of fit of the correlation varies as a function of \(\varOmega _{\mathrm{m}}\), following a parabolic shape with a minimum at about 0.2–0.3, as shown in Fig. 14. The analysis performed on larger samples in the following years made this result more reliable and accurate (see, e.g., Amati and Della Valle 2013), showing that GRBs provide—in the framework of \(\Lambda \)CDM cosmology—firm and independent evidence for an accelerating Universe with \(\varOmega _{\mathrm{m}}\) \(\sim \)0.3. This result is further confirmed by relaxing the flat-universe assumption, i.e., by letting both \(\varOmega _{\mathrm{m}}\) and \(\varOmega _{\varLambda }\) vary freely (see next sections).
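A schematic version of this test can be written in a few lines: for each trial \(\varOmega _{\mathrm{m}}\), one recomputes the \(E_{\mathrm{iso}}\) values (and hence x), refits the correlation, and records the best attainable \(-\log \)(likelihood). The sketch below reuses the neg_log_like function from the previous sketch; z, s_bolo, y, sx, sy are placeholder arrays standing in for the GRB sample, and the sketch is meant only to convey the structure of the scan behind Fig. 14.

```python
# Sketch of the Omega_m scan of Amati et al. (2008): goodness of fit of the
# E_p,i-E_iso correlation as a function of the assumed Omega_m in flat
# LambdaCDM. H0 is fixed (it is degenerate with the normalization b).
import numpy as np
import astropy.units as u
from astropy.cosmology import FlatLambdaCDM
from scipy.optimize import minimize

def best_fit_nll(om, z, s_bolo, y, sx, sy):
    cosmo = FlatLambdaCDM(H0=70, Om0=om)
    d_l = cosmo.luminosity_distance(z).to(u.cm).value
    e_iso = 4.0 * np.pi * d_l**2 * s_bolo / (1.0 + z)
    x = np.log10(e_iso / 1e52)
    # neg_log_like as defined in the previous sketch
    res = minimize(neg_log_like, x0=[0.5, -1.0], args=(x, y, sx, sy))
    return res.fun   # minimum -log(likelihood) at this Omega_m

# grid = np.linspace(0.05, 1.0, 20)
# curve = [best_fit_nll(om, z, s_bolo, y, sx, sy) for om in grid]
# plotting `curve` against `grid` reproduces the parabola of Fig. 14
```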

Fig. 14

Goodness of fit (in terms of \(-\log \)(likelihood)) of the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation of long GRBs (based on the updated sample of 208 events used for this review) as a function of the value of \(\varOmega _{\mathrm{m}}\) assumed in the computation of the \(E_{\mathrm{iso}}\) values, for a flat \(\Lambda \)CDM cosmology

3.3.1.3 Calibrating the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation with SNe Ia and other probes

In addition to the clean and independent approach described above, different and alternative techniques for getting rid of the circularity issue when using the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) (\(L_{\mathrm{iso}}\)) correlation for cosmology have been developed and presented in the literature (see for instance Montiel et al. 2021; Amati et al. 2019; Muccino et al. 2021; Izzo et al. 2015; Wang et al. 2015; Liang et al. 2008; Kodama et al. 2008; Wei 2010; Lin et al. 2015).

As anticipated, most of these methods calibrate the correlation using the luminosity distances derived from SNe Ia for GRBs at redshifts lower than about 1.5 (see Kodama et al. 2008; Liang et al. 2008; Demianski et al. 2017). It is worth pointing out that the use of GRBs as distance indicators has an advantage over SNe Ia: they can explore a broader range of redshifts, extending to \(z \sim 10\) instead of \(z \sim 2\). However, if we calibrate the GRBs with SNe Ia, the GRBs are no longer independent distance indicators. In the “distance scale” jargon, the GRBs become “tertiary” indicators, because the SNe Ia are in turn calibrated with the Cepheids. This is a different approach from that described in Sect. 3.3.1.2, where GRBs remain independent cosmological probes.

The typical regression procedure adopted in these approaches can be schematically sketched as follows (a minimal numerical sketch is given after the list):

  1. set the redshift range where the distance modulus, \(\mu (z)\), has to be reconstructed;

  2. sort the SNe Ia sample by increasing value of \(|z - z_i|\) and select the first \(n = \alpha N_{\mathrm{SNe\,Ia}}\), where \(\alpha \) is a user-selected value and \(N_{\mathrm{SNe\,Ia}}\) is the total number of SNe Ia;

  3. apply the weight function

     $$\begin{aligned} W(u) = \left\{ \begin{array}{ll} (1 - |u|^2)^2 &{} |u| \le 1 \\ 0 &{} |u| \ge 1 \end{array} \right. , \end{aligned}$$
     (38)

     where \(u = |z - z_i|/\varDelta \) and \(\varDelta \) is the highest value of \(|z -z_i|\) over the previously selected subset;

  4. fit a first-order polynomial to the data previously selected and weighted, and use the zeroth-order term as the best-fit value of the distance modulus \(\mu (z)\);

  5. evaluate the error \(\sigma _{\mu }\) as the root mean square of the weighted residuals with respect to the best-fit value.
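The following minimal sketch implements steps 1–5 above, assuming the SNe Ia Hubble diagram is given as arrays z_sn and mu_sn; the fraction frac plays the role of the user-selected \(\alpha \).

```python
# Locally weighted linear regression of the SNe Ia Hubble diagram (steps 1-5).
import numpy as np

def mu_reconstructed(z, z_sn, mu_sn, frac=0.1):
    """Return the reconstructed distance modulus mu(z) and its error."""
    d = np.abs(z_sn - z)
    idx = np.argsort(d)[: max(2, int(frac * len(z_sn)))]  # step 2
    delta = d[idx].max()
    w = (1.0 - (d[idx] / delta) ** 2) ** 2                # step 3, Eq. (38)
    # step 4: weighted first-order polynomial centred on z, so that the
    # zeroth-order coefficient is directly mu(z); np.polyfit expects
    # sqrt-weights on the residuals
    coeff = np.polyfit(z_sn[idx] - z, mu_sn[idx], deg=1, w=np.sqrt(w))
    mu_best = coeff[1]
    # step 5: rms of the weighted residuals
    resid = mu_sn[idx] - np.polyval(coeff, z_sn[idx] - z)
    sigma_mu = np.sqrt(np.sum(w * resid**2) / np.sum(w))
    return mu_best, sigma_mu
```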

We then use the reconstructed \(\mu (z)\) to obtain the luminosity distance, and fit the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation (i.e., determine the parameters \((a, b, \sigma _{\mathrm{int}})\)) as expressed by Eq. (35), without assuming any particular cosmological model. We actually consider only GRBs with \(z \le 1.414\), to cover the same redshift range spanned by the SNe Ia data.

Once the values (a, b) have been estimated through the calibration, and further assuming that the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation does not evolve with redshift, we obtain the energy \(E_{\mathrm{iso}}\) of each burst at high redshift through Eq. (35). We finally obtain the luminosity distance, \(D_{\mathrm{L}}(z)\), and construct the GRB Hubble diagram:

$$\begin{aligned} D_{\mathrm{L}}(z) = \left( \frac{E_{\mathrm{iso}}(1 + z)}{4 \pi S_{\mathrm{bolo}}}\right) ^{1/2} . \end{aligned}$$
(39)

The uncertainty of \(D_L(z)\) was estimated through the propagation of the measurement errors of the pertinent quantities. It turns out that

$$\begin{aligned} 5 \log {D_L(z)} = \left( \frac{5}{2}\right) \left\{ b+a\log \left[ \frac{E_{\mathrm {p,i}} }{300\;\mathrm {keV}}\right] -\log \left( 4 \pi S_{\mathrm{bolo}}\right) +\mu _0 \right\} , \end{aligned}$$
(40)

where \(\mu _0\) is a normalization parameter, needed because the distance moduli of GRBs are not absolute; this cross-calibration parameter matches the GRB Hubble diagram to that of SNe Ia (see for instance Demianski et al. 2021). In Fig. 15 we plot the GRB Hubble diagram obtained for a new sample of 212 objects. It is worth noting that the calibration technique of the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation, and its impact on the reliability of GRBs as distance indicators, deserves careful attention; for this reason, in our analysis we also tested the results with different calibration techniques, based on an approximate luminosity distance able to reproduce the exact function in different models (Demianski et al. 2021). Moreover, other approaches have been presented in the literature, which exploit different interpolations of the luminosity distance and employ different data samples at intermediate redshifts (including BAO datasets; see for instance Amati et al. 2019; Muccino et al. 2021). As anticipated, the price of applying all these calibration techniques is that GRBs are no longer a fully independent cosmological probe. On the other hand, if we simultaneously constrain the calibration parameters \((a, b, \sigma _{\mathrm{int}})\) and the set of cosmological parameters, the parameters of the correlation depend on the cosmological model and are coupled to the cosmological parameters. When future GRB missions substantially increase the number of GRBs available to construct the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation, they may shed new light on the properties of this important correlation.
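For illustration, once a calibrated pair \((a, b)\) is available, each observed \((E_{\mathrm{p,i}}, S_{\mathrm{bolo}})\) pair becomes a point of the Hubble diagram via Eq. (40); the sketch below is a direct transcription of that formula as printed, with mu0 the cross-calibration offset discussed above.

```python
# Direct transcription of Eq. (40): from calibrated (a, b) and observed
# (E_p,i [keV], S_bolo [erg/cm^2]) to 5 log10 D_L, up to the offset mu0
# that matches the GRB Hubble diagram to the SNe Ia one.
import numpy as np

def five_log_dl(e_p_i_keV, s_bolo_cgs, a, b, mu0):
    return 2.5 * (b + a * np.log10(e_p_i_keV / 300.0)
                  - np.log10(4.0 * np.pi * s_bolo_cgs) + mu0)
```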

Fig. 15

GRB Hubble diagram built up by calibrating the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation for the updated sample of 208 GRBs used for this review

3.3.2 Measurements and sample selection

The use of GRBs for measuring cosmological parameters through the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation, or other correlations involving the spectral peak energy \(E_{\mathrm{p,i}}\) (see, e.g., Table 4), requires (i) the measurement of the redshift, through either absorption spectroscopy of the optical/NIR afterglow spectrum or emission-line spectroscopy of the host galaxy, and (ii) the measurement of the prompt emission spectrum over a broad energy band and for most of the duration of the event, to allow an accurate characterization of the spectral continuum curvature. These combined requirements reduce the size of the sample from the several thousands of GRBs detected since the 1970s to fewer than three hundred nowadays. For instance, while GRB broad-band spectroscopy from 10–20 keV up to a few MeV was already available in the 1980s and 1990s thanks, e.g., to the CGRO/BATSE and Konus-WIND GRB detectors, it was only possible to discover GRB afterglow emission, and hence get the first redshift measurements, in the late ’90s. On the other hand, the Swift mission, while providing accurate and fast localization of GRB prompt and afterglow emission, thus substantially improving the efficiency of the follow-up process leading to redshift determination, is limited by the narrow energy band (15–150 keV) of its GRB detector.

The samples used up to now for this line of investigation (e.g., Amati et al. 2008; Amati and Della Valle 2013; Demianski et al. 2017; Amati et al. 2019; Demianski et al. 2021) include GRBs with measured redshift for which detection, localization, and spectral measurements come from the following main GRB missions: BATSE, BeppoSAX, HETE-2, Konus-WIND, Fermi/GBM, Swift/BAT. For this work, we consider a slightly updated sample with respect to that used by Amati et al. (2019) and Demianski et al. (2021), comprising a total number of 208 GRBs. This update is based on events for which redshift and spectral measurements became available in 2017 and 2018. A substantially updated sample including data from very recent Konus-WIND, Konus-WIND + Swift/BAT, and Fermi/GBM spectral catalogs will be presented and analyzed in Amati et al. (in prep.), as well as in the next version of this review.

As discussed, e.g., in Demianski et al. (2017) and Demianski et al. (2021), the criteria behind selecting the measurements from a particular mission are based on objective conditions aimed at minimizing selection and systematic effects (see also Sect. 3.3.3):

  • given their broad energy band and good calibration, spectral measurements by Konus-WIND and Fermi/GBM are preferably chosen whenever available. Swift/BAT observations were chosen when no other preferred mission (Konus-WIND, Fermi/GBM) was able to provide information, and were considered only for GRBs with the observed value of \(E_{\mathrm{p,i}}\) within the energy band of the instrument;

  • in order to minimize biases due to spectral evolution during the event, and hence possible systematics on \(E_{\mathrm{p,i}}\), only GRBs for which the exposure time was at least 2/3 of the whole event duration are selected (this condition is satisfied by about 80% of the publicly available spectral catalogs);

  • those GRBs usually classified as “under-luminous events”, for which there is a significant possibility that their radiated energy, luminosity, and spectral parameters are strongly biased by off-axis viewing effects or very long-to-soft spectral evolution (see, e.g., Amati 2006; Martone et al. 2017), and which may constitute a different class of events with respect to classical cosmological long GRBs, are not included in the sample.

In the estimates of \(E_{\mathrm{p,i}}\) and \(E_{\mathrm{iso}}\), the values and uncertainties of all the observations are taken into account. For the observations included in the data sample, it has been checked that the uncertainty on any value is not below 10 per cent, in order to account for the instrumental capabilities; when the error was lower, it has been assumed to be 10%, which is a reliable level of accuracy in the calibration of these kinds of detectors. When available, the Band model (Band et al. 1993) was adopted, since the cut-off power-law tends to overestimate the value of \(E_{\mathrm{p,i}}\).

3.3.3 Systematic effects

Given their relevance for shedding light on the emission processes, on the jet properties (e.g., structure, degree of magnetization), and on the identification and understanding of different sub-classes of GRBs (long, short, under-luminous, ultra-long, GRB-SN connection), as well as for their great potential for GRB cosmology, the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) and other main correlations involving prompt and afterglow emission properties have been the subject of many thorough investigations aimed at identifying, understanding, and overcoming possible selection effects and systematics (see, e.g., Dainotti and Amati 2018 for an exhaustive review).

3.3.3.1 Reliability of the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation

Different GRB detectors are characterized by different thresholds and spectroscopic sensitivities, and can therefore introduce relevant selection effects and biases in the observed \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation. In the past, there were claims that a high fraction (70–90%) of BATSE GRBs without redshift would be inconsistent with the correlation for any redshift (Band and Preece 2005; Nakar and Piran 2005). However, this “peculiar” conclusion was refuted by other authors (Ghirlanda et al. 2005; Bosnjak et al. 2008; Ghirlanda et al. 2008; Nava et al. 2011), who showed that, in fact, most BATSE GRBs with unknown redshift are well consistent with the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation. We also note that the inconsistency of such a high percentage of GRBs of unknown redshift would have implied that most GRBs with known redshift should also be inconsistent with the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) relation, which was never observed. Moreover, Amati et al. (2009) showed that the normalization of the correlation varies only marginally when using GRBs measured by individual instruments with different sensitivities and energy bands, while Ghirlanda et al. (2010) showed that the parameters of the correlation (slope and normalization) are independent of redshift.

Furthermore, the Swift satellite, thanks to its capability of providing quick and accurate localizations of GRBs, thus reducing the selection effects in the observational chain leading to the estimate of the GRB redshift, has further confirmed the reliability of the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation (Amati et al. 2009; Ghirlanda et al. 2010; Sakamoto et al. 2011).

Finally, based on time-resolved analysis of BATSE, BeppoSAX, and Fermi GRBs, it was found that the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation also holds within each single GRB, with normalization and slope consistent with those obtained from time-averaged spectra and energetics/luminosities (Ghirlanda et al. 2010; Lu et al. 2012; Frontera et al. 2012; Basak and Rao 2013). This test confirms the physical origin of the correlation, and also provides clues to its explanation.

3.3.3.2 Possible evolutionary effects of the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation

Possible evolutionary effects that may affect the correlation have been investigated by several authors. By dividing the GRB sample into subsets with different redshift ranges (e.g., \(0.1< z < 1\), \(1< z < 2\), etc.), it is found that the slope, normalization, and dispersion of the correlation do not change significantly. This result also implies that Malmquist-like selection effects are negligible.

In any case, to take into account possible evolutionary effects due, for instance, to the distribution of local inhomogeneities along the GRB line of sight (see, for instance, Shirokov et al. 2020; Demianski et al. 2021), it is also possible to consider a sort of extended \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation, introducing terms representing the redshift evolution in the form of power-law functions: \(g_{\mathrm{iso}}(z)=\left( 1+z\right) ^{k_{\mathrm{iso}}}\) and \(g_{\mathrm{p}}(z)=\left( 1+z\right) ^{k_{\mathrm{p}}}\), so that \(E_{\mathrm{iso}}^{'} =\displaystyle \frac{E_{\mathrm{iso}}}{g_{\mathrm{iso}}(z)}\) and \(E_{\mathrm{p,i}}^{'} =\displaystyle \frac{E_{\mathrm{p,i}}}{g_{\mathrm{p}}(z)}\) are the new fitting quantities (see also Shirokov et al. 2020; Demianski et al. 2021). In this approach, we consider a correlation with three parameters a, b, and \(k_{\mathrm{iso}} - ak_{\mathrm{p}}\):

$$\begin{aligned} \log \left[ \frac{E_{\mathrm{iso}}}{1\,\mathrm {erg}}\right] = b+a \log \left[ \frac{E_{\mathrm {p,i}} }{300\,\mathrm {keV}} \right] +\left( k_{iso} - a k_{p}\right) \log \left( 1+z\right) . \end{aligned}$$
(41)

The redshift dependence term in Eq. (41) can be expressed by a single average coefficient \(\gamma \):

$$\begin{aligned} \log \left[ \frac{E_{\mathrm{iso}}}{1\;\mathrm {erg}}\right] = b+a \log \left[ \frac{E_{\mathrm {p,i}} }{300\;\mathrm {keV}} \right] + \gamma \log \left( 1+z\right) . \end{aligned}$$
(42)

To calibrate this 3D relation we have to fit the coefficients a, b, \(\gamma \), and the intrinsic scatter \(\sigma _{\mathrm{int}}\); low values of \(\gamma \) indicate negligible evolutionary effects. We can then consider a 3D generalization of the Reichart log(likelihood):

$$\begin{aligned} \log \; {\mathcal {L}}^{3D}_{\mathrm{Reichart}}(a, \gamma , b, \sigma _{\mathrm{int}})= & {} \frac{1}{2} \sum {\log {\Big (\frac{(1+a^2)}{2\pi (\sigma _{\mathrm{int}}^2 + \sigma _{y_i}^2 + a^2 \sigma _{x_i}^2)}\Big )}}\,\nonumber \\&-\frac{1}{2} \sum {\frac{(y_i - a x_i -\gamma z_i-b)^2}{\sigma _{\mathrm{int}}^2 + \sigma _{y_i}^2 + a^2 \sigma _{x_i}^2}} , \end{aligned}$$
(43)

where \(z_i\) here stands for \(\log (1+z_i)\), cf. Eq. (42). This likelihood can be maximized with respect to a and \(\gamma \) only, since b can be evaluated analytically by solving the equation:

$$\begin{aligned} {\frac{\partial }{\partial b}\log \; {\mathcal {L}}^{3D}_{\mathrm{Reichart}}(a, \gamma , b, \sigma _{\mathrm{int}})=0 ,} \end{aligned}$$
(44)

which yields:

$$\begin{aligned} b = \left[ \sum {\frac{y_i - a x_i-\gamma z_i}{\sigma _{int}^2 + \sigma _{y_i}^2 + a^2 \sigma _{x_i}^2}} \right] \left[ \sum {\frac{1}{\sigma _{\mathrm{int}}^2 + \sigma _{{y_i}}^2 + a^2 \sigma _{x_i}^2}} \right] ^{-1} \;. \end{aligned}$$
(45)
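Extending the earlier fitting sketch to this 3D case only requires adding the evolutionary term; a minimal version, with lz standing for \(\log (1+z)\), could read as follows (again assuming logarithmic data arrays with Gaussian uncertainties).

```python
# Sketch of the 3D Reichart likelihood of Eq. (43), with the evolutionary
# term gamma * log10(1+z) and b solved analytically as in Eq. (45).
import numpy as np

def neg_log_like_3d(theta, x, y, lz, sx, sy):
    """lz is the array log10(1 + z_i); the rest follows the 2D sketch."""
    a, gamma, log_sig = theta
    sig_int = 10.0**log_sig
    var = sig_int**2 + sy**2 + a**2 * sx**2
    w = 1.0 / var
    b = np.sum(w * (y - a * x - gamma * lz)) / np.sum(w)   # Eq. (45)
    ll = 0.5 * np.sum(np.log((1.0 + a**2) / (2.0 * np.pi * var))
                      - (y - a * x - gamma * lz - b)**2 / var)
    return -ll

# a fitted value of gamma consistent with zero indicates negligible evolution
```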

3.3.4 Main results and forecasts

In this section, we show and discuss the current and prospective potential of the three methods described above for using GRBs as probes of the expansion rate and geometry of the Universe. The main results and forecasts reported are based on the partially updated sample of 208 GRBs described above, and on a sample of 208 real + 292 simulated GRBs (500 GRBs in total), the latter representing what may be expected from future dedicated space missions, as described below. The simulated sample was produced following the procedure and assumptions detailed in Amati and Della Valle (2013).

3.3.4.1 GRBs as independent probes

In Table 5, we show the 68% confidence level intervals for \(\varOmega _{\mathrm{m}}\) and \(w_0\) in a flat FLRW universe derived with the 70 GRBs of Amati et al. (2008), the partially updated sample of 208 GRBs, and the partially simulated sample of 500 GRBs. These values were obtained with the same approach as Amati et al. (2008), but using the likelihood function proposed by Reichart et al. (2001), which has the advantage of not requiring the arbitrary choice of an independent variable among \(E_{\mathrm{p,i}}\) and \(E_{\mathrm{iso}}\). Interestingly, we note that, after increasing the number of GRBs from 70 to 156, the accuracy of the estimate of \(\varOmega _{\mathrm{m}}\) improves by a factor of \(\sim \sqrt{N_2/N_1}\). The accuracy of these measurements is still lower than that obtained with supernova data, but promising in view of the increasing number of GRBs with measured redshift and spectra (see also Fig. 14, Fig. 16, and Sect. 3.3.4.2).

Table 5 Comparison of the 68% confidence intervals on \(\varOmega _{\mathrm{m}}\) and \(w_0\) (assuming \(\varOmega _{\mathrm{m}}\)=0.3, \(w_a\)=0.5) for a flat FLRW universe obtained with the sample of 70 GRBs of Amati et al. (2008), the updated sample of 208 GRBs considered in this work, and the simulated sample of 500 GRBs (see text)

In the last three lines of Table 5, we report the estimates of \(\varOmega _{\mathrm{m}}\) and \(w_0\) derived from the present and expected future samples by assuming that the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation is calibrated with a 10% accuracy using, e.g., the luminosity distances provided by SNe Ia, GRB self-calibration, or the other methods shortly described below. The prospects of this method for improving the estimates of \(\varOmega _{\mathrm{m}}\) and the investigation of the properties of dark energy, combined with the expected increase of the number of GRBs in the sample, are shown in Fig. 16. In particular, as an example, we show the current and expected accuracy on \(w_0\) in the case of an evolving dark energy with \(w_a\) \(\sim \)0.5.

It is important to note that, as the number of GRBs in each z-bin increases, the feasibility and accuracy of the self-calibration of the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation will also improve. Thus, the expected results shown in the last part of Table 5 and in Fig. 16 may be obtained even without the need to calibrate GRBs against other cosmological probes.

The results presented in Table 5 show a sharp increase in the accuracy of \(\varOmega _{\mathrm{m}}\) as a consequence of the increasing number of GRBs in the \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) plane. Currently, the main contribution to enlarging the GRB sample comes from joint detections by Swift, Fermi/GBM, or Konus-WIND. Hopefully, these missions will continue to operate in the next years, thus providing us with an “actual” rate of \(\sim \)15–20 GRB/year. However, a real breakthrough in this field should come from next-generation missions capable of promptly pinpointing the GRB localization and of carrying out broad-band spectroscopy. We build our hopes on the Chinese-French mission SVOM (Bertrand et al. 2019) for the very near future, and on mission concepts like THESEUS (Amati et al. 2018) for the next decade.

In Fig. 16 we show the confidence level contours in the \(\varOmega _{\mathrm{m}}\)–\(\varOmega _{\mathrm{de}}\) and \(\varOmega _{\mathrm{m}}\)–\(w_0\) planes obtained by using the real data, and by adding to them the 292 simulated GRBs (resulting in a sample of 500 GRBs in total). The simulated dataset was obtained via Monte Carlo techniques, taking into account the slope, normalization, and dispersion of the observed \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation, the observed redshift distribution of GRBs, and the distribution of the uncertainties in the measured values of \(E_{\mathrm{p,i}}\) and \(E_{\mathrm{iso}}\). These simulations indicate that with a sample of 500 GRBs (achievable within a few years from now) the accuracy in measuring \(\varOmega _{\mathrm{m}}\) will be comparable to that currently provided by SNe data.

Fig. 16

Left: 68% confidence level contour in the \(\varOmega _{\mathrm{m}}\)–\(\varOmega _{\mathrm{de}}\) plane obtained by relaxing the flat-universe assumption with the sample of 208 GRBs considered in this work (red contour), compared to those obtained with a sub-sample of 120 GRBs and to what is expected in the next years as the number of GRBs in the sample increases (500 GRBs, blue). Right: 68% confidence level contour in the \(w_0\)–\(\varOmega _{\mathrm{m}}\) plane for a flat FLRW universe with \(\varOmega _{\mathrm{m}}\)=0.3, obtained for the same samples as in the left panel. As for the results and simulations reported in Table 5, \(w_a=0.5\) was assumed for the dark energy equation of state

3.3.4.2 Use of GRBs calibrated against SNe Ia

To test different cosmological models, we use a Bayesian approach based on the MCMC method. In order to set the starting points for our chains, we first performed a preliminary and standard fitting procedure to maximize the likelihood function \({{\mathcal {L}}}(\mathbf{p})\). We sample the space of parameters by running five parallel chains and use the Gelman–Rubin diagnostic to test their convergence. As a test probe, this diagnostic uses the reduction factor R, which is the square root of the ratio of the between-chain variance to the within-chain variance. A large R indicates that the between-chain variance is substantially greater than the within-chain variance, so that a longer simulation is needed. We require that R converges to 1 for each parameter, setting \(R - 1\) of order 0.05, which is more restrictive than the often used and recommended value \(R - 1 < 0.1\) for standard cosmological investigations. After that, we ran multiple chains in parallel, discarded the first 30% of the iterations, and finally extracted the constraints on the cosmological parameters by co-adding the thinned chains. The histograms of the parameters from the merged chains were then used to infer median values and confidence ranges. As a simple example, let us consider the CPL parameterization of the dark energy EoS described in Eq. (9). In Fig. 17 we plot the 2D confidence regions in the \(w_0-w_a\) plane for the CPL model, obtained from the real (upper panel) and a simulated (bottom panel) GRB Hubble diagram.
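As a concrete reference for the convergence test, the sketch below computes the reduction factor R for one parameter from a set of parallel chains, using the standard between-chain/within-chain variance decomposition; the chain array and burn-in handling are placeholders for the actual analysis.

```python
# Gelman-Rubin diagnostic: reduction factor R for one sampled parameter.
import numpy as np

def gelman_rubin(chains):
    """`chains` has shape (n_chains, n_samples), burn-in already discarded."""
    m, n = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    B = n * means.var(ddof=1)                 # between-chain variance
    var_post = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_post / W)              # reduction factor R

# convergence requires R - 1 < 0.05 for every sampled parameter
```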

Fig. 17

2D confidence regions in the \(w_0-w_a\) plane for the CPL model, obtained from a simulated (right panel) and the real (left panel) GRB Hubble diagram

We join our sample of 208 GRBs to a simulated sample of 792 objects. These simulated data have been obtained by implementing a Monte Carlo approach, taking into account the slope, normalization, and dispersion of the observed \(E_{\mathrm{p,i}}\)–\(E_{\mathrm{iso}}\) correlation. It is worth noting that the \(\Lambda \)CDM model, which in the CPL parameterization corresponds to \(w_0=-1\) and \(w_a=0\), is disfavoured with respect to a dynamical model of dark energy.

3.3.4.3 The “Combo” relation: shedding light on the evolution of dark energy

As discussed by Izzo et al. (2015) and Muccino et al. (2021), an important step forward in this line of investigation may be provided by the use of the “Combo” relation, which extends the “Amati” relation through the inclusion of X-ray afterglow observables, like the initial luminosity, the rest-frame duration of the shallow phase, and the index of the late power-law decay, combined with an innovative calibration method minimizing the dependence on the systematics possibly affecting SNe Ia. The main novelty of the Combo relation consists in fitting the afterglow X-ray light curve with a piece-wise function, first introduced by Willingale et al. (2007), capable of modeling the very early power-law decay and the following “plateau” emission (Izzo et al. 2015), while getting rid of the X-ray flaring emission superimposed on the underlying afterglow behavior (Zaninoni et al. 2014). This procedure, similar to the analysis currently adopted for SNe Ia, allows one to measure with great accuracy the main observables of the Combo relation: indeed, among the entire sample of Swift long GRBs showing a complete light curve in the X-rays and characterized by a known peak energy of the corresponding prompt emission, no outliers have been found so far (Muccino et al. 2021; Xu et al. 2021; Wang et al. 2021a).

In a preliminary analysis of a sample of 60 GRBs with well-measured parameters of both the prompt and early X-ray afterglow emission, Izzo et al. (2015) showed that the Combo relation could provide a value of \(\varOmega _{\mathrm{m}} = 0.29_{-0.15}^{+0.23}\). By applying the Combo relation to an updated sample of 174 gamma-ray bursts, Muccino et al. (2021) obtained tighter bounds on \(\varOmega _{\mathrm{m}}\), and investigated the possible evidence of an evolving dark energy parameter w(z). As shown in Fig. 18, the w(z) evolution was studied by binning the GRB Hubble diagram in seven redshift intervals and assuming two priors on the Hubble constant, in tension at 4.4\(\sigma \), i.e., \(H_0\)= (67.4\( \pm \) 0.5) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\) and \(H_0\)= (74.03 ± 1.42) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\). It was found that at \(z\le 1.2\) w(z) agrees within 1\(\sigma \) with the standard value \(w=-1\), whereas at larger z the w(z) estimated from GRBs seems to deviate from \(w=-1\) at the 2\(\sigma \) and 4\(\sigma \) level, depending on the redshift bin (Fig. 18). These results indicate that the dark energy equation-of-state parameter may differ from the \(\Lambda \)CDM value \(w = -1\) at larger z, although its contribution to the energy budget of the Universe is still negligible there, and also confirm the Combo relation as a powerful tool to investigate the cosmological evolution of dark energy.

In view of the increasing size of the GRB database, thanks to upcoming missions, the Combo relation is a promising tool for measuring \(\varOmega _{\mathrm{m}}\) with an accuracy comparable to that exhibited by SNe Ia, and for investigating a possible evolution of the dark energy up to \(z\sim 10\).

Fig. 18

Image reproduced with permission from Muccino et al. (2021), copyright by AAS

The reconstructed evolution of the DE EoS through the redshift-binned parameterization of w(z) (1 and 2\(\sigma \), from the inner/darker to the outer/lighter regions) for the selected values of \(H_0\). The dashed red lines mark the value \(w=-1\) of the flat \(\Lambda \)CDM model. The darker region shows the un-physical EoS, i.e., exceeding the stiff-matter regime.

3.3.4.4 The promises of correlations involving \(L_X\) and \(T_a\)

As discussed at the beginning of this section, and shown in Table 4, the quest for correlations between GRB properties, aimed at shedding light on the emission processes and at enabling the use of these phenomena for measuring cosmological parameters, has involved not only the X/gamma-ray prompt phase but also the early X-ray afterglow emission. Among these, the most investigated correlations are those involving the duration of the “plateau” phase, \(T_a\), and the luminosity at the end of this phase, usually referred to as \(L_X\). Indeed, as shown and discussed by several authors (e.g., Cardone et al. 2009; Dainotti et al. 2020; Hu et al. 2021, and references therein), there exists a significant correlation between these two quantities, as well as a 3D correlation obtained by including the peak luminosity of the prompt emission, \(L_p\). In particular, it has been found that these correlations become tighter for sub-samples selected on the basis of other characteristics, including the nature of the progenitor and multi-wavelength properties. This method, while still affected by the relatively small number of events available for each sub-sample and by sample selection effects, seems promising for the purpose of GRB cosmology, especially in view of the wealth of new data on GRB prompt and afterglow emission expected in the near future thanks to the continuing operation of Swift, Fermi, Konus-WIND, and other GRB experiments, as well as the increased efficiency of follow-up with ground facilities.

3.4 Standard sirens

As first pointed out by Schutz (1986), merging black holes and neutron stars, when observed in gravitational waves (GWs), can serve as powerful cosmological probes (Holz and Hughes 2005; Dalal et al. 2006). These merging binaries emit GW signals that directly encode the luminosity distance to the binary \(D_{\mathrm{L}}\), calibrated by the theory of general relativity. There are three primary approaches to standard siren cosmology: “bright”, “dark”, and “spectral” sirens, as detailed below. LIGO (LIGO Scientific Collaboration et al. 2015), Virgo (Acernese et al. 2015), and KAGRA (Akutsu et al. 2020) are observing a growing catalog of gravitational-wave events, with hundreds to thousands more detections expected in the coming years. Most standard siren measurements to date have relied on the closest standard sirens, with luminosity distances \(D_L \lesssim 400\ \mathrm {Mpc}\), and thus probe the local distance-redshift relation through the Hubble constant \(H_0\). However, these analyses are starting to take advantage of the full gravitational-wave catalog, which extends to \(D_L \gtrsim 5\ \mathrm {Gpc}\) with the current LIGO and Virgo detections, and will extend past 10 Gpc with upgrades to the gravitational-wave detector network over the next few years. Standard sirens are therefore starting to provide measurements of the expansion history out to \(z > 1\) in addition to measuring the Hubble constant. Furthermore, standard sirens are unique probes of modified gravitational wave propagation, a prediction of many cosmological modified gravity and dark energy theories.

3.4.1 Basic idea and equations

When two compact objects, such as black holes and/or neutron stars, orbit each other, the time-varying mass quadrupole sources space-time perturbations, or GWs. At sufficiently tight orbital separations, the energy and angular momentum radiated in GWs shrink the orbit until the two objects merge, forming a bigger black hole or neutron star. Such sources of GWs are known as “compact binary coalescences”. A passing GW signal stretches and squeezes space-time, creating a relative change in length \(\varDelta L / L\), known as the strain h, or GW amplitude. The typical strain for a GW signal sourced by a compact binary coalescence is \(10^{-21}\). This stretching and squeezing of space-time happens at a certain frequency. The frequency of a GW from a compact binary coalescence is twice the orbital frequency, and it therefore evolves with time as the orbit shrinks. The frequency evolution is driven by a combination of the masses of the two compact objects known as the chirp mass.

For compact binary coalescences, the GW strain as a function of time h(t) scales inversely with the luminosity distance \(D_L\). To first order:

$$\begin{aligned} h(t) = \frac{{\mathcal {M}}_z^{5/3}f(t)^{2/3}}{D_L}F(\mathrm {angles})\cos (\varPhi (t)) , \end{aligned}$$
(46)

where f(t) is the GW frequency, \(F(\mathrm {angles})\) is a function of the source’s position on the sky, inclination and polarization, and \(\varPhi (t)\) is the orbital phase. The “intrinsic loudness” of the GW depends on the redshifted chirp mass \({\mathcal {M}}_z\):

$$\begin{aligned} {\mathcal {M}}_z = (1 + z) \frac{(m_1 m_2)^{3/5}}{(m_1 + m_2)^{1/5}} , \end{aligned}$$
(47)

for binary component masses \(m_1\) and \(m_2\), measured in the source-frame; the factor of \((1+z)\) converts between source-frame and detector-frame quantities. Interestingly, this same combination of masses governs the GW frequency evolution, f(t) and its derivative \({\dot{f}}(t)\):

$$\begin{aligned} {\mathcal {M}}_z = \left( \frac{5}{96}\pi ^{-8/3}\left( f(t) \right) ^{-11/3}{\dot{f}}(t) \right) ^{3/5} , \end{aligned}$$
(48)

so that by measuring both the amplitude and frequency evolution of the GW signal, the luminosity distance can be derived. Note that the amplitude also depends on source geometry encoded in \(F(\mathrm {angles})\). For example, a face-on binary will emit a louder GW signal than an edge-on binary. We also see that while the cosmological redshift z affects the measured GW frequency, this effect is degenerate with the binary’s mass; only redshifted masses appear in the equations describing the amplitude and frequency of the GW signal. In order to do cosmology with GW sources, we must identify external sources of redshift information. Matching GW source distances with their redshifts allows us to probe the cosmological parameters with the usual distance–redshift relation:

$$\begin{aligned} D_{\mathrm{L}}= c (1+z) \int _0^z\frac{dz'}{H_0 E(z')} \end{aligned}$$
(49)
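A minimal numerical sketch of Eqs. (47)–(49): the redshifted chirp mass follows from the measured frequency and its derivative, while the distance-redshift relation is evaluated here with astropy for an assumed flat \(\Lambda \)CDM model; the input values are illustrative only.

```python
# Sketch of Eqs. (48)-(49), assuming a flat LambdaCDM cosmology.
import numpy as np
from astropy import constants as const, units as u
from astropy.cosmology import FlatLambdaCDM

M_SUN_S = (const.G * const.M_sun / const.c**3).to(u.s).value  # ~4.93e-6 s

def chirp_mass_z(f, fdot):
    """Redshifted chirp mass in solar masses from the GW frequency f [Hz]
    and its time derivative fdot [Hz/s], Eq. (48), in G = c = 1 units."""
    m_sec = (5.0 / 96.0 * np.pi**(-8.0 / 3.0)
             * f**(-11.0 / 3.0) * fdot)**(3.0 / 5.0)
    return m_sec / M_SUN_S

# Eq. (49): the model prediction for D_L at the redshift of a host galaxy;
# comparing it with the GW-measured distance constrains H0 and E(z).
cosmo = FlatLambdaCDM(H0=70, Om0=0.3)        # assumed cosmology
print(chirp_mass_z(f=100.0, fdot=100.0))     # illustrative values
print(cosmo.luminosity_distance(0.01))       # ~43 Mpc for these parameters
```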

We discuss methods for measuring the redshift of GW sources in the following Sect. 3.4.2.

3.4.2 Sample selection

In order to use GW sources as cosmological indicators and standard sirens, the required ingredients are (i) estimating the GW distances, and (ii) assigning redshifts to the GW sources.

3.4.2.1 Gravitational-wave distances

Every GW detection of a compact binary coalescence provides a measurement of the source’s luminosity distance. For a given source, the accuracy of the GW luminosity distance measurement is typically \({\mathcal {O}}(10\%)\), depending on the parameters of the source and its signal-to-noise ratio. For some systems, the distance constraints are much tighter because the distance-inclination degeneracy, which stems from the \(F(\mathrm {angles})/D_L\) factor in Eq. (46), can be broken. This occurs for binaries with misaligned spins, leading to measurable orbital precession, and for binaries with asymmetric mass ratios that emit measurable higher-order GW harmonics (Vitale and Chen 2018; Abbott et al. 2020; Borhanian et al. 2020; Calderón Bustillo et al. 2021). Occasionally, electromagnetic observations of the same source (for example, observations of beamed emission from binary neutron star mergers) can be used to independently measure the source inclination, resulting in a tighter GW distance measurement (Mooley et al. 2018; Dobie et al. 2020). However, this introduces layers of astrophysical modeling, and in this case the standard siren is not calibrated by general relativity alone.

3.4.2.2 Assigning redshifts to gravitational-wave sources

The challenge for standard siren cosmology is to identify the redshifts of GW sources. Multi-messenger observations, such as neutron star mergers with electromagnetic counterparts like short gamma-ray bursts or kilonovae, provide the most straightforward measurement (Holz and Hughes 2005; Dalal et al. 2006). An electromagnetic counterpart like a kilonova can typically be pinpointed to a specific galaxy, thereby identifying the host galaxy of the GW merger. The GW signal provides the distance to the host galaxy, while its electromagnetic spectrum provides the redshift. These sources are typically referred to as bright sirens.

Without an electromagnetic counterpart, the GW event is usually too poorly localized on the sky to allow for a unique host galaxy identification (Abbott et al. 2018). Only the loudest, best-localized GW events (1 per several hundred events) are expected to have only a single galaxy in their localization volumes (Chen and Holz 2016). Nevertheless, if a sufficiently complete galaxy catalog is available, one can consider all of the galaxies within the GW localization volume as potential host galaxies, and statistically marginalize over them. This was the original proposal by Schutz (1986), and the method was further developed in a Bayesian context by Del Pozzo (2012) and Chen et al. (2018). These sources are often called dark sirens. At the typical distances of GW events (greater than several hundred Mpc), spectroscopic galaxy catalogs are rare, although photometric galaxy catalogs (with redshifts inferred from photometry rather than spectra) can be useful when they overlap with the GW skymap (Soares-Santos et al. 2019; Palmese et al. 2020). New and upcoming large-scale spectroscopic galaxy surveys like DESI, Taipan, SDSS-V, and 4MOST may provide useful galaxy catalogs for statistical GW standard siren analyses, either by cataloging a large fraction of the sky or through targeted follow-up of GW event localizations.

In the absence of counterparts or galaxy catalogs, alternative sources of redshift information have been proposed. If galaxy catalogs are incomplete but GW events are well-localized, matching the spatial clustering of GW sources as a function of distance to the clustering of galaxies as a function of redshift can constrain cosmological parameters (MacLeod and Hogan 2008; Oguri 2016; Mukherjee and Wandelt 2018; Vijaykumar et al. 2020; Bera et al. 2020; Mukherjee et al. 2021).

Another extension of the statistical dark standard siren method is to use prior knowledge of the merger redshift distribution, derived from external measurements of the star formation rate and time delay distribution of binary mergers, to compare against the observed gravitational-wave distance distribution (Ding et al. 2019; Ye and Fishbach 2021; Leandro et al. 2021). Finally, a particularly promising avenue for gravitational-wave only standard siren analyses is to use known features in the source population to directly extract the redshift and distance from the gravitational-wave signal alone. These sources have been dubbed “spectral sirens” (Ezquiaga and Holz 2022). If information about the source-frame frequency is available, the redshift can be derived from the observed GW frequency. This source-frame GW frequency information can come from features in the source-frame mass distribution (Chernoff and Finn 1993; Taylor et al. 2012; Taylor and Gair 2012; Farr et al. 2019; You et al. 2021; Ezquiaga and Holz 2021, 2022) as well as tidal effects in neutron star mergers (Messenger and Read 2012; Del Pozzo et al. 2017; Chatterjee et al. 2021).

As Farr et al. (2019) showed, an especially promising feature in the black hole mass distribution is the lower edge of the pair-instability mass gap: a steep drop-off in the black hole mass distribution at \(\sim 40\)–\(65\,M_\odot \), which may be accompanied by a pile-up of black holes immediately below the gap at \(\gtrsim 35\,M_\odot \). Stellar models (Fowler and Hoyle 1964; Rakavy et al. 1967; Fryer et al. 2001; Heger and Woosley 2002) show that when the black hole progenitor helium star is in the mass range \(\sim 40\)–\(120\,M_\odot \), after the helium burning stage, unstable electron-positron pair production occurs in the carbon-oxygen core. This pair production reduces the photon pressure in the stellar core, and causes oxygen to explosively ignite. This explosive oxygen burning generates an energetic outwards pulse, which can disrupt the star entirely, leaving behind no stellar remnant, or shed enough mass so that when the star collapses to a black hole, its mass is below the mass gap. Because the physics of pair instability depends primarily on the mass of the carbon-oxygen core, the locations of the lower and upper edges of the gap are expected to be independent of redshift (Farmer et al. 2019). By observing the redshifted mass distribution as a function of luminosity distance in gravitational waves, the location of the pair-instability feature(s) can be jointly inferred together with the redshift-distance relation (Farr et al. 2019; Mastrogiovanni et al. 2021). Gravitational-wave observations of binary black holes support the existence of a bump followed by a steepening of the black hole mass distribution at \(\sim 40\,M_\odot \) (Fishbach and Holz 2017; Abbott et al. 2021c, e). The interpretation of this feature as the imprint of pair-instability supernovae is still uncertain; however, as black hole population models improve, such features in the black hole mass distribution can be theoretically calibrated and reach their potential as robust cosmological probes. It is to be emphasized that all features in the mass distribution, including properties around the putative NS-BH lower mass gap, can be used as spectral sirens; the combination of these many features can be used to self-calibrate and control potential bias from systematic errors (Ezquiaga and Holz 2022).
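To make the spectral-siren logic concrete, the following minimal sketch (a toy illustration, not a collaboration pipeline; the feature mass, the observed values, and the flat \(\varLambda \)CDM background are all assumptions) infers a redshift from a detector-frame mass feature and then scans for the \(H_0\) whose luminosity distance at that redshift matches the GW-measured distance.

```python
import numpy as np

# hypothetical inputs (illustrative values, not real measurements)
m_src = 35.0     # assumed source-frame feature mass [M_sun] (PISN pile-up)
m_det = 42.0     # observed detector-frame feature mass [M_sun]
dL_obs = 980.0   # observed GW luminosity distance [Mpc]

# redshift from the mass feature: m_det = (1 + z) * m_src
z_feat = m_det / m_src - 1.0

def lum_dist(z, H0, Om=0.3):
    """Luminosity distance [Mpc] in flat LambdaCDM (c in km/s)."""
    c = 299792.458
    zz = np.linspace(0.0, z, 2048)
    inv_E = 1.0 / np.sqrt(Om * (1 + zz) ** 3 + 1.0 - Om)
    integral = np.sum(0.5 * (inv_E[1:] + inv_E[:-1]) * np.diff(zz))
    return (1 + z) * (c / H0) * integral

# scan H0 and pick the value whose d_L(z_feat) matches the GW distance
H0_grid = np.linspace(50.0, 100.0, 1001)
mismatch = [abs(lum_dist(z_feat, H0) - dL_obs) for H0 in H0_grid]
print(f"z from mass feature: {z_feat:.3f}")                          # 0.200
print(f"best-fit H0: {H0_grid[np.argmin(mismatch)]:.1f} km/s/Mpc")   # ~70
```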

3.4.3 Measurements

While GWs directly provide the luminosity distance to the source, there are multiple ways to estimate its redshift. As discussed in the previous section, standard siren redshift measurements fall under three main categories: electromagnetic counterparts, galaxy catalogs, and features in the GW source population.

3.4.3.1 Electromagnetic counterparts

The multi-messenger binary neutron star detection, GW170817, provided the first standard siren measurement of the Hubble constant (Abbott et al. 2017d, a). Gravitational-wave parameter estimation provided a luminosity distance of \(43.8^{+2.9}_{-6.9}\) Mpc. The kilonova optical counterpart allowed for the identification of a unique host galaxy, NGC4993. Because this event was relatively nearby, the measured redshift of NGC4993 is significantly affected by its peculiar (non-Hubble flow) velocity. In this case, the peculiar velocity is large (\(\sim 300\) km/s) because NGC4993 is near the Great Attractor. Correcting for inter-group and bulk flow velocities, the Hubble flow velocity is \(3017\pm 166\) km/s. At \(z \sim 0.01\), this event is only sensitive to the first-order linear redshift-distance relation, and the resulting Hubble constant measurement is \(H_0\) \(=70^{+12}_{-8}\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\) (maximum a-posteriori value and 68.3% highest density credible interval, taking a flat-in-log prior on \(H_0\)). With improved analysis of the gravitational-wave signal and a slightly updated distance measurement, the Hubble constant measurement was updated to \(H_0\) \(=70^{+13}_{-7}\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\) (Abbott et al. 2019a). In addition to measuring the Hubble constant, GW170817 and its electromagnetic counterpart enabled impressively tight constraints on cosmological modified gravity theories, including the speed of gravity and gravitational-wave friction (Abbott et al. 2017b; Amendola et al. 2018; Ezquiaga and Zumalacárregui 2017; Sakstein and Jain 2017; Creminelli and Vernizzi 2017; Baker et al. 2017; Crisostomi and Koyama 2018; Boran et al. 2018; Lagos et al. 2019; Pardo et al. 2018; Abbott et al. 2019b).
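As a rough illustration of the one-step nature of this measurement, the Monte Carlo sketch below approximates the velocity and distance posteriors quoted above as Gaussians (symmetrizing the distance errors and ignoring the flat-in-log prior), and recovers a Hubble constant and uncertainty close to the published values.

```python
import numpy as np

rng = np.random.default_rng(42)

# GW170817 inputs from the text (Gaussian approximations of the posteriors)
v_H = rng.normal(3017.0, 166.0, 200_000)   # Hubble-flow velocity [km/s]
d_L = rng.normal(43.8, 4.9, 200_000)       # luminosity distance [Mpc];
                                           # symmetrized +2.9/-6.9 errors
keep = d_L > 0
H0 = v_H[keep] / d_L[keep]                 # first-order Hubble law, H0 = v/d

lo, med, hi = np.percentile(H0, [15.85, 50.0, 84.15])
print(f"H0 = {med:.0f} +{hi - med:.0f} -{med - lo:.0f} km/s/Mpc")  # ~69 +9 -8
```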

3.4.3.2 Galaxy catalogs

To date, the only gravitational-wave event with a confident electromagnetic counterpart is GW170817. A possible AGN flare association was identified for the binary black hole event GW190521 (Graham et al. 2020), but the association is debatable (Ashton et al. 2021; De Paolis et al. 2020; Palmese et al. 2021). However, the statistical galaxy catalog method has been applied to several gravitational-wave events. As a proof of concept, Fishbach et al. (2019) demonstrated the statistical method with GW170817, marginalizing over galaxies in the GLADE catalog (Dálya et al. 2018), rather than using the uniquely identified host galaxy NGC4993. Because GW170817 was exceptionally loud and close-by, and all three detectors of the LIGO-Virgo network were operational, it was localized to only 16 deg\(^2\) with 90% credibility (215 Mpc\(^3\) assuming standard cosmological parameters from Planck Collaboration et al. 2015). This small localization volume contains only one large group of galaxies (the group containing NGC4993) at \(z \sim 0.01\), and so the statistical standard siren measurement of \(H_0\) from GW170817 is almost as informative as the counterpart measurement. In most cases, the gravitational-wave localization volume contains \({\mathcal {O}}(10^4\)–\(10^5)\) potential host galaxies, and so the statistical standard siren method would be substantially less informative even if we had complete galaxy catalogs with well-measured redshifts. The two best statistical standard sirens, excluding GW170817, are the binary black hole event GW170814 (Abbott et al. 2017c) and the (probable) binary black hole event GW190814 (Abbott et al. 2020). (The secondary mass of GW190814 is ambiguous, and GW190814 may be a neutron star–black hole system.) Both of these events lack electromagnetic counterparts, but their sky positions and gravitational-wave localizations are ideal for the statistical galaxy catalog method. Not only are they the best-localized events from the first three observing runs (other than the binary neutron star event GW170817), but they also both fall within the footprint of the Dark Energy Survey (DES, Dark Energy Survey Collaboration et al. 2016).

GW170814 was the first three-detector gravitational-wave event, observed by Virgo in addition to the two LIGO observatories in their second observing run. Using data from all three detectors enabled a 90% sky localization of only 60 deg\(^2\) (compared to 1160 deg\(^2\) using only data from the two LIGO detectors). Correlating the gravitational-wave sky map and distance measurement of \(540^{+130}_{-210}\) Mpc with the photometric galaxy catalog from DES, Soares-Santos et al. (2019) performed the first standard siren measurement of the Hubble constant using a binary black hole. With only a single event, the measurement was relatively broad, with the 68% posterior credible interval encompassing \(\sim 60\%\) of the prior, but nevertheless there was a clear peak at \(H_0\) \(\sim 75\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\) associated with an over-density of galaxies at \(z \sim 0.12\).

GW190814, detected by LIGO and Virgo in their third observing run, is the best-localized dark standard siren observed to date. It was localized to 18 deg\(^2\) (90% credibility) on the sky. At \(241^{+41}_{-45}\) Mpc, it is nearby and has an impressive signal-to-noise ratio of 25. Furthermore, because of its asymmetric masses (mass ratio of approximately 1:10), the gravitational-wave signal contains detectable higher harmonics, which reduce the distance-inclination degeneracy and yield a tighter distance measurement. Combining the gravitational-wave localization with the GLADE galaxy catalog, Abbott et al. (2020) performed a statistical standard siren measurement of the Hubble constant, finding a broad peak at \(H_0\) \(=75^{+59}_{-13}\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\) (with the 68% highest posterior density interval comprising 60% of the prior range). Although GW190814 is very nearby for a gravitational-wave event, it is at the limit of where currently-available spectroscopic galaxy catalogs are useful: at these distances, the GLADE catalog is 40% complete. Meanwhile, like GW170814, GW190814 lies within the DES footprint. Although the DES catalog contains photometric, rather than spectroscopic, redshifts, which means larger errors on each galaxy’s redshift, it does not suffer from incompleteness. Palmese et al. (2020) used the DES galaxies within the GW190814 sky map to measure the Hubble constant to \(H_0\) \(=78^{+57}_{-13}\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\), consistent with the result of Abbott et al. (2020).

3.4.3.3 Standard siren population

In order to achieve competitive cosmological constraints, information must be combined across multiple standard sirens. Analyzing a population of standard sirens requires a careful treatment of measurement uncertainties and selection effects (Chen et al. 2018; Mandel et al. 2019; Mortlock et al. 2019). The importance of incorporating selection effects can be understood by considering that gravitational-wave detectors are significantly more likely to observe sources at smaller distances, but there are more potential host galaxies at higher redshifts. If the analysis did not account for selection effects, it would tend to overestimate the redshifts of gravitational-wave events and therefore overestimate the Hubble constant. Meanwhile, because the probability of detecting a gravitational-wave source is a strong function of its mass and distance (and, to a lesser degree, the component spins), we must fit the gravitational-wave source distribution, particularly the astrophysical mass distribution and distance/redshift distribution, jointly with the cosmological parameters. For example, if the wrong binary black hole mass distribution is assumed in the statistical galaxy catalog method, the recovered cosmological parameters will be biased (Abbott et al. 2021a; Mastrogiovanni et al. 2021; Abbott et al. 2021b). The assumed black hole and neutron star spin distribution can also affect the cosmological inference, both because the binary spin impacts the gravitational-wave detection probability and because of mild degeneracies between the measured binary spin, inclination, and luminosity distance. The latter effect was already noted for the GW170817 standard siren measurement; assuming different priors on the neutron star spin magnitudes yielded slightly different posteriors on the Hubble constant (Abbott et al. 2019a). In addition to the gravitational-wave data, care must be taken in the statistical treatment of the redshift information. If redshifts are supplied from a galaxy catalog, particular attention is required in treating galaxy catalog incompleteness (Fishbach et al. 2019; Gray et al. 2020; Finke et al. 2021; Gray et al. 2021). A schematic of this hierarchical treatment is sketched below.
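The sketch shows the structure of such a hierarchical likelihood with an injection-based selection term, in the spirit of Mandel et al. (2019); the function and variable names are hypothetical, and the detection model is reduced to distances only.

```python
import numpy as np

def ln_population_likelihood(event_samples, found_inj, n_inj_total,
                             p_pop, p_draw):
    """Toy hierarchical likelihood over GW events with selection effects.

    event_samples : list of arrays of posterior distance samples per event
                    (assumed drawn with a flat distance prior)
    found_inj     : distances of the *detected* injections
    n_inj_total   : total number of injections drawn
    p_pop(d)      : population density at the hyper-parameters of interest
    p_draw(d)     : density from which the injections were drawn
    """
    # per-event term: posterior samples reweighted by the population model
    ln_like = sum(np.log(np.mean(p_pop(s))) for s in event_samples)
    # selection term: Monte Carlo estimate of the detectable fraction alpha
    alpha = np.sum(p_pop(found_inj) / p_draw(found_inj)) / n_inj_total
    # neglecting alpha would bias the inference toward larger H0
    return ln_like - len(event_samples) * np.log(alpha)
```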

The latest gravitational-wave catalog consists of \(\sim \)90 events from three observing runs of LIGO and Virgo (Abbott et al. 2021d). Using redshift information either from galaxy catalogs or from the redshifted binary black hole mass spectrum, these events have been used in combination with the counterpart standard siren measurement of GW170817 to constrain the expansion history H(z) and several cosmological modified gravity theories (Finke et al. 2021; Abbott et al. 2021b; Palmese et al. 2021; Mancarella et al. 2021). With the relatively low-redshift sample, the best measured cosmological parameter remains the Hubble constant, and the constraints using all events represent a \(\sim 20\%\) improvement over the measurement from GW170817 and its counterpart (see Fig. 19, left panel).

Fig. 19

Images reproduced with permission from [left] Abbott et al. (2021b) and [right] Farr et al. (2019), copyright by AAS

Constraints on \(H_0\) and H(z) from GWs used as standard sirens. Left panel: \(H_0\) posterior obtained from the combination of the signals of 42 binary black hole mergers from GWTC-3 with the detection of GW170817. Right panel: forecasts of H(z) measurements obtained from simulations of five years (orange line) and one year (blue line) of detections by the Advanced LIGO and Virgo detectors.

3.4.4 Systematic effects

The limiting systematic uncertainty for standard siren measurements is the detector calibration, specifically the amplitude uncertainty. Each detector’s amplitude response uncertainty translates to a systematic distance uncertainty for the GW source, contributing at the few-percent level. For individual events, the statistical distance uncertainty of \({\mathcal {O}}(10\%)\) dominates over the calibration uncertainty. When stacking events to infer cosmological parameters, however, the calibration uncertainty, unlike the statistical distance uncertainty, may not average out. An important prerequisite for reaching a percent-level \(H_0\) measurement with standard sirens is to reduce the amplitude calibration uncertainty below 1% (Sun et al. 2021a).
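The distinction can be illustrated with a short toy experiment (illustrative numbers only): a calibration offset shared by all events shifts the inferred \(H_0\) coherently, so it does not shrink as \(1/\sqrt{N}\) the way the per-event statistical scatter does.

```python
import numpy as np

rng = np.random.default_rng(1)
H0_true, n_events = 70.0, 1000

# per-event statistical distance scatter (10%) averages down with N ...
stat = rng.normal(0.0, 0.10, n_events)
# ... but a common 2% amplitude-calibration bias, shared by all events, does not
cal_bias = 0.02

d_over_d_true = (1 + stat) * (1 + cal_bias)   # measured/true distance per event
H0_est = H0_true / d_over_d_true              # H0 scales inversely with distance

print(f"mean H0 over {n_events} events: {H0_est.mean():.2f}  "
      f"(stat error of mean: {H0_est.std() / np.sqrt(n_events):.2f})")
# the ~2% offset persists no matter how large n_events becomes
```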

As the standard siren catalog continues to grow, other uncertainties in the gravitational-wave distance measurements will become important. One of these uncertainties is the gravitational waveform model: extracting the distance of the source from the gravitational-wave measurement relies on a waveform model, which is not perfectly known, especially for systems with strong matter effects, extreme spins, or extreme mass ratios (Huang et al. 2021). For standard sirens at larger distances, the gravitational-wave signal may be (de)magnified due to weak gravitational lensing by matter along the line of sight. Most of these uncertainties may be incorporated into the statistical framework and contribute to a statistical rather than systematic uncertainty. For example, if the distribution of lensing magnifications is known, this contribution can be marginalized over in the GW distance likelihood (Holz and Hughes 2005; Hirata et al. 2010; Sathyaprakash et al. 2010). As discussed in Sect. 3.4.3, the astrophysical distributions of the masses, spins and distances of black hole and neutron star mergers must be simultaneously inferred with the cosmological parameters, especially when analyzing a population of standard sirens at cosmological distances. Even compared to the current large statistical uncertainties, fixing the binary black hole mass distribution in the galaxy catalog standard siren analysis results in a significant systematic uncertainty, whereas the joint inference transfers the systematic uncertainty to a statistical uncertainty that converges with many events (Abbott et al. 2021b).

There are also uncertainties in the redshift measurements that, if not properly understood, can contribute to a systematic uncertainty. The counterpart standard siren method, where the redshift information comes directly from a unique host galaxy identification, is the least susceptible to systematic effects. A possible systematic uncertainty in the redshift measurement can come from errors in the peculiar velocity correction, but the statistics of peculiar velocities are well-understood and, especially at typical standard siren distances, contribute a negligible fraction of the uncertainty budget. On the other hand, when galaxy catalogs are used for the redshift information, they introduce more potential sources of systematic uncertainty: factors such as catalog incompleteness, photometric redshift uncertainties, and the galaxies’ probabilities of hosting gravitational-wave sources must be understood. If the redshift information is supplied by features in the source distribution, it is important to check that the population model is not mis-specified. For example, fitting the binary black hole mass distribution with a power law, where the true distribution more closely resembles a mixture of a power law and a Gaussian, would lead to biased recovery of the mass distribution and the cosmological parameters. In general, the source distribution also needs to be calibrated against theoretical models. If the source mass distribution evolves with redshift (Fishbach et al. 2021), theoretical guidance may help disentangle the source mass evolution from the cosmological redshift, although analysis of the full distribution may help self-calibrate the sample (Ezquiaga and Holz 2022).

3.4.5 Main results and forecasts

The current best standard siren constraints are dominated by the Hubble constant measurement from GW170817 and its electromagnetic counterpart, which yielded \(H_0\) \(=70^{+13}_{-7}\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\). However, with \(\sim 90\) gravitational-wave events detected to date, the population of standard sirens without counterparts is beginning to contribute. Among the gravitational-wave events without counterparts, two events, GW170814 and GW190814, have been sufficiently well-localized that comparing their localization posteriors to a galaxy catalog yields only one probable galaxy structure containing the host galaxy, resulting in a unimodal, fairly informative Hubble constant measurement. The remaining dozens of events have also been used for standard siren analyses in conjunction with galaxy catalogs, but care is required in the interpretation of these results. Unless the source population, particularly the binary black hole mass distribution, is simultaneously inferred with the cosmological parameters, hidden assumptions about the source population can impact the cosmological inference and result in overly optimistic constraints. So far, the only analyses that simultaneously fit the source population and cosmological parameters do so without incorporating galaxy catalog information (Abbott et al. 2021b). With the latest gravitational-wave catalog GWTC-3, these methods yield a 17% improvement in the Hubble constant measurement over the measurement from GW170817 and its counterpart (see the left panel of Fig. 19; Abbott et al. 2021b).

The most robust standard sirens are gravitational-wave sources with electromagnetic counterparts, typically binary neutron stars, although some neutron star-black hole mergers may also produce electromagnetic emission. With the current ground-based gravitational-wave detectors, these sources will predominantly be sensitive to the Hubble constant, and with N sources with counterparts, we expect the Hubble constant measurement to converge as \(15\%/\sqrt{N}\) (Chen et al. 2018); for example, roughly 50 counterpart events would reach \(\sim 2\%\) precision, and roughly 225 would reach \(\sim 1\%\).

For the majority of gravitational-wave events that lack counterparts, galaxy catalogs can be used for the redshift information. Further work needs to be done to develop galaxy catalogs specifically for the standard siren application, manage catalog incompleteness, and jointly fit the source population together with the cosmological parameters to avoid systematic bias. Another promising method is to use features in the source mass distribution to fit cosmological parameters together with the source population in a gravitational-wave only analysis. Farr et al. (2019) showed that leveraging the pair-instability feature in the black hole mass distribution can provide percent-level constraints on H(z) at \(z = 0.8\) within 5 years of Advanced LIGO observations (see the right panel of Fig. 19). By combining binary black holes, which can be observed at higher redshifts, with nearby binary neutron stars with counterparts, the expansion history can therefore be measured out to \(z \sim 1.5\). For this method to provide robust cosmological constraints, further progress is required in theoretical models of the black hole mass distribution. In particular, the redshift evolution of the source mass distribution must be theoretically understood, or controlled through self-calibration (Ezquiaga and Holz 2022).

Standard sirens are unique cosmological probes in that they simultaneously probe the background cosmology and gravitational perturbations, namely the propagation of gravitational waves. Beyond constraining the Hubble constant, standard sirens are therefore especially promising for constraining dark energy theories both through their effects on the background cosmology and their effects on gravitational-wave propagation.

The era of gravitational-wave cosmology has just begun. The gravitational-wave catalog is growing at an incredible rate, and by the late 2020s, the gravitational-wave detector network of LIGO, Virgo and KAGRA is expected to detect hundreds to thousands of events annually. In the coming decades, space-based gravitational-wave detectors are expected to launch, including LISA, Taiji (Zhao et al. 2020b), and TianQin (Wang et al. 2020), and the next generation of ground-based gravitational-wave detectors may become a reality, including Cosmic Explorer (Reitze et al. 2019) and Einstein Telescope (Punturo et al. 2010). The growth of the gravitational-wave dataset is accompanied by new electromagnetic telescopes to hunt counterparts, galaxy surveys to expand redshift catalogs, theoretical developments to model the gravitational-wave source population, and computational techniques to carry out the standard siren inference. Standard siren cosmology is a rapidly growing field with a promising future.

3.5 Time-delay cosmography

Time-delay cosmography uses measurements of relative arrival times of multiply gravitationally lensed sources to measure an absolute scale of the Universe. The method was originally proposed by Refsdal (1964) over half a century ago, prior to the discovery of the first extra-galactic gravitational lens. The methodology provides a one-step measurement of the Hubble constant, completely independent of the local distance ladder or probes anchored with sound horizon physics, such as the cosmic microwave background (CMB). Figure 20 illustrates different galaxy-scale gravitational lenses with a multiply imaged quasar in different configurations (Suyu et al. 2017).

Fig. 20

Image reproduced with permission from Suyu et al. (2017), copyright by the authors

Four quadruply lensed quasar systems and one doubly lensed quasar system from the H0LiCOW sample. The lens name is indicated above each panel. The color images are composed using 2 (for B1608+656) or 3 (for the other lenses) HST imaging bands in the optical and near-infrared. North is up and east is left.

3.5.1 Basic idea and equations

The phenomenon of gravitational lensing can be described by the lens equation, which maps the source plane coordinate \(\varvec{\beta }\) to the image plane coordinate \(\varvec{\theta }\):

$$\begin{aligned} \varvec{\beta } = \varvec{\theta } - \varvec{\alpha }(\varvec{\theta }) , \end{aligned}$$
(50)

where \(\varvec{\alpha }\) is the angular shift on the sky between the original unlensed position of an object and its observed lensed position.

For a single deflector plane, the lens equation can be expressed in terms of the physical deflection angle \(\hat{\varvec{\alpha }}\) as:

$$\begin{aligned} \varvec{\beta } = \varvec{\theta } - \frac{D_{\mathrm{ds}}}{D_{\mathrm{s}}}\hat{\varvec{\alpha }}(\varvec{\theta }) , \end{aligned}$$
(51)

where \(D_{\mathrm{s}}\) and \(D_{\mathrm{ds}}\) are the angular diameter distance from the observer to the source and from the deflector to the source, respectively. In the single lens plane regime, we can introduce the lensing potential \(\psi \) such that the reduced deflection angle is the gradient of the potential:

$$\begin{aligned} \varvec{\alpha }(\varvec{\theta }) = \nabla \psi (\varvec{\theta }) , \end{aligned}$$
(52)

and the lensing convergence as:

$$\begin{aligned} \kappa (\varvec{\theta }) = \frac{1}{2}\nabla ^2 \psi (\varvec{\theta }) . \end{aligned}$$
(53)

Physically, the lensing convergence in this regime corresponds to the projected surface mass density \(\varSigma \) normalized to the critical lensing surface density \(\varSigma _{\mathrm{crit}}\):

$$\begin{aligned} \kappa (\varvec{\theta }) = \frac{\varSigma (\varvec{\theta })}{\varSigma _{\mathrm{crit}}} , \end{aligned}$$
(54)

with the critical lensing surface density:

$$\begin{aligned} \varSigma _{\mathrm{crit}} = \frac{c^2 D_{\mathrm{s}}}{4\pi G D_{\mathrm{d}}D_{\mathrm{ds}}} , \end{aligned}$$
(55)

where \(D_{\mathrm{d}}\) is the angular diameter distance to the deflector, c is the speed of light, and G is the gravitational constant.

The relative arrival time between two images \(\varvec{\theta }_{\mathrm{A}}\) and \(\varvec{\theta }_{\mathrm{B}}\), \(\varDelta t_{\mathrm{AB}}\), originated from the same source is given by:

$$\begin{aligned} \varDelta t_{\mathrm{AB}} = \frac{D_{\varDelta t}}{c} \left[ \tau (\varvec{\theta }_{\mathrm{A}}, \varvec{\beta }) - \tau (\varvec{\theta }_{\mathrm{B}}, \varvec{\beta }) \right] , \end{aligned}$$
(56)

where:

$$\begin{aligned} \tau (\varvec{\theta }, \varvec{\beta }) = \left[ \frac{\left( \varvec{\theta } - \varvec{\beta } \right) ^2}{2} - \psi (\varvec{\theta })\right] \end{aligned}$$
(57)

is the Fermat potential (Schneider 1985; Blandford and Narayan 1986), and:

$$\begin{aligned} D_{\varDelta t} \equiv \left( 1 + z_{\mathrm{d}}\right) \frac{D_{\mathrm{d}}D_\mathrm{s}}{D_{\mathrm{ds}}} \end{aligned}$$
(58)

is the time-delay distance (Refsdal 1964; Schneider et al. 1992; Suyu et al. 2010).

Constraints on the Fermat potential difference \(\varDelta \tau _\mathrm{AB}\) combined with a measured time delay \(\varDelta t_{\mathrm{AB}}\) allow one to constrain the time-delay distance \(D_{\varDelta t}\). This absolute physical distance anchors the scale of the Universe at the redshifts involved in the lensing configuration. The Hubble constant is inversely proportional to the absolute scales of the Universe, and thus scales with \(D_{\varDelta t}\) as:

$$\begin{aligned} H_0 \propto D_{\varDelta t}^{-1} , \end{aligned}$$
(59)

with a mild dependence on the relative expansion history from the current time (\(z=0\)) to the redshifts of the deflector and source.
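A minimal numerical sketch (flat \(\varLambda \)CDM with illustrative redshifts and an assumed Fermat potential difference) makes these scalings explicit: the time-delay distance, and hence the predicted delay for a fixed Fermat potential difference, scales as \(1/H_0\).

```python
import numpy as np

c = 299792.458  # km/s

def D_ang(z1, z2, H0, Om=0.3):
    """Angular diameter distance between z1 < z2 [Mpc], flat LambdaCDM."""
    zz = np.linspace(z1, z2, 2048)
    inv_E = 1.0 / np.sqrt(Om * (1 + zz) ** 3 + 1.0 - Om)
    comoving = (c / H0) * np.sum(0.5 * (inv_E[1:] + inv_E[:-1]) * np.diff(zz))
    return comoving / (1 + z2)

def D_dt(z_d, z_s, H0):
    """Time-delay distance, Eq. (58)."""
    return (1 + z_d) * D_ang(0, z_d, H0) * D_ang(0, z_s, H0) / D_ang(z_d, z_s, H0)

# illustrative lens: z_d = 0.5, z_s = 2.0, Fermat potential difference
z_d, z_s = 0.5, 2.0
dtau = 0.2 * (np.pi / 180.0 / 3600.0) ** 2   # 0.2 arcsec^2 in rad^2
Mpc_km = 3.0857e19                           # km per Mpc
for H0 in (67.0, 74.0):
    dt_days = D_dt(z_d, z_s, H0) * Mpc_km / c * dtau / 86400.0
    print(f"H0 = {H0:4.1f}: D_dt = {D_dt(z_d, z_s, H0):6.0f} Mpc, "
          f"delay = {dt_days:5.1f} days")   # delay scales as 1/H0
```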

While the time delay \(\varDelta t_{\mathrm{AB}}\) can be directly measured (see Sect. 3.5.3), the relative Fermat potential \(\varDelta \tau _{\mathrm{AB}}\) is not a direct observable. The primary information used to infer \(\varDelta \tau _{\mathrm{AB}}\) comes from positional constraints and the extended distortions produced by lensing. However, there are degeneracies inherent in gravitational lensing that limit the amount of information accessible from lensing distortions (e.g., Falco et al. 1985; Gorenstein et al. 1988; Kochanek 2002; Saha and Williams 2006; Schneider and Sluse 2013, 2014; Birrer et al. 2016; Unruh et al. 2017; Birrer 2021).

The most prominent lensing degeneracy impacting the time-delay prediction is the mass-sheet degeneracy (MSD, Falco et al. 1985). The MSD is a multiplicative transform of the lens equation (Eq. (50)) which preserves image positions (and any higher order relative differentials of the lens equation) under a linear source displacement \(\varvec{\beta } \rightarrow \lambda \varvec{\beta }\) combined with a transformation of the convergence field:

$$\begin{aligned} \kappa _{\lambda }(\varvec{\theta }) = \lambda \kappa (\varvec{\theta }) + \left( 1 - \lambda \right) . \end{aligned}$$
(60)

The term \((1 - \lambda )\) in Eq. (60) above describes an infinite sheet of convergence (or mass), and hence the name mass-sheet transform (MST). Only observables related to the unlensed apparent source size, to the unlensed apparent brightness, or to the lensing potential are able to break this degeneracy. Thus, the same relative lensing observables can result if the mass profile is scaled by the factor \(\lambda \) with the addition of a sheet of convergence (or mass) of \(\kappa (\varvec{\theta }) = (1-\lambda )\).

The Fermat potential (Eq. (57)) scales with \(\lambda \) as:

$$\begin{aligned} \varDelta \tau _{\mathrm{AB, \lambda }} = \lambda \varDelta \tau _{\mathrm{AB}} , \end{aligned}$$
(61)

and so does the time delay as:

$$\begin{aligned} \varDelta t_{\mathrm{AB, \lambda }} = \lambda \varDelta t_{\mathrm{AB}} . \end{aligned}$$
(62)
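These scalings can be verified numerically. The toy check below uses a one-dimensional singular isothermal sphere (purely an illustrative choice) to confirm that the mass-sheet transformed lens equation is solved by the same image positions, while the Fermat potential difference is rescaled by \(\lambda \).

```python
import numpy as np

theta_E, beta, lam = 1.0, 0.1, 0.9   # Einstein radius, source position, MST lambda

psi = lambda t: theta_E * np.abs(t)                # SIS lensing potential
tau = lambda t, b: 0.5 * (t - b) ** 2 - psi(t)     # Fermat potential, Eq. (57)

# SIS image positions for |beta| < theta_E
imgs = np.array([beta + theta_E, beta - theta_E])

# mass-sheet transform:  kappa -> lam*kappa + (1 - lam)
# <=>  psi -> lam*psi + (1 - lam)*theta^2/2,  with  beta -> lam*beta
psi_l = lambda t: lam * psi(t) + 0.5 * (1 - lam) * t ** 2
tau_l = lambda t, b: 0.5 * (t - b) ** 2 - psi_l(t)
alpha_l = lambda t: lam * theta_E * np.sign(t) + (1 - lam) * t

# the same image positions solve the transformed lens equation
print("images preserved:", np.allclose(imgs - alpha_l(imgs), lam * beta))

dtau = tau(imgs[0], beta) - tau(imgs[1], beta)
dtau_l = tau_l(imgs[0], lam * beta) - tau_l(imgs[1], lam * beta)
print(f"Fermat potential ratio = {dtau_l / dtau:.3f}  (expected lambda = {lam})")
```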

When transforming a lens model with a mass-sheet transformation, the inference of the time-delay distance (Eq. (58)) from a measured time delay and inferred Fermat potential transforms as:

$$\begin{aligned} D_{\varDelta t, \lambda } = \lambda ^{-1}D_{\varDelta t} . \end{aligned}$$
(63)

Thus, the Hubble constant, when inferred from the time-delay distance \(D_{\varDelta t}\), transforms from Eq. (59) as:

$$\begin{aligned} H_{0, \lambda } = \lambda H_0 . \end{aligned}$$
(64)

An MSD effect relative to a proposed deflector model might occur either within the mass distribution of the main deflector, referred to as internal MSD with \(\lambda _{\mathrm{int}}\), or be caused by inhomogeneities along the line-of-sight (LOS) of the strong lens system.

Mass over- or under-densities along the LOS of the strong lensing system cause, to first order, shear and convergence perturbations. Reduced shear distortions have a measurable imprint on the azimuthal structure of the strong lensing system (see e.g., Birrer 2021) while the convergence component of the LOS, denoted as \(\kappa _{\mathrm{ext}}\), is equivalent to an MST, and thus not directly measurable from imaging data. The lensing kernel impacting the linear distortions, both shear and \(\kappa _{\mathrm{ext}}\), is different from the standard weak lensing kernel (McCully et al. 2014, 2017; Birrer et al. 2017, 2020; Fleury et al. 2021b).

We define \(D^{\mathrm{lens}}\) as the angular diameter distance along the specific line-of-sight of the lens, corrected for LOS structure, and \(D^{\mathrm{bkg}}\) as the angular diameter distance of the homogeneous background metric without any perturbative contributions. \(D^{\mathrm{lens}}\) and \(D^{\mathrm{bkg}}\) are related through the convergence terms as (Birrer et al. 2020):

$$\begin{aligned} D^{\mathrm{lens}}_{\mathrm{d}}= & {} (1 - \kappa _{\mathrm{d}})D_{\mathrm{d}}^{\mathrm{bkg}} \end{aligned}$$
(65)
$$\begin{aligned} D^{\mathrm{lens}}_{\mathrm{s}}= & {} (1 - \kappa _{\mathrm{s}})D_{\mathrm{s}}^{\mathrm{bkg}} \end{aligned}$$
(66)
$$\begin{aligned} D^{\mathrm{lens}}_{\mathrm{ds}}= & {} (1 - \kappa _{\mathrm{ds}})D_{\mathrm{ds}}^{\mathrm{bkg}} , \end{aligned}$$
(67)

where \(\kappa _{\mathrm{d}}\) is the weak lensing effect from the observer to the deflector, \(\kappa _{\mathrm{s}}\) from the observer to the source, and \(\kappa _{\mathrm{ds}}\) from the deflector to the source (Birrer et al. 2020). The lensing kernel impacting the time delay can be described as the product of the three different angular diameter distances entering \(D_{\varDelta t}\) in Eq. (58) (Birrer et al. 2020; Fleury et al. 2021a),

$$\begin{aligned} 1 - \kappa _{\mathrm{ext}} = \frac{(1 - \kappa _{\mathrm{d}})(1 - \kappa _\mathrm{s})}{1 - \kappa _{\mathrm{ds}}} . \end{aligned}$$
(68)

MSD uncertainties or biases may also arise relative to assumptions made about the radial density profile of the main deflector galaxy (see, e.g., Kochanek 2002; Read et al. 2007; Schneider and Sluse 2013; Coles et al. 2014; Xu et al. 2016; Birrer et al. 2016; Unruh et al. 2017; Sonnenfeld 2018; Kochanek 2020; Blum et al. 2020; Birrer et al. 2020; Kochanek 2021). Any lensing-only constraint on the radial density profile ultimately relies on the functional form imposed on the model.

The total MST, i.e. the relevant transform to constrain for an accurate cosmography and \(H_0\) measurement, is the product of the internal and external MST (e.g., Schneider and Sluse 2013; Birrer et al. 2016, 2020):

$$\begin{aligned} \lambda = (1-\kappa _{\mathrm{ext}}) \times \lambda _{\mathrm{int}} . \end{aligned}$$
(69)

The external line-of-sight lensing contribution can be estimated by tracers of the large-scale structure, either using galaxy number counts (e.g., Greene et al. 2013; Rusu et al. 2017) or weak-lensing measurements (Tihhonova et al. 2018). These measurements, paired with a cosmological model including a galaxy-halo connection, are able to constrain the probability distribution of \(\kappa _{\mathrm{ext}}\) to a few per cent per sight line.

Among the observations that are sensitive to the total MST \(\lambda \), stellar kinematics is the most prominent and commonly used one. The dynamics of stars is a direct tracer of the three-dimensional gravitational potential and provides an independent mass estimate. Joint lensing and dynamics constraints have been used to provide measurements of galaxy mass profiles (e.g., Grogin and Narayan 1996; Romanowsky and Kochanek 1999; Treu and Koopmans 2002). The modeling of the kinematic observables in lensing galaxies ranges in complexity from spherical Jeans modeling (Binney and Tremaine 2008) to Schwarzschild (Schwarzschild 1979) methods.

Regardless of the approach, the predicted velocity dispersion \(\sigma _{\mathrm{v}}\) of any model can be decomposed into a cosmology-dependent and a cosmology-independent part as (see e.g., Birrer et al. 2016, 2019):

$$\begin{aligned} \sigma _{\mathrm{v}}^2 = \lambda \frac{D_{\mathrm{s}}}{D_{\mathrm{ds}}}c^2 J(\varvec{\xi }_{\mathrm{lens}}, \varvec{\beta }_{\mathrm{ani}}) , \end{aligned}$$
(70)

where c is the speed of light and J is a dimensionless quantity that depends on the deflector model (\(\varvec{\xi }_{\mathrm{lens}}\)), the stellar anisotropy distribution (\(\varvec{\beta }_{\mathrm{ani}}\)), and the observational conditions and luminosity-weighting within the aperture (e.g., Binney and Mamon 1982; Treu and Koopmans 2004; Suyu et al. 2010).
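Inverting Eq. (70) shows how a measured dispersion constrains the MST parameter once distances are fixed; the sketch below uses purely illustrative numbers (in particular, the value of J is hypothetical, standing in for the output of detailed Jeans or Schwarzschild modeling).

```python
import numpy as np

c = 299792.458  # km/s

def D_ang(z1, z2, H0=70.0, Om=0.3):
    """Angular diameter distance between z1 < z2 [Mpc], flat LambdaCDM."""
    zz = np.linspace(z1, z2, 2048)
    inv_E = 1.0 / np.sqrt(Om * (1 + zz) ** 3 + 1.0 - Om)
    return (c / H0) * np.sum(0.5 * (inv_E[1:] + inv_E[:-1]) * np.diff(zz)) / (1 + z2)

# hypothetical inputs: measured dispersion and model-predicted J
sigma_v = 280.0   # [km/s] measured velocity dispersion
J = 5.5e-7        # dimensionless; in practice from Jeans/Schwarzschild modeling
z_d, z_s = 0.5, 2.0

# invert Eq. (70):  lambda = sigma_v^2 / (c^2 J) * D_ds / D_s
lam = sigma_v ** 2 * D_ang(z_d, z_s) / (c ** 2 * J * D_ang(0.0, z_s))
print(f"inferred MST parameter lambda = {lam:.2f}")   # ~1 for these inputs
```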

The constraints obtained from joint lensing and dynamics can either determine the MST component of the deflector model, or provide additional cosmographic constraints on the relative expansion history through the involved angular diameter distance ratio (\(D_{\mathrm{s}}/D_{\mathrm{ds}}\), Eq. (70)). When adding a time delay, the joint cosmographic constraints from a combined analysis of time delays, lensing, and dynamics can be translated into a two-dimensional angular diameter distance plane (Birrer et al. 2016, 2019). When mapped into the \(D_{\varDelta t}\)-\(D_{\mathrm{d}}\)-plane, the projection in \(D_{\mathrm{d}}\) is invariant under any pure MSD parameter \(\lambda \) (Paraficz and Hjorth 2009; Jee et al. 2015; Birrer et al. 2019).

An alternative approach to constrain the MSD is with absolute lensing magnifications. The MSD transforms the lensing magnification \(\mu \) by:

$$\begin{aligned} \mu _{\lambda } = \lambda ^{-2}\mu . \end{aligned}$$
(71)

Thus, for an object with a known unlensed apparent brightness \(F_\mathrm{unl}\) and a measured flux \(F_{\mathrm{obs}}\), the magnification can be directly measured:

$$\begin{aligned} \mu _{\lambda } = \frac{F_{\mathrm{obs}}}{F_{\mathrm{unl}}} . \end{aligned}$$
(72)

Gravitationally lensed supernovae (glSNe) can provide, in addition to measurable time delays, lensing magnification constraints when knowledge about the unlensed apparent brightness of the explosion is imposed. This measurement does not require an absolute bolometric calibration of the exploding transient, but only a calibration relative to an unlensed field sample (e.g., Kolatt and Bartelmann 1998; Oguri and Kawano 2003; Foxley-Marrable et al. 2018; Birrer et al. 2021).

3.5.2 Sample selection

The primary requirement to provide an absolute distance measurement is a measured relative time delay between multiple images of a source. A time delay can only be measured if the source is bright and time-variable, or a transient. The sources originally proposed by Refsdal (1964) were lensed supernovae, before the discovery of the strong-lensing phenomenon on cosmological scales. The first extra-galactic lens discovered was a doubly lensed quasar (Walsh et al. 1979). Lensed quasars were quickly identified as excellent sources for time-delay cosmography, as they are variable on short time scales, making time-delay measurements possible, and they are sufficiently bright to be observed at cosmological distances. Lensed quasars are typically found at redshift \(z_{\mathrm{s}} \sim \)1–3, lensed by massive early-type galaxies located around redshift \(z_{\mathrm{d}} \sim \)0.2–0.8. This configuration typically produces multiple images separated by 1–3\(^{\prime \prime }\).

Strongly lensed quasars are rare objects on the sky. The discovery of the currently known lensed quasars followed different paths. Some lenses were serendipitously discovered by visual inspection of astronomical images, in particular in the early days (e.g., Sluse et al. 2003). More recently, with the advent of large ground- and space-based imaging surveys, more systematic searches could be conducted, involving astrometric and color selections on post-processed catalogs (Krone-Martins et al. 2018; Agnello et al. 2018; Lemon et al. 2019) and, most recently, machine-learning techniques applied directly to both catalogs and images. The discovery process proceeds in phases of increasing certainty about the lensing nature, with correspondingly increasing follow-up effort. The first step with wide-field surveys often results in hundreds of candidates, of which a subset of the highest-ranked candidates is followed up with spectroscopic observations, to confirm the identical redshift of the pair or quartet of quasar images, and with deep high-resolution imaging, to detect the deflector galaxy and the extended lensed features of the quasar host galaxy.

The most prominent lensing systems utilized are galaxy-scale lenses with quadruply imaged quasars. These systems offer several relative time delays, as well as additional constraints on the lens model from both the positional constraints of the quasar images and the often Einstein-ring-like lensed structure of the quasar host galaxy. Thus, a significant effort in the search and follow-up work has been spent to find quadruply lensed quasars. Quadruply lensed quasars are less frequent than doubly lensed quasars by a factor of \(\sim \)5 (Oguri and Marshall 2010). The more abundant population of doubly lensed quasars provides fewer constraints per individual lens, but offers potential for population-level analyses.

More recently, the first multiply imaged supernovae were discovered in a galaxy cluster environment (Kelly et al. 2015) and on a galaxy-scale lens (Goobar et al. 2017). This opens the path, as envisioned by Refsdal (1964), to use lensed supernovae as the time-variable source to measure \(H_0\) and with it the opportunity to utilize an entirely new source population.

3.5.3 Measurements

In order to measure the distances \(D_{\varDelta t}\), or more generally the \(D_{\varDelta t}\)-\(D_{\mathrm{d}}\) combination, from a time-delay lens system for cosmography, we need the following data products:

1. discovery of a lens with a time-variable source;

2. spectroscopic redshifts of the lens \(z_{\mathrm{d}}\) and source \(z_{\mathrm{s}}\);

3. time delays between the multiple images;

4. a lens mass model to determine the Fermat potential;

5. lens environment studies to constrain external lensing effects related to the mass-sheet degeneracy.

The datasets required for each step are observationally cheap in comparison to other cosmological probes. However, the combined analysis, even of a single lens, requires the coordination of multiple independent observations, and can be made impossible, or severely limited in its precision and reliability, by a single missing ingredient. For the discovery datasets, we refer to Sect. 3.5.2 and references therein.

3.5.3.1 Spectroscopic redshifts

The spectroscopic redshifts of the quasar sources \(z_{\mathrm{s}}\) are often easy to obtain, given the prominent emission lines of quasars. The redshift of the lens \(z_\mathrm{d}\) can be challenging, since the bright quasar images can outshine the lens galaxy. Obtaining \(z_{\mathrm{d}}\) for lensed quasar systems often requires spectra taken under good seeing conditions, to deblend the lensing galaxy from the quasar.

3.5.3.2 Time delays

Without a measurement of a time delay, no constraints on the absolute distances involved can be inferred, and thus, regardless of the approach chosen, no direct constraints on the Hubble constant can be achieved. Relative time delays are measured with monitoring campaigns that extract light curves of the individual images. Lensed quasars with images separated by 1–3\(^{\prime \prime }\) can be resolved with small ground-based telescopes, so their monitoring is challenging but possible with 1-m or 2-m class telescopes. To perform the measurement, several conditions need to be met: i) a photometric accuracy of a few milli-magnitudes is required to catch the low-amplitude variability signal, ii) a good sampling of the light curves is necessary if one targets the fast variations of small amplitude, and iii) the duration of the monitoring campaign needs to be sufficient to cover the duration of the time delays and to ensure that enough variations of the quasar are recorded. Furthermore, seasonal gaps are unavoidable in optical light curves, since most lensed quasars are not visible all year long. In addition, extrinsic variations, caused mainly by the micro-lensing of the quasar images but also by a variety of other astrophysical effects, are often observed in the light curves. These extrinsic variations and gaps can severely bias time-delay measurements if not appropriately modeled. Once well-sampled light curves have been acquired, the next step consists of identifying features that can be matched in all light curves, and measuring the time delays. We refer to Vuissoz et al. (2007, 2008), Courbin et al. (2011), Tewes et al. (2013), Eulaers et al. (2013), Rathna Kumar et al. (2013), Courbin et al. (2018a), and Millon et al. (2020a) for recent measurements and methodology taking into account various aspects of model and data uncertainties.
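A minimal curve-shifting estimator, a toy version of the chi-square-based techniques cited above that ignores microlensing and uses simple linear interpolation, illustrates the core of the measurement:

```python
import numpy as np

def estimate_delay(t_a, f_a, t_b, f_b, err_b, trial_delays):
    """Minimal curve-shifting time-delay estimate.

    Shifts light curve B by each trial delay, interpolates A onto the
    shifted epochs, and returns the delay minimizing chi^2 (after fitting
    a free magnitude offset). Real analyses also model microlensing.
    """
    chi2 = []
    for dt in trial_delays:
        f_a_interp = np.interp(t_b - dt, t_a, f_a)
        offset = np.average(f_b - f_a_interp, weights=1.0 / err_b ** 2)
        chi2.append(np.sum(((f_b - f_a_interp - offset) / err_b) ** 2))
    return trial_delays[int(np.argmin(chi2))]

# toy demo: common intrinsic variability, image B delayed by 14 days
rng = np.random.default_rng(3)
t = np.sort(rng.uniform(0, 300, 120))
signal = lambda x: 0.1 * np.sin(2 * np.pi * x / 55.0) + 0.05 * np.sin(2 * np.pi * x / 17.0)
f_a = signal(t) + rng.normal(0, 0.01, t.size)
f_b = signal(t - 14.0) + 0.3 + rng.normal(0, 0.01, t.size)
print(estimate_delay(t, f_a, t, f_b, np.full(t.size, 0.01),
                     np.arange(-30.0, 30.5, 0.5)))   # recovers ~14 days
```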

3.5.3.3 Lens mass model

The Fermat potential (Eq. (57)) is a crucial component that we need to know precisely to be able to use time-delay measurements to probe cosmic distances (Eq. (58)). High-resolution imaging of gravitational lenses is the key observation for achieving a precise determination of the relative Fermat potential between the multiple images of a time-variable source. Image modeling is primarily performed on high-resolution space-based Hubble Space Telescope (HST; Suyu et al. 2010; Birrer et al. 2016; Wong et al. 2017; Rusu et al. 2020) or ground-based adaptive-optics (AO; Chen et al. 2016a, 2019, 2021a) imaging. To derive constraints on the lensing deflector from imaging data, all components that affect the imaging data need to be modeled and accounted for simultaneously with the lens model. This includes, but is not limited to, the extended source component of the AGN or transient host that is lensed, the image positions of the time-variable source and its resulting point-like flux emission, the surface brightness of the deflector galaxy, differential dust extinction, and any other sources of surface brightness. In addition, instrument effects, such as the point spread function (PSF), noise (both shot noise and instrumental noise), pixelization, and potential data reduction artifacts need to be accurately taken into account. Different techniques have been developed to jointly marginalize over a complex and unknown source morphology. These consist of regularized pixelated source reconstructions (e.g., Suyu et al. 2006, 2009), sets of basis functions such as shapelets (e.g., Birrer et al. 2015; Birrer and Amara 2018), or parameterized surface brightness profiles, such as Sersic profiles. All these methods have in common that the surface-brightness amplitudes produce a linear response in the image pixels. The maximum likelihood of the data given a proposed model for the amplitude components is thus a linear problem, and the Gaussian covariance matrix of the linear coefficients can be used to analytically marginalize over the prior (e.g., Suyu et al. 2006; Birrer et al. 2015).
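The linear step can be sketched in a few lines. Assuming a response matrix whose columns give the pixel response of each surface-brightness component (a hypothetical construction here; in real codes it is built from ray-traced, PSF-convolved basis functions), the amplitudes follow from weighted least squares and can be marginalized over analytically:

```python
import numpy as np

def solve_and_marginalize(A, d, sigma):
    """Linear amplitude fit and Gaussian marginalization terms.

    A     : (n_pix, n_comp) response of each surface-brightness component
            (lensed source basis functions, lens light, point sources)
    d     : (n_pix,) observed image;  sigma : (n_pix,) pixel noise
    Returns the best-fit amplitudes and the log-likelihood analytically
    marginalized over the amplitudes (flat prior, up to a constant).
    """
    W = 1.0 / sigma ** 2
    M = A.T @ (A * W[:, None])          # normal matrix  A^T C^-1 A
    b = A.T @ (d * W)                   # A^T C^-1 d
    amps = np.linalg.solve(M, b)        # maximum-likelihood amplitudes
    chi2 = np.sum(W * (d - A @ amps) ** 2)
    # Gaussian integral over the amplitudes contributes -0.5*logdet(M)
    sign, logdet = np.linalg.slogdet(M)
    return amps, -0.5 * chi2 - 0.5 * logdet
```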

In the absence of knowledge of an absolute source size or brightness, imaging data constraints cannot break the MST (as discussed in Sect. 3.5.1) or its generalization, the Source-Position Transformation (SPT; Schneider and Sluse 2014). The quantity that is constrained by imaging data along the radial direction is (Kochanek 2002; Sonnenfeld 2018; Kochanek 2020; Birrer 2021):

$$\begin{aligned} \xi _{\mathrm{rad}} \equiv \frac{\theta _{\mathrm{E}} \alpha _{\mathrm{E}}^{\prime \prime }}{1-\alpha _{\mathrm{E}}^{\prime }} \propto \frac{\theta _{\mathrm{E}} \alpha ^{\prime \prime }_{\mathrm{E}} }{1 - \kappa _{\mathrm{E}}} , \end{aligned}$$
(73)

where \(\alpha ^{\prime }_{\mathrm{E}}\) and \(\alpha ^{\prime \prime }_{\mathrm{E}}\) are the first and second derivatives of the deflection angle at the Einstein radius \(\theta _{\mathrm{E}}\), and \(\kappa _{\mathrm{E}}\) is the convergence at \(\theta _{\mathrm{E}}\). We refer to Birrer (2021) for a discussion of azimuthal constraints.

The data currently used to break the MST are measurements of the lens velocity dispersion (see Eq. (70)). The measurement is performed with high-spectral-resolution spectrographs on large ground-based telescopes, targeting stellar absorption lines in the rest-frame of the lensing galaxy, with instruments such as Keck-DEIMOS, Keck-KCWI, or VLT-MUSE. The velocity dispersion measurement is then a joint fit of the spectra taking into account the observing conditions, including the atmospheric absorption, the stellar templates matching the lensing galaxy type in age distribution and metallicity, and the dispersion width of the stellar distribution on top of the line-spread function. For measurements of velocity dispersions used in current time-delay cosmography studies we refer to Koopmans et al. (2003), Suyu et al. (2010), Suyu et al. (2013), Courbin et al. (2011), Wong et al. (2017), Agnello et al. (2016), Sluse et al. (2019), Buckley-Geer et al. (2020).

3.5.3.4 Line-of-sight and lens environment

The contributions of large-scale density perturbations and of individual massive objects along the line-of-sight alter the lensing deflections. To first order, these effects can be captured as cosmic shear and convergence. The reduced cosmic shear term is a commonly used model component. The convergence component, however, is equivalent to an external mass-sheet \(\kappa _{\mathrm{ext}}\) (Eq. (68)), and cannot be measured from imaging data. Higher-order effects from nearby groups or individual massive objects need to be explicitly modeled; this has been done by, e.g., Fassnacht and Lubin (2002), Momcheva et al. (2006), Wilson et al. (2016), Sluse et al. (2017). For theoretical aspects of the approximations made and the regimes in which they hold, we refer to McCully et al. (2014), McCully et al. (2017), Birrer et al. (2017), Fleury et al. (2021b).

Typically, methods are employed that take advantage of knowledge of the galaxy-halo connection, using luminous tracers of the underlying dark matter distribution. The most commonly used approach adopts galaxy number counts in different weighting schemes (e.g., Suyu et al. 2010; Greene et al. 2013; Rusu et al. 2017). The comparison of these weights (summary statistics) with control fields and with numerical simulations with an imposed galaxy-halo connection allows the computation of the posterior density of \(\kappa _{\mathrm{ext}}\). Weak lensing mass mapping is an alternative and complementary approach (Tihhonova et al. 2018, 2020). The required data for galaxy number counts are deep multi-band photometry within several square arcminutes of the deflector, and spectroscopy of the nearby galaxies with group identification (e.g., Rusu et al. 2017; Buckley-Geer et al. 2020). For weak lensing, preferentially deep space-based images are used to reduce the shape noise and enhance the signal.
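Schematically, the number-counts method reduces to matching the overdensity statistic of the lens field against simulated sightlines and reading off their convergence distribution; the sketch below is a toy version with made-up numbers and a hypothetical linear weight-convergence relation.

```python
import numpy as np

def kappa_ext_posterior(w_lens, w_sim, kappa_sim, tol=0.05):
    """Toy number-counts inference of kappa_ext.

    w_lens    : overdensity weight of the lens field (number counts
                relative to control fields, in some weighting scheme)
    w_sim     : the same weight computed for simulated sightlines
    kappa_sim : external convergence of those simulated sightlines
    Returns kappa_ext samples from sightlines matching the lens field.
    """
    match = np.abs(w_sim - w_lens) < tol
    return kappa_sim[match]

# toy demo with a made-up correlated (weight, kappa) simulation catalog
rng = np.random.default_rng(7)
kappa_sim = rng.normal(0.0, 0.025, 100_000)
w_sim = 1.0 + 4.0 * kappa_sim + rng.normal(0.0, 0.1, kappa_sim.size)
samples = kappa_ext_posterior(1.1, w_sim, kappa_sim)
print(f"kappa_ext = {samples.mean():.3f} +/- {samples.std():.3f}")
```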

3.5.4 Systematic effects

The two main uncertainties that, if not properly taken into account, can currently lead to systematic errors are the mass profile assumptions for the main deflectors and the selection effects of the lens sample used in the analysis.

3.5.4.1 Mass profile assumptions

The dominant uncertainty in the current measurement of the Hubble constant with strong gravitational lensing time delays is attributed to uncertainties in the mass profiles of the main deflector galaxies. The currently employed model mitigating the MST effect is parameterized with a pure MST parameter \(\lambda \) (Birrer et al. 2020). This parameterization is purely mathematical in nature, and leaves the physical interpretation (e.g., Blum et al. 2020) ambiguous or, in certain regimes, even un-physical, with, e.g., mass profiles with negative density in the outskirts. Such a one-parameter extension of the previously considered, more simple and rigid mass profiles may also not encompass the necessary flexibility beyond the pure MST that can affect kinematics observations (e.g., Birrer et al. 2020; Yıldırım et al. 2021). To make progress, the full degeneracy of the MST needs to be folded into flexible, but physically motivated, mass profile parameters, an approach explored by Shajib et al. (2021), but not yet employed for time-delay cosmography. The kinematics observations add further potential systematics to the inference of \(H_0\) when employed to break the MST. The primary limitation of the kinematics is the mass-anisotropy degeneracy (Binney and Mamon 1982), as well as projection effects in the light and mass profiles, the de-projection assumptions employed, and rotation and ellipticity moments in the data. These assumptions have to be validated sufficiently to guarantee an unbiased interpretation of the mass density profiles, and hence of \(H_0\) from time-delay cosmography.

3.5.4.2 Selection effects

Strong lenses inherently trace a narrow and rare distribution of matter in the Universe. Quantifying the selection effects, including the differential selection effects among different samples of lenses, is going to be crucial to maintain accuracy in the years to come. Selection effects can impact the line-of-sight distribution, the main deflector mass density and ellipticity, the galaxy properties of the deflector as well as of the source, and projection effects. Many of these effects cannot be precisely quantified on a lens-by-lens basis.

There are two approaches to mitigate selection effects. First, one can try to understand the selection from first principles, and explicitly account for the theoretical selection function in the analysis procedure. This approach requires extensive simulations and a reproducible selection function, including the discovery channel and the follow-up decisions. Second, one can empirically measure the selection function from the set of observables at hand, with assumptions of self-similarity among galaxies and lines of sight with identical properties, such as stellar mass, morphology, redshift, and environment, and explore empirical scaling relations among them. With the anticipated large number of lenses in the near future, and the more uniform datasets of large and deep surveys, both approaches will become feasible, and we advocate analyses that take into account the specific discovery channel.

The two limiting systematics, the mass profile assumptions and the selection effects, result in uncertainties of a few per cent on the combined \(H_0\) measurement. Pinning down these systematics to sub-percent levels with new observations and methodology is a major current undertaking of the field.

3.5.5 Main results

The H0LiCOW collaboration (Suyu et al. 2017) inferred from the independent analysis of six lensed quasar systems (Suyu et al. 2010, 2013; Wong et al. 2017; Bonvin et al. 2017; Birrer et al. 2019; Chen et al. 2019; Rusu et al. 2020) a Hubble constant value of \(H_0\) \(=73.3^{+1.7}_{-1.8}\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\), describing the deflector mass density profiles by either a power law or stars (constant mass-to-light ratio) plus standard dark matter halos (Wong et al. 2020). This is a 2% precision on \(H_0\), in excellent agreement with the local distance ladder measurement by the SH0ES team (Riess et al. 2019, 2021) and in more than 3\(\sigma \) statistical tension with early-Universe probes (e.g., Planck Collaboration 2020; Aiola et al. 2020). The STRIDES collaboration presented an additional lens with the most precise single-lens measurement of \(H_0\) \(=74.2^{+2.7}_{-3.0}\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\), with the same mass profile assumptions as the H0LiCOW collaboration (Shajib et al. 2020). Millon et al. (2020b) found, combining six lenses from H0LiCOW, SHARP and STRIDES, that the previous results remain valid when assuming that all lenses follow either one or the other of the two previously assumed forms of the mass density profile. In sum, if the mass density profiles are well described by a power law or a constant mass-to-light ratio plus a Navarro–Frenk–White (NFW, Navarro et al. 1997) dark matter halo, and covariant assumptions and priors are negligible, the tension of the strong lensing measurements alone with early-Universe results is significant, corroborating other measurements, and new physics may be required.

The attention thus turned to relaxing the radial profile assumption (see Sect. 3.5.4) and to the covariant treatment of population priors that cannot be constrained on a lens-by-lens basis. Birrer et al. (2020) addressed the issue in the most direct way, by choosing a parameterization of the radial mass density profile that is maximally degenerate with \(H_0\), via the MST. With this more flexible parameterization, \(H_0\) is constrained only by the measured time delays and stellar kinematics, increasing the uncertainty on \(H_0\) from 2 to 8% for the TDCOSMO sample of 7 lenses and resulting in \(H_0\) \(=74.5^{+5.6}_{-6.1}\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\), without changing the mean inferred value significantly.

Fig. 21

Comparison between current cosmological constraints on \(H_0\)-\(\varOmega _{\mathrm{m}}\) from strongly lensed quasars from H0LiCOW/TDCOSMO. Purple contours: results from H0LiCOW by Wong et al. (2020), based on six lensed quasars with assertive mass profile assumptions, averaging power-law and composite NFW-plus-stars (with constant mass-to-light ratio) profiles. Green contours: results from TDCOSMO by Birrer et al. (2020), with a maximally conservative assumption on the mass density profile, constraining the MST solely by kinematics data. These constraints are based on seven lensed quasars (six in common with Wong et al. 2020 and one added from STRIDES by Shajib et al. 2020), as well as 33 SLACS lenses with imaging and kinematics data (Bolton et al. 2008; Shajib et al. 2021)

Birrer et al. (2020) introduced a hierarchical framework in which external datasets can be combined with the time-delay lenses to improve the precision. They achieved a 5% precision measurement of \(H_0\) by combining the TDCOSMO lenses with stellar kinematic measurements of a sample of lenses from the Sloan Lens ACS (SLACS) survey with no time-delay information (Bolton et al. 2008; Auger et al. 2009; Shajib et al. 2021), measuring \(H_0\) \(=67.4^{+4.1}_{-3.2}\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\). The mean of the TDCOSMO+SLACS measurement is offset with respect to the TDCOSMO-only value, in the direction of the CMB value, although the two are still statistically consistent given the uncertainties. The analysis by Birrer et al. (2020) cannot rule out the mass profile assumptions of the earlier H0LiCOW/SHARP/STRIDES measurements with statistical significance. Birrer et al. (2020) is also consistent, by construction, with the study by Shajib et al. (2021), since they share the same measurements for SLACS. Shajib et al. (2021) concluded that a mass profile combining an NFW profile for the dark matter component and stars is a sufficiently accurate description of the mass density profile of the SLACS lenses. However, small departures from those forms are allowed by the data, resulting in the uncertainties quoted by Birrer et al. (2020). The shift in the mean could be real, or it could be due to an intrinsic difference between the deflectors in the TDCOSMO and SLACS samples, arising from selection effects. For example, the two samples could be well matched in stellar velocity dispersion but differ in redshift, or the TDCOSMO sample could be source-selected and composed mostly of quadruply imaged quasars, while SLACS is deflector-selected and dominated by doubly imaged galaxies. Figure 21 shows the comparison between the constraints on \(H_0\)-\(\varOmega _{\mathrm{m}}\) obtained from the H0LiCOW and TDCOSMO analyses.

3.5.6 Outlook in the near future

Over the full sky, several tens of thousands of galaxy-galaxy lenses and several hundred quadruply lensed quasars are expected to exist (e.g., Oguri and Marshall 2010; Collett 2015). With the upcoming wide and deep ground- and space-based surveys, we expect many of those to be discovered within a decade by the Vera Rubin Observatory (LSST Science Collaboration et al. 2009), the Nancy Grace Roman Space Telescope (Spergel et al. 2015), and Euclid (Laureijs et al. 2011). This is an increase of orders of magnitude in the number of lenses potentially suitable for time-delay analyses compared to the few lenses used in current analyses (e.g., 7 lenses in the case of the current TDCOSMO results), and it will transform the measurements and approaches in the domain of time-delay cosmography. The first step in utilizing these lenses is to discover them in large datasets. The next step is to acquire all the necessary follow-up information: monitoring data for a time-delay measurement, high-resolution imaging, and spectroscopic information about the source and lens redshift as well as the velocity dispersion of the deflector. This step is going to be challenging with limited resources, and decisions must be made about which lenses receive extensive follow-up and which ones are left aside. Some lenses might require less substantial follow-up in cases where Rubin light curves are good enough for a time-delay measurement, or where high-resolution data with sufficiently high signal-to-noise ratio exist from wide-field space surveys, such as Euclid or Roman.

The key to assessing the need for follow-up, and deciding on which lenses to spend it, is the extent to which these datasets impact the precision on \(H_0\). Besides the limited resources, follow-up decisions are currently also constrained by the sky coverage accessible to adaptive optics (AO). With next-generation AO instrumentation on both hemispheres, we expect full-sky instrument coverage, allowing the community, at least from a technical viewpoint, to target every single gravitational lens on the sky.

Fig. 22

Forecast for \(H_0\) measurements in the near future with the upcoming ground- and space-based facilities. Left: spatially resolved kinematics measurements of a sample of 40 time-delay lenses enable a precision on \(H_0\) of 1.5% (figure adapted from Birrer and Treu 2021). Right: standardizable magnification measurements of \(\sim \)144 gravitationally lensed supernovae enable a precision on \(H_0\) of 1.5% (figure adapted from Birrer et al. 2021). Both approaches constrain the MST with independent observations

The dominant uncertainty in the current measurement of the Hubble constant with strong gravitational lensing time delays is attributed to uncertainties in the mass profiles of the main deflector galaxies. There are several independent avenues of data available in the near future to approach a 1% measurement of \(H_0\), on which we focus in this section.

Spatially resolved kinematics of the deflector galaxy with the next-generation space-based (JWST, Gardner et al. 2006) and ground-based (ELTs) instruments provide precise measurements of the kinematics and have the ability to break the mass-anisotropy degeneracy, a currently limiting systematic when using integrated kinematic measurements. Birrer and Treu (2021) forecast that with 40 time-delay lenses with exquisite spatially resolved kinematics, a 1.5% precision on \(H_0\) can be achieved without relying on mass-density profile assumptions to break the MST, as shown in the left panel of Fig. 22 (see also, e.g., Yıldırım et al. 2020, 2021). Resolved spectroscopy can also be employed on non-time-delay lenses without bright and contaminating quasar images, which can further improve the kinematic measurement precision and enlarge the dataset (Birrer et al. 2020; Birrer and Treu 2021).

Standardizable magnifications with gravitationally lensed supernovae (glSNe) provide another promising avenue to constrain the MST in the near future with the onset of Rubin. As reported in the right panel of Fig. 22, Birrer et al. (2021) provide a forecast for constraining \(H_0\) with glSNe independently of stellar kinematics. They conclude that the standardizable nature of glSNe enables a 1.5% \(H_0\) measurement with a 10-year Rubin survey. For the expected number of glSNe, the challenges of discovering and following them up, and the caveats of micro-lensing, we refer to Goldstein et al. (2018), Foxley-Marrable et al. (2018), Wojtak et al. (2019), Goldstein et al. (2019), Huber et al. (2021), Birrer et al. (2021).

In summary, in the next decade, with an increasing number of lenses and improved data quality, a \(\sim \)1% measurement of the Hubble constant becomes feasible, provided that major efforts are also invested in validation and in the control of possible covariant systematics.

3.6 Cosmography with cluster strong lensing

While Sect. 3.5 considers strong lensing effects produced by galaxy-scale lenses on intrinsically variable sources, this section focuses on much more massive structures in the Universe, galaxy clusters. In particular, we illustrate the principles of cluster strong lensing cosmography.

3.6.1 Basic idea and equations

For simplicity, we use the thin-screen approximation, i.e. we assume that the lens total mass distribution is confined to a plane, called the lens plane. In addition, we assume a single lens plane. The equations described in Sect. 3.5.1 remain valid in this context. The measurement of relative time-delays between the multiple images of intrinsically variable sources lensed by galaxy clusters can be used to constrain cosmological parameters such as \(H_0\).

Due to their large mass, galaxy clusters can have large cross sections for strong lensing. The size of these cross sections depends on several properties of the lenses, including their total mass, dynamical state, ellipticity and asymmetry (Torri et al. 2004; Hennawi et al. 2007; Meneghetti et al. 2007, 2010). It is not uncommon that massive clusters strongly lens several tens of background sources simultaneously (e.g., Postman et al. 2012; Lotz et al. 2014; Coe et al. 2019; Steinhardt et al. 2020; Caminha et al. 2017b, 2019; Lagattuta et al. 2019; Bergamini et al. 2021b). In this case, additional constraints on the cosmological parameters can be set, even with sources that are not intrinsically variable and for which relative time-delays cannot be measured.

Equations 50, 51, and 52 show that the difference between the observed and intrinsic positions of a source whose light is deflected by a gravitational lens is the product of two factors. The first factor is the deflection angle \(\hat{\varvec{\alpha }}(\varvec{\theta })\), which is proportional to the two-dimensional gradient of the integral of the lens Newtonian gravitational potential along the line-of-sight:

$$\begin{aligned} \hat{\varvec{\alpha }}(\varvec{\theta }) = \frac{2}{c^2}\varvec{\nabla }\int \varPhi (\varvec{\theta },l) \mathrm{d}l . \end{aligned}$$
(74)

Thus, the deflection angle depends on the lens total mass distribution.

The second factor is the ratio between the angular diameter distances \(D_{\mathrm{ds}}\) and \(D_{\mathrm{s}}\). In a flat cosmological model, the angular diameter distance to redshift z is given by:

$$\begin{aligned} D_{\mathrm{A}}(z) = \frac{c}{H_0}\frac{1}{1+z}\int _0^z\frac{\mathrm{d}z'}{[\varOmega _{\mathrm{m}}(1+z')^3+(1-\varOmega _{\mathrm{m}})(1+z')^{3(1+w)}]^{1/2}} , \end{aligned}$$
(75)

where w is the EoS parameter for the dark energy. Thus, the angular diameter distance depends on the values of cosmological parameters, such as \(H_0\), \(\varOmega _{\mathrm{m}}\), and w. The ratio of two angular diameter distances does not depend on \(H_0\).
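For concreteness, Eq. 75 is straightforward to evaluate numerically. The following minimal Python sketch (the function name and default parameter values are our own illustrative choices) computes \(D_{\mathrm{A}}(z)\) for a flat wCDM model:

```python
import numpy as np
from scipy.integrate import quad

C_KMS = 299792.458  # speed of light [km/s]

def angular_diameter_distance(z, H0=70.0, Om=0.3, w=-1.0):
    """Angular diameter distance [Mpc] in a flat wCDM cosmology (Eq. 75)."""
    E = lambda zp: np.sqrt(Om * (1 + zp)**3 + (1 - Om) * (1 + zp)**(3 * (1 + w)))
    integral, _ = quad(lambda zp: 1.0 / E(zp), 0.0, z)
    return (C_KMS / H0) / (1.0 + z) * integral

# D_A to z = 1 in the fiducial cosmology used throughout this section
print(angular_diameter_distance(1.0))  # ~1650 Mpc
```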

For simplicity, we consider a circularly symmetric lens and choose to measure the angular positions \(\varvec{\theta }\) and \(\varvec{\beta }\) with respect to the lens center. The deflection angle for any circularly symmetric mass distribution is:

$$\begin{aligned} \hat{\varvec{\alpha }}(\varvec{\theta }) = \frac{4GM(|\varvec{\theta }|)}{c^2D_\mathrm{d}|\varvec{\theta }|^2}\varvec{\theta }. \end{aligned}$$
(76)
Fig. 23

Sensitivity of the family ratio to the values of the cosmological parameters \(\varOmega _{\mathrm{m}}\) and \(w_0\). The left panel shows the family ratio for a lens at redshift \(z_{\mathrm{d}}=0.5\) in a flat \(\Lambda \)CDM cosmological model with \(\varOmega _{\mathrm{m}}\)=0.3, \(w(z)=w_0=-1\), and \(H_0\)=70 \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\). We assume \(z_\mathrm{s,1}=1\). The solid black curve describes how the family ratio varies as a function of the second source redshift, \(z_{\mathrm{s,2}}\). The shaded blue region indicates the 95% prediction interval estimated by sampling the parameter plane \(\varOmega _{\mathrm{m}}\)-\(w_0\) assuming uniform priors (\(\varOmega _{\mathrm{m}}\) \(\in [0.1,1]\) and \(w_0\in [-2,-0.5]\)). The right panel shows the results of the Sobol sensitivity analysis. We show the first-order Sobol index for both parameters as a function of \(z_{\mathrm{s,2}}\)

Inserting Eq. 76 into Eq. 51, we find that the image of a source perfectly aligned with the lens and the observer (\(\varvec{\beta }= 0\)) is a ring, whose angular size is:

$$\begin{aligned} \theta _{\mathrm{E}}(z_{\mathrm{d}},z_{\mathrm{s}}) = \sqrt{\frac{4GM[\theta _\mathrm{E}(z_{\mathrm{d}},z_{\mathrm{s}})]}{c^2}\frac{D_{\mathrm{ds}}}{D_{\mathrm{d}}D_{\mathrm{s}}}} . \end{aligned}$$
(77)

This radius is called the Einstein radius. The mass \(M(\theta _\mathrm{E})\) is the projected mass enclosed by the ring.
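As an illustration of Eq. 77, the sketch below evaluates the Einstein radius for a given projected mass, under the simplifying assumption that \(M(\theta _\mathrm{E})\) is known in advance (in reality Eq. 77 is implicit, since the enclosed mass depends on \(\theta _\mathrm{E}\) itself); all helper names are illustrative:

```python
import numpy as np
from scipy.integrate import quad

C_KMS = 299792.458   # speed of light [km/s]
G = 4.3009e-9        # gravitational constant [Mpc (km/s)^2 / Msun]

def comoving_distance(z, H0=70.0, Om=0.3):
    """Comoving distance [Mpc] in a flat LCDM cosmology."""
    E = lambda zp: np.sqrt(Om * (1 + zp)**3 + 1.0 - Om)
    return (C_KMS / H0) * quad(lambda zp: 1.0 / E(zp), 0.0, z)[0]

def einstein_radius_arcsec(M, zd, zs):
    """Einstein radius [arcsec] of Eq. 77 for a projected mass M [Msun]."""
    dc_d, dc_s = comoving_distance(zd), comoving_distance(zs)
    Dd, Ds = dc_d / (1 + zd), dc_s / (1 + zs)
    Dds = (dc_s - dc_d) / (1 + zs)    # valid in a flat universe
    theta_e = np.sqrt(4 * G * M / C_KMS**2 * Dds / (Dd * Ds))  # [rad]
    return np.degrees(theta_e) * 3600.0

# A projected mass of 1e14 Msun at zd = 0.5 lensing a source at zs = 2
print(einstein_radius_arcsec(1e14, 0.5, 2.0))  # ~20 arcsec, cluster scale
```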

In the case of two sources at redshifts \(z_{\mathrm{s,1}}\) and \(z_\mathrm{s,2}\) aligned with the lens and the observer, the ratio of the corresponding Einstein radii is given by:

$$\begin{aligned} \frac{\theta _{\mathrm{E}}(z_{\mathrm{d}},z_{\mathrm{s,1}})}{\theta _{\mathrm{E}}(z_\mathrm{d},z_{\mathrm{s,2}})} = \sqrt{\frac{M[\theta _{\mathrm{E}}(z_{\mathrm{d}},z_\mathrm{s,1})]}{M[\theta _{\mathrm{E}}(z_{\mathrm{d}},z_{\mathrm{s,2}})]} \frac{D_\mathrm{ds}(z_{\mathrm{d}},z_{\mathrm{s,1}})}{D_{\mathrm{s}}(z_{\mathrm{s,1}})}\frac{D_\mathrm{s}(z_{\mathrm{s,2}})}{D_{\mathrm{ds}}(z_{\mathrm{d}},z_{\mathrm{s,2}})}} . \end{aligned}$$
(78)

The function:

$$\begin{aligned} \varXi (z_{\mathrm{d}},z_{\mathrm{s,1}},z_{\mathrm{s,2}}) = \frac{D_\mathrm{ds}(z_{\mathrm{d}},z_{\mathrm{s,1}})}{D_{\mathrm{s}}(z_{\mathrm{s,1}})}\frac{D_\mathrm{s}(z_{\mathrm{s,2}})}{D_{\mathrm{ds}}(z_{\mathrm{d}},z_{\mathrm{s,2}})} \end{aligned}$$
(79)

is called the family ratio, and depends on the values of cosmological parameters, such as \(\varOmega _{\mathrm{m}}\) and w. This result also holds in the case of sources not perfectly aligned with the lens and the observer, or for lenses whose total mass distribution is not circular. The general principle is that the relative positions of multiple image families depend on both the lens mass distribution and the family ratios.
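Because \(H_0\) cancels in each distance ratio, the family ratio of Eq. 79 depends only on \(\varOmega _{\mathrm{m}}\) and w in a flat wCDM model. A minimal sketch (function names are illustrative):

```python
import numpy as np
from scipy.integrate import quad

def dimensionless_comoving_distance(z, Om=0.3, w=-1.0):
    """Comoving distance in units of c/H0, flat wCDM (H0 cancels in ratios)."""
    E = lambda zp: np.sqrt(Om * (1 + zp)**3 + (1 - Om) * (1 + zp)**(3 * (1 + w)))
    return quad(lambda zp: 1.0 / E(zp), 0.0, z)[0]

def family_ratio(zd, zs1, zs2, **cosmo):
    """Family ratio of Eq. 79; in flat space D_ds/D_s = 1 - Dc(zd)/Dc(zs)."""
    dc_d = dimensionless_comoving_distance(zd, **cosmo)
    dc_1 = dimensionless_comoving_distance(zs1, **cosmo)
    dc_2 = dimensionless_comoving_distance(zs2, **cosmo)
    return (1.0 - dc_d / dc_1) / (1.0 - dc_d / dc_2)

# The configuration of Fig. 23: zd = 0.5, zs1 = 1, second source at zs2 = 4
print(family_ratio(0.5, 1.0, 4.0))
```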

Fig. 24

Similar to Fig. 23, but showing the sensitivity of the time-delay distance, \(D_{\varDelta t}(z_{\mathrm{s}})\), to the values of the cosmological parameters \(H_0\), \(\varOmega _{\mathrm{m}}\), and \(w_0\)

In the left panel of Fig. 23, we show the family ratio for a lens at redshift \(z_{\mathrm{d}}=0.5\) in a flat \(\Lambda \)CDM cosmological model with \(\varOmega _{\mathrm{m}}\)= 0.3, \(w(z)=w_0=-1\), and \(H_0\)= 70 \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\). We assume \(z_{\mathrm{s,1}}=1\). The solid black curve describes how the family ratio varies as a function of the second source redshift \(z_{\mathrm{s,2}}\). The shaded blue region indicates the 95% prediction interval estimated by sampling the parameter plane \(\varOmega _{\mathrm{m}}\)-\(w_0\) using Saltelli's scheme (Saltelli 2002). We assume uniform priors on the cosmological parameters, with \(\varOmega _{\mathrm{m}}\) \(\in [0.1,1]\) and \(w_0\in [-2,-0.5]\). Performing a Sobol sensitivity analysis (Sobol' 2001; Saltelli et al. 2010), we find that \(\sim \)60–70% of the variance of the family ratio is due to the variance of \(\varOmega _{\mathrm{m}}\), as indicated by the first-order Sobol index S1 plotted in the right panel. The contribution of the \(w_0\) variance amounts to \(\sim \)10–25%, while the remainder of the variance is due to second-order interactions between \(\varOmega _{\mathrm{m}}\) and \(w_0\). Thus, the family ratio is primarily sensitive to \(\varOmega _{\mathrm{m}}\), but it is also sensitive to the dark energy equation of state.
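A sensitivity analysis of this kind can be reproduced with standard tools; the sketch below uses the SALib package (a common, but here assumed, choice) together with the hypothetical family_ratio helper from the previous sketch, with illustrative sample size and \(z_{\mathrm{s,2}}\) value:

```python
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

# Priors matching the text: Om in [0.1, 1], w0 in [-2, -0.5]
problem = {
    'num_vars': 2,
    'names': ['Om', 'w0'],
    'bounds': [[0.1, 1.0], [-2.0, -0.5]],
}

zs2 = 4.0                            # one point of the zs2 grid of Fig. 23
X = saltelli.sample(problem, 1024)   # Saltelli's sampling scheme
Y = np.array([family_ratio(0.5, 1.0, zs2, Om=Om, w=w0) for Om, w0 in X])

Si = sobol.analyze(problem, Y)
print(Si['S1'])  # first-order indices: variance shares of Om and w0
print(Si['S2'])  # second-order term: Om-w0 interaction
```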

As shown in Fig. 24, performing a similar analysis for the time-delay distance (assuming again flat priors on the cosmological parameters, with \(H_0\) \(\in [50,100]\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\), \(\varOmega _{\mathrm{m}}\) \(\in [0.1,1]\) and \(w_0\in [-2,-0.5]\)), we find that the variance of this quantity is mostly contributed by the variance of \(H_0\) (\(\sim \)90–95%), while the sensitivity to other cosmological parameters is much weaker. Thus, these results suggest that the family ratio and the time-delay distance are highly complementary cosmological probes.

The degeneracy between the parameters \(w_0\) and \(\varOmega _{\mathrm{m}}\) estimated from the family ratios of several multiply imaged sources is illustrated in Fig. 25. We fit 45 family ratios obtained by combining 10 multiply imaged sources uniformly distributed in redshift between \(z=1\) and \(z=6\). The confidence contours (at 1, 2, and 3\(\sigma \)) do not account for the uncertainties related to lens modeling (see the discussion in Sect. 3.6.5). As we see, the degeneracy is strong: we obtain similar family ratios in cosmologies with a high value of \(w_0\) and a low value of \(\varOmega _{\mathrm{m}}\), and vice versa. Breaking the degeneracy requires increasing the number of constraints, either by accumulating a larger number of multiple image families by means of deeper observations of single clusters or by stacking multiple lenses (Gilmore and Natarajan 2009).

Fig. 25

Degeneracy between the parameters \(w_0\) and \(\varOmega _{\mathrm{m}}\), derived by fitting 45 family ratios (obtained by combining 10 multiply imaged sources uniformly distributed between \(z=1\) and \(z=6\)). The dashed lines indicate the true values of the cosmological parameters

3.6.2 Sample selection

Currently, only five lens galaxy clusters with multiple images of time-varying sources (3 QSOs and 2 SNe) and measured time-delays are known. Systematic searches for gravitationally lensed quasars over a range of angular separations have ramped up in the last 20 years with the availability of the Sloan survey. The SDSS Quasar Lens Search (SQLS, Oguri et al. 2006) used a combination of morphological and color selection criteria applied to a SDSS sample of spectroscopically confirmed QSOs to find over 200 candidate strongly lensed QSOs. A few of these were found with angular separations exceeding 10\(^{\prime \prime }\), characteristic of group- and cluster-scale lenses; remarkable examples are SDSS J1004\(+\)4112 (Inada et al. 2003), a five-image lensed QSO with a separation of \(\sim \! 15^{\prime \prime }\), and SDSS J1029\(+\)2623 (Inada et al. 2006), a three-image system with the largest known separation to date (\(\sim \! 23^{\prime \prime }\)). The Sloan Giant Arc Survey (Hennawi et al. 2008), a study largely based on visual inspection of strong lensing features around massive clusters, has discovered SDSS J2222\(+\)2745 (Dahle et al. 2013), with six detected images of a QSO at \(z=2.2\) and a separation of \(\sim \! 15^{\prime \prime }\). More advanced methods to search for multiply lensed quasars based on machine-learning techniques, which work directly on image cutouts using neural network pattern recognition, have been developed over recent years (e.g. Agnello et al. 2015; Petrillo et al. 2017; Metcalf et al. 2019; Cañameras et al. 2020), and are being applied to new wide-area optical surveys such as the Dark Energy Survey (Huang et al. 2020). Machine-learning methods have also recently been applied to search for strongly lensed quasars selected from Gaia catalogs, in combination with near-IR surveys to identify likely lenses (e.g. Stern et al. 2021). Although these techniques were initially developed to discover lensed QSO systems with small separations (a few arcsec), they can be easily extended to cluster-scale lenses.

The Vera C. Rubin Observatory project (LSST Science Collaboration et al. 2009) will discover hundreds of new multiply imaged QSOs and SNe (a few tens of which will be lensed by galaxy clusters) and will measure their time-delays (Oguri and Marshall 2010). The latter can require time-consuming monitoring campaigns, particularly in the case of cluster-scale lenses.

By assuming a singular isothermal sphere (SIS) profile for the total mass distribution of the lens (cluster), the time-delay between the two multiple images of the same background source can vary between 0 and a maximum value given by:

$$\begin{aligned} \varDelta t_{\mathrm{SIS,max}}&= \frac{1+z_{\mathrm{d}}}{c^5}\, \frac{D_\mathrm{d}\,D_{\mathrm{ds}}}{D_{\mathrm{s}}}\, 32\pi ^2\sigma _{\mathrm{SIS}}^4\nonumber \\&= 127.5\,(1+z_{\mathrm{d}})\left( \frac{D_{\mathrm{d}}\,D_{\mathrm{ds}}}{D_\mathrm{s}\,1\,{\mathrm{Gpc}}} \right) \left ( \frac{\sigma _{\mathrm{SIS}}}{1000\,{\mathrm{km}}\,{\mathrm{s}}^{-1}}\right )^4\,{\mathrm{yr}} , \end{aligned}$$
(80)

where \(\sigma _{\mathrm{SIS}}\) is the value of the effective velocity dispersion associated with the isothermal total mass profile. Alternatively, one can write:

$$\begin{aligned} \varDelta t_{\mathrm{SIS,max}} = \frac{D_{\varDelta t}}{c}\, 2\theta _{\mathrm{E}}^2= \frac{D_{\varDelta t}}{L_H}\, \frac{2}{H_0}\theta _{\mathrm{E}}^2 = \left( \frac{D_{\varDelta t}}{L_H}\right) \, 0.66\, \theta _{\mathrm{E}}^2(\mathrm{arcsec})\; \mathrm{yr}, \end{aligned}$$
(81)

where \(D_{\varDelta t}\) is the time delay distance (Eq. 58), \(L_H=c H_0^{-1}\) is the Hubble length, so that \(D_{\varDelta t}/L_H\lesssim 1\), and \(\theta _{\mathrm{E}}\) the Einstein radius, ranging from a few arcsec to \(\sim \! 15^{\prime \prime }\) (\(H_0\) \(=70\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\) is adopted).
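As a quick numerical check of Eq. 81 (the function name and the adopted value of \(D_{\varDelta t}/L_H\) are illustrative assumptions):

```python
def dt_sis_max_years(theta_e_arcsec, dt_dist_over_hubble_length=0.5):
    """Maximum SIS time delay [yr] from Eq. 81, for H0 = 70 km/s/Mpc."""
    return dt_dist_over_hubble_length * 0.66 * theta_e_arcsec**2

# Galaxy-scale lens (theta_E ~ 1") versus cluster-scale lens (theta_E ~ 15")
print(dt_sis_max_years(1.0))   # ~0.3 yr: months
print(dt_sis_max_years(15.0))  # ~74 yr: decades
```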

Cluster-scale multiply lensed quasars (e.g. Dahle et al. 2015; Fohlmeister et al. 2013), as well as the multiple images of SN Refsdal (the first multiply-imaged and spatially resolved supernova, Kelly et al. 2015) and SN Requiem (Rodney et al. 2021), do indeed show model-predicted and measured time-delays spanning from a few days up to years and even tens of years.

3.6.3 Measurements

To constrain cosmology using the strong lensing cosmography approach, one has to simultaneously fit as many strong lensing constraints as possible, using a model that incorporates both the cluster mass distribution and the cosmological parameters. This process is called lens inversion. There are two general classes of inversion algorithms. The first approach is called free-form, wherein the cluster is subdivided into a mesh onto which the lensing observables are mapped, and which is then transformed into a pixelized mass distribution. The second class comprises parametric models (e.g. Kneib et al. 1996; Jullo et al. 2007; Jullo and Kneib 2009), wherein the mass distribution is reconstructed by combining clumps of matter on different scales. One or more large-scale mass components are used to describe the diffuse cluster dark matter halo; they are often positioned where the brightest cluster galaxies are located. The other cluster galaxies are used to trace the cluster substructure. Both the large- and small-scale mass components have density profiles given by analytic functions.

Using either of these approaches, the cluster mass distribution is described by a set of parameters (which can be a set of pixel values or parameters describing the shape and density profiles of each mass clump). Let \({\mathbf {p}}\) be the totality of the parameters used to model the cluster mass distribution, and \(\mathbf {p_{\mathrm{cosmo}}}\) the cosmological parameters we want to estimate.

The strong lensing constraints are generally in the form of positions of multiple images of several sources. These images are identified based on the morphology and color similarities of the lensed features. Gravitational lensing conserves the source surface brightness, implying that several source properties (e.g., star forming regions, spiral arms, bulges, etc.) can be recognized in all their multiple images. The geometry of the lens, inferred from the spatial distribution of the cluster galaxies, is useful for finding counter-images of a given source. Typically, the cluster galaxies in the central regions of galaxy clusters are early-type galaxies, most of which can be recognized because they populate a red sequence in the color-magnitude diagram. More sophisticated methods to identify these galaxies and separate them from foreground and background sources also include deep-learning models trained using multi-band images (Angora et al. 2020). Thus, finding multiple images and cluster members requires high-resolution multi-band imaging observations that only the Hubble Space Telescope (HST) can currently deliver.

Candidate multiple images can be confirmed by verifying that they have similar spectra. Spectroscopy is also crucial to measure the redshifts of lenses and sources. Without redshifts it is impossible to convert angular scales into physically meaningful units.

The multiple images of the same source form a family. Each family provides some constraints on the lens deflection field. Indeed, given a source at the intrinsic angular position \(\varvec{\beta }\), the positions of its images, \(\varvec{\theta }_i\), satisfy the lens equation (Eq. 51). In the case of intrinsically variable sources, such as Supernovae or QSOs, we can derive additional constraints by measuring the relative time-delays between the multiple images. In the following equations, we assume that both positional and time-delay measurements are available for the lensing analysis. Nevertheless, we remark once more that the strong lensing cosmography approach can be used to estimate the values of cosmological parameters, such as those of \(\varOmega _{\mathrm{m}}\), \(\varOmega _{\mathrm{de}}\), and w, even without measuring time-delays.

The cluster potential and the cosmological model are constrained simultaneously by maximizing the posterior probability distribution:

$$\begin{aligned} P({\mathbf {p}}\frown {\mathbf {p}}_{\mathrm{cosmo}}|\varvec{\theta ^{\mathrm{obs}}}\frown \mathbf {\varDelta t^{obs}}) \propto P(\varvec{\theta ^{\mathrm{obs}}}\frown \mathbf {\varDelta t^{obs}}|{\mathbf {p}}\frown {\mathbf {p}}_{\mathrm{cosmo}})\cdot P({\mathbf {p}} \frown {\mathbf {p}}_{\mathrm{cosmo}}) , \end{aligned}$$
(82)

where \(\varvec{\theta ^{\mathrm{obs}}}\) and \(\mathbf {\varDelta t^{obs}}\) are the observed positions and relative time-delays of the multiple images, respectively. The symbol \(\frown \) denotes the concatenation operator. The model likelihood is given by:

$$\begin{aligned} {\mathcal {L}}({\mathbf {p}}\frown {\mathbf {p}}_\mathrm{cosmo}|\varvec{\theta ^{\mathrm{obs}}}\frown \mathbf {\varDelta t^{obs}}) \propto \exp {\left[ -\frac{1}{2}\chi ^2({\mathbf {p}}\frown {\mathbf {p}}_\mathrm{cosmo})\right] } . \end{aligned}$$
(83)

Since the datasets (positions and time-delays) are independent, the likelihood is separable. Thus the \(\chi ^2({\mathbf {p}}\frown {\mathbf {p}}_{\mathrm{cosmo}})\) function is the sum of two terms. The first quantifies the separation between the observed and the model-predicted multiple image positions:

$$\begin{aligned} \chi ^{2}_\mathrm{pos}({\mathbf {p}}\frown {\mathbf {p}}_{\mathrm{cosmo}}) = \sum _{i=1}^{N_{fam}} \sum _{j=1}^{n_{i}}\left( \frac{\left\| \varvec{\theta ^\mathrm{obs}}_{i, j}-\varvec{\theta ^{\mathrm{pred}}}_{i, j}({\mathbf {p}}\frown {\mathbf {p}}_\mathrm{cosmo})\right\| }{\sigma _{\varvec{\theta }_{i, j}}}\right) ^{2} , \end{aligned}$$
(84)

where \(\varvec{\theta ^{\mathrm{obs}}}_{i, j}\) and \(\varvec{\theta ^{\mathrm{pred}}}_{i, j}\) are the observed and model-predicted positions of the j-th multiple image belonging to the i-th family, \(N_{fam}\) is the total number of multiple image families, and \(n_{i}\) is the number of multiple images belonging to the i-th family. The uncertainty on the image positions, \(\sigma _{\varvec{\theta }_{i, j}}\), is generally unknown. It depends not only on the effective resolution of the observations (i.e. the pixel scale and the size of the Point-Spread-Function), but also on several properties of the lens not directly accounted for in the lens model (such as unseen substructures in the cluster or along the line-of-sight or asymmetries of the dark matter distribution). Generally, this uncertainty is scaled to obtain a value of reduced \(\chi ^2\) of \(\sim 1\) (Bergamini et al. 2021b).

The second term quantifies the difference between the observed and model-predicted relative time-delays:

$$\begin{aligned} \chi ^2_{\mathrm{td}}({\mathbf {p}}\frown {\mathbf {p}}_{\mathrm{cosmo}}) = \sum _i^{N_{fam,\mathrm{td}}}\sum _j^{n_{i,\mathrm{td}}-1} \left( \frac{\left\| \varDelta t_{i,j}^{\mathrm{obs}} - \varDelta t_{i,j}^\mathrm{pred}({\mathbf {p}}\frown {\mathbf {p}}_{\mathrm{cosmo}}) \right\| }{\sigma _{\varDelta t_{i,j}}}\right) ^2 , \end{aligned}$$
(85)

where \(N_{fam,\mathrm{td}}\) is the number of families of multiple images with measured time-delays, \(n_{i,\mathrm{td}}\) is the number of multiple images of the i-th family (note that this implies \(n_{i,\mathrm{td}}-1\) relative time-delay measurements after choosing the \(n_{i,\mathrm{td}}\)-th image as reference), \(\varDelta t_{i,j}^\mathrm{obs}\) and \(\varDelta t_{i,j}^{\mathrm{pred}}\) are the observed and model-predicted relative time-delays of the j-th multiple image belonging to the i-th family, and \(\sigma _{\varDelta t_{i,j}}\) is the uncertainty on the measured time-delay \(\varDelta t_{i,j}^{\mathrm{obs}}\).

If the lens model and the cosmology are constrained by the positions of \(N_{im}^{tot}=\sum _{i=1}^{N_{fam}} n_{i}\) observed multiple images and \(N_{\mathrm{td}}^{tot}=\sum _{i=1}^{N_{fam,td}} (n_{i,\mathrm{td}}-1)\) relative time-delay measurements, by defining \(N_{par}\) as the total number of model free parameters, we can write the number of degrees-of-freedom (DoF) of the lens model as:

$$\begin{aligned} \mathrm {DoF} = 2 \times N_{im}^{tot} + N_\mathrm{td}^{tot} - 2 \times N_{fam} - N_{par} = N_{con}-N_{par} \;. \end{aligned}$$
(86)

The term \(2 \times N_{fam}\) stems from the fact that the unknown positions of the \(N_{fam}\) background sources (2 coordinates for each of them) are additional free parameters of the model. Thus, \(N_{con}\) is the effective number of available constraints.
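The bookkeeping of Eqs. 84–86 translates directly into code. The sketch below (our own illustrative structure, with one entry per image family) assumes per-image positional uncertainties and per-family arrays of relative time-delays:

```python
import numpy as np

def chi2_pos(theta_obs, theta_pred, sigma_theta):
    """Positional chi^2 of Eq. 84; each list entry is one image family,
    with positions of shape (n_i, 2) and uncertainties of shape (n_i,)."""
    return sum(
        np.sum(np.linalg.norm(obs - pred, axis=1)**2 / sig**2)
        for obs, pred, sig in zip(theta_obs, theta_pred, sigma_theta)
    )

def chi2_td(dt_obs, dt_pred, sigma_dt):
    """Time-delay chi^2 of Eq. 85 (n_i - 1 relative delays per family)."""
    return sum(
        np.sum((obs - pred)**2 / sig**2)
        for obs, pred, sig in zip(dt_obs, dt_pred, sigma_dt)
    )

def degrees_of_freedom(n_im_per_family, n_td_images_per_family, n_par):
    """DoF of Eq. 86: 2 coordinates per image, minus 2 per unknown source."""
    n_im_tot = sum(n_im_per_family)
    n_td_tot = sum(n - 1 for n in n_td_images_per_family)
    return 2 * n_im_tot + n_td_tot - 2 * len(n_im_per_family) - n_par
```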

3.6.4 Systematic effects

As described in Sect. 3.5, the strong lens time-delay method has been successfully utilized with quasars lensed by galaxies. Several studies (e.g., Birrer et al. 2016; Treu and Marshall 2016; Suyu et al. 2017) have recognized that, in addition to the spectroscopic redshifts of the lens and the source, the most important steps toward accurate and precise cosmological measurements are: i) precise time-delays, ii) high-resolution images of the lensed sources, iii) precise stellar kinematics of the lens galaxy, and iv) detailed information about the lens environment. Long-term monitoring campaigns of lensed quasars with optical telescopes, notably by the COSMOGRAIL collaboration (e.g., Tewes et al. 2013b; Courbin et al. 2018b), or with radio telescopes (e.g., Fassnacht et al. 2002), together with advances in light-curve analyses (e.g., Tewes et al. 2013a; Hojjati et al. 2013), have provided precise time-delays. To convert these delays to cosmologically relevant quantities, an accurate lens mass model is needed, particularly concerning its radial total mass density profile. Steeper profiles yield larger Fermat potential differences between two images, resulting in larger inferred values of \(H_0\) (Wucknitz 2002; Kochanek 2002). In addition to the main lens, there could be other mass contributions, associated with galaxies belonging to the same group/cluster as the main lens or with line-of-sight structures. If not properly accounted for, this term represents an important source of systematic error, the so-called mass-sheet degeneracy (Falco et al. 1985; Schneider and Sluse 2013), in the model prediction of the time-delays. This clarifies why the extended reconstruction of multiple images, the use of independent mass diagnostics (e.g., stellar dynamics; see Treu and Koopmans 2002) for the main lens, and a detailed characterization of its environment (i.e., points ii), iii), and iv) listed above) are so relevant to a very accurate total mass model of the lens, and thus to the success of this cosmological probe (e.g., Suyu et al. 2014; Birrer et al. 2016; McCully et al. 2017; Rusu et al. 2017; Sluse et al. 2017; Shajib et al. 2018; Tihhonova et al. 2018).

Despite being more complex than that of an isolated galaxy, the strong lensing modeling of a galaxy cluster presents some advantages. First, the identification of several multiple images, some of which might be very close to the cluster center and radially elongated, provides important information about the slope of the cluster total mass density profile (see, e.g., Caminha et al. 2017b). Second, the frequent observation of pairs of angularly close multiple images from sources at different redshifts (see, e.g., Grillo et al. 2016) locates very precisely the positions of the lens tangential critical curves, thus resulting in precise calibrations of the projected total mass of the cluster within different apertures. These facts reduce the need to rely on different total mass diagnostics, such as stellar dynamics in lens galaxies. Moreover, the large number of secure and spectroscopically confirmed multiple images observed in galaxy clusters allows one to choose the best mass model among the different tested ones (i.e., the best reconstruction of the cluster mass components; see Grillo et al. 2015, 2016), according to the value of the minimum \(\chi ^2\). As shown in Grillo et al. (2015, 2016), it is remarkable that all considered mass models lead to statistical and systematic relative errors of only a few percent for the cluster total mass. Very good agreement has also been found with the measurements from independent total mass diagnostics, e.g. those from weak lensing, dynamical, and X-ray observations (see, e.g., Grillo et al. 2015; Balestra et al. 2016; Caminha et al. 2017b). In addition, in a galaxy cluster, the modeling of its different mass components (i.e., extended dark-matter haloes, cluster members, and possibly hot gas; see, e.g., Bonamigo et al. 2017, 2018; Annunziatella et al. 2017) provides a good first-order approximation of possible additional lensing effects (i.e., of the environment) in the regions adjacent to where the time-delays can be measured. Some recent studies have exploited kinematic data for the cluster members to model their total mass contribution more realistically, through scaling relations with non-zero scatter or information from the Fundamental Plane relation (e.g., Bergamini et al. 2021a; Granata et al. 2021). In summary, if extensive multi-color and spectroscopic information is available in lens galaxy clusters, robust mass maps can be constructed (see Grillo et al. 2015; Caminha et al. 2017a; Lagattuta et al. 2017). The feasibility of using the measured time-delays of the first multiply-imaged and spatially-resolved supernova (SN “Refsdal”) for measuring \(H_0\) with high statistical precision has been demonstrated (Grillo et al. 2018a), and a full systematic analysis has been performed (Grillo et al. 2020). Adding to the model a uniform sheet of mass at the cluster redshift, or a cluster main mass density profile with a variable slope (optimized together with all the other model parameters), results in \(H_0\) probability distribution functions that are just slightly broader than those without these extra model parameters. Based on our previous studies (see, e.g., Chirivì et al. 2018, on the influence of mass structures along the line of sight on lensing modeling), systematic effects in lens galaxy clusters seem to be controlled to a level similar to or even lower than the statistical uncertainties, given the exquisite datasets in hand and soon becoming available, making time-delay cluster cosmography a potentially very competitive method.

Finally, we remark that in any cluster strong lensing model the values of the cosmological parameters and those defining the mass distribution of the lens are not independent, and they cannot be considered separately in obtaining model-predicted quantities (e.g., the time-delays, positions, and flux ratios of the multiple images). Therefore, the results obtained by simplistically keeping the cluster mass distribution fixed are likely to underestimate the uncertainty on the values of the cosmological parameters, and possibly introduce biases, since they neglect the covariance between the cosmological and cluster mass model parameters (see, e.g., Acebron et al. 2017). Zitrin et al. (2014) confirm that the values of the cosmological parameters are biased when they are estimated by applying a fixed cluster mass distribution for correcting the luminosity distances of lensed SNe Ia.

3.6.5 Main results and forecasts

As detailed in Sect. 3.6.1, time-delay distances are primarily sensitive to the value of \(H_0\), and more mildly to those of other cosmological parameters. In galaxy clusters, which usually show several multiple images, different values of the family ratio (see Eq. 79) can be used at the same time to add constraints on the values of the cosmological matter (\(\varOmega _{\mathrm{m}}\)) and dark-energy (\(\varOmega _{\mathrm{de}}\)) density parameters, defining the global geometry of the Universe. In general, the cosmological contribution is difficult to disentangle from that associated with the total mass of a lens, because of a strong degeneracy between the two. However, when a significant number of multiply lensed sources (with spectroscopic redshifts spanning a wide range) is present, valuable information about the cosmological parameters can be inferred. This technique has been applied without time-delay measurements in the galaxy clusters Abell 2218 (Soucail et al. 2004), Abell 1689 (Jullo et al. 2010) and, more recently, RXC J2248.7−4431 (Caminha et al. 2016), and in combination with time-delay measurements in MACS J1149.5+2223 for the first time (see Fig. 3 of Grillo et al. 2018a).

In Caminha et al. (2016), by exploiting the observed positions of 47 multiple images, 24 of which spectroscopically confirmed, from a total of 16 background sources over the redshift range 1.0–6.1, a comprehensive study of the total mass distribution of the galaxy cluster RXC J2248.7−4431 with a set of high-precision strong lensing models has resulted in lensing-only measurements of \(\varOmega _{\mathrm{m}}\) and \(w_0\) with 1\(\sigma \) statistical uncertainties of, respectively, between \(\sim \,\)40% and \(\sim \,\)60% (depending on the adopted cosmological model) and \(\sim \,\)30%. In Caminha et al. (2022), thanks to a sample of five detailed cluster total mass models, it has been demonstrated that strong lensing measurements of the values of the cosmological parameters are complementary to and in good agreement with the estimates from the CMB, BAO, and SNe Ia. In particular, the strong lensing cosmographic analysis has made it possible to improve the constraints from the CMB on the values of \(\varOmega _{\mathrm{m}}\) and \(w_0\) (in a flat wCDM model) by factors of 2.5 and 4.0, respectively.

By using the observed positions of 89 multiple images, with extensive spectroscopic information, from 28 background sources and the measured time-delays between the images S1-S4 and SX of SN Refsdal, Grillo et al. (2018a) have blindly inferred the values of \(H_0\) and \(\varOmega _{\mathrm{m}}\) with relative (1\(\sigma \)) statistical errors of, respectively, 6% (7%) and 31% (26%) in flat (general) cosmological models, assuming a conservative 3% uncertainty on the final time-delay of image SX and, remarkably, no priors from other cosmological experiments. Moreover, by investigating separately the impact of a constant sheet of mass at the cluster redshift (see Figs. 26 and 27), of a power-law profile for the mass density of the cluster main halo, and of some scatter in the cluster member scaling relations, Grillo et al. (2020) have found that, in a flat \(\Lambda \)CDM cosmology, these systematic effects do not introduce a significant bias on the inferred values of \(H_0\) and \(\varOmega _{\mathrm{m}}\), and that the statistical uncertainties dominate the total error budget: a 3% uncertainty on the time-delay of image SX translates into approximately 6% and 40% (including both statistical and systematic 1\(\sigma \) errors) uncertainties for \(H_0\) and \(\varOmega _{\mathrm{m}}\), respectively. They have also presented the interesting possibility of measuring the value of the EoS parameter w of the dark energy density with a 30% uncertainty (see Fig. 26).

Fig. 26

Confidence regions (at 1 and 2\(\sigma \) levels) and median values (crosses) of \(H_{0}\), w and \(\varOmega _{\mathrm{m}}\) obtained from the lensing models of SN Refsdal (adapted from Grillo et al. 2020). Dotted lines correspond to the 16th and 84th percentiles for each parameter. A time-delay between SX and S1 of 345 days with a 2% relative error is adopted. Flat wCDM models (\(\varOmega _{\mathrm{m}}\)+\(\varOmega _{\mathrm{de}}\)=1) with uniform priors on the values of the cosmological parameters (\(H_0\) \(\in [20,120]\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\), \(\varOmega _{\mathrm{m}}\) \(\in [0,1]\) and \(w_0 \in [-2,0]\)) are considered. Constraints on the matter density and dark energy EoS parameters are mostly due to the angular diameter distance ratios (Eq. 79), whereas those on the Hubble constant are mainly driven by optimizing the measured time-delay of SN Refsdal with the blind mass model by Grillo et al. (2016)

Fig. 27

The impact of the mass sheet degeneracy (MSD) on the lensing model of SN Refsdal, where \(k_0\) is the value of the convergence of a constant sheet of mass at the cluster redshift (Grillo et al. 2020). In red: confidence contour levels at 1 and 2\(\sigma \) for \(H_0\) and \(k_0\) obtained using all (89) multiple images at different redshifts. A time-delay between SX and S1 of \(345\pm 10\) days is adopted. In this case the best-fit model yields a vanishing mass-sheet (\(k_0=0.00_{-0.08}^{+0.06}\), see vertical dotted line). In gray: confidence regions obtained from a model using only those images (63) belonging to SN Refsdal and its host, all at \(z=1.49\). The dashed-dotted line illustrates the theoretical effect of the MSD (Schneider and Sluse 2013). Flat \(\Lambda \)CDM models (\(\varOmega _{\mathrm{m}}\)+\(\varOmega _{\varLambda }\)=1) with uniform priors on the values of the cosmological parameters (\(H_0\) \(\in [20,120]\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\) and \(\varOmega _{\mathrm{m}}\) \(\in [0,1]\)) and on the value of k\(_{0}\) (\(\in [-0.2,0.2]\) or \([-0.5,0.5]\)) are considered

By comparing different results of the strong lens time-delay method, with SN Refsdal in MACS J1149.5+2223 (Grillo et al. 2020) and with lensed quasars in the galaxy-scale systems of the H0LiCOW program (Suyu et al. 2017), we can conclude that i) the relative error on the inferred value of \(H_0\) from a single (galaxy or cluster) strong lensing system is similar (mean value of 6.4% in Fig. 2 of Wong et al. 2020), ii) in a single lens cluster, there is the additional possibility of estimating the value of \(\varOmega _{\mathrm{m}}\) (and w), thanks to the observations of different multiple-image families with spectroscopically confirmed redshifts and to the measurements of the time-delay value between the multiple images of intrinsically variable sources, and iii) the observed positions of many spectroscopic multiple images (some of which are key to locating the lens tangential and radial critical curves) provide precise calibrations of the different mass components (i.e., extended dark-matter halos, cluster members, and hot gas) considered in the model of a galaxy cluster and, thus, also a good approximation of the effect of the “environment” where the time-delays are measured.

In particular, tests on models using either the entire sample of 89 multiple images from 28 sources at different redshifts, or only the 63 multiple images of SN Refsdal and its host (all at the same redshift), have shown that in the former case the effect of the so-called “mass-sheet degeneracy” is significantly reduced (Grillo et al. 2020). More quantitatively, this has produced an approximately 9% difference in the median value of \(H_0\) (and of \(\varOmega _{\mathrm{m}}\)), and a remarkable reduction by a factor of more than 3, from \(\sim \,\)21% to \(\sim \,\)6%, of its uncertainty (from \(\sim \,\)63% to \(\sim \,\)40% for the uncertainty on \(\varOmega _{\mathrm{m}}\)).

In each lens galaxy cluster, the combination of the positions of several tens of spectroscopically confirmed multiple images and of one or more time-delays between the multiple images of a lensed QSO or SN will allow one to determine the lens Fermat potential differences with a \(\sim \,\)5% uncertainty (including both the statistical and systematic errors, as shown by Grillo et al. 2018a, 2020; Acebron et al. 2021). The planned modeling of the extended surface brightness distributions and kinematic maps of some of the multiple images will very likely reduce this uncertainty below 5%. For each time-varying source lensed by a galaxy cluster, the longest time-delay between its multiple images will be measured with a \(\sim \,\)2% error (as obtained so far for the known systems). This will result in a \(\lesssim \,\)6% total uncertainty on the value of \(H_0\) estimated from a single lens cluster. The datasets that are already available for the first three lens clusters will provide a combined \(\approx \)3% uncertainty on \(H_0\), which will be reduced to \(\lesssim \,\)2% when a sample of \(\sim \,\)10 lens clusters is completed, thanks also to the new data from the Rubin survey.

3.7 Cosmic voids

The largest discernible structures of the Universe make up the so-called cosmic web. It represents a network of compact nodes that are connected by filaments and walls of lower density (Zel’dovich 1970). The remaining space is taken up by cosmic voids, extended regions of very low matter content (e.g., Zeldovich et al. 1982; Bertschinger 1985; van de Weygaert and van Kampen 1993). The nodes are occupied by groups and clusters of galaxies, which makes them the most luminous and thus best identifiable individual structures at cosmological distances. The contrary is the case for voids, which host the least luminous galaxies in the cosmos and have only been discovered in the late 1970s (Gregory and Thompson 1978; Jõeveer et al. 1978). A systematic identification of voids not only requires a complete sampling of their boundaries, consisting of filaments and walls, but also the sensitivity to detect the faintest galaxies in their interiors. This has only recently become feasible with the advance of wide and deep redshift surveys that are able to reveal the three-dimensional structure of the cosmic web in great detail (e.g., see Pan et al. 2012; Sutter et al. 2012b; Micheletti et al. 2014; Mao et al. 2017b; Sánchez et al. 2017; Achitouv et al. 2017; Brouwer et al. 2018; Hawken et al. 2020, for some of the first void catalogs obtained from SDSS, VIPERS, BOSS, DES, 6dFGS, KiDS, and eBOSS).

Since then, void catalogs of ever-growing size have been compiled and analyzed to tackle unanswered questions in various fields of cosmology and astrophysics. For example, voids can be used to study environmental effects in the formation and evolution of galaxies (e.g., Hoyle et al. 2005; Patiri et al. 2006; Kreckel et al. 2012; Ricciardelli et al. 2014a; Habouzit et al. 2020; Panchal et al. 2020), to investigate the nature of gravity with the motivation to find modifications to the general theory of relativity (e.g., Clampitt et al. 2013; Spolyar et al. 2013; Zivick et al. 2015; Cai et al. 2015; Barreira et al. 2015; Hamaus et al. 2015; Achitouv 2016; Voivodic et al. 2017; Falck et al. 2018; Sahlén and Silk 2018; Baker et al. 2018; Paillas et al. 2019; Davies et al. 2019; Perico et al. 2019; Alam et al. 2021; Contarini et al. 2021; Wilson and Bean 2021), or to reveal unknown properties of the standard model ingredients in cosmology, namely its initial conditions (Chan et al. 2019), dark energy (e.g., Lee and Park 2009; Biswas et al. 2010; Lavaux and Wandelt 2012; Sutter et al. 2012a; Bos et al. 2012; Hamaus et al. 2014a; Pisani et al. 2015a; Pollina et al. 2016; Verza et al. 2019), dark matter (e.g., Leclercq et al. 2015; Yang et al. 2015; Reed et al. 2015; Baldi and Villaescusa-Navarro 2018), and neutrinos (Massara et al. 2015; Banerjee and Dalal 2016; Sahlén 2019; Kreisch et al. 2019; Schuster et al. 2019; Zhang et al. 2020; Bayer et al. 2021; Kreisch et al. 2022). It is the under-dense character of voids that makes them particularly sensitive to homogeneous or diffuse components of our Universe, such as dark energy and neutrinos. For example, dark energy dominates the matter-energy budget inside voids much earlier than in the cosmos as a whole. Thanks to their small mass, neutrinos can freely stream into the deep interiors of voids, while baryons and dark matter are mostly restricted to their boundaries due to gravitational interaction. Finally, screening mechanisms that efficiently hide possible deviations from general relativity in regions of high density or deep gravitational potential are not effective inside voids.

In order to encompass such a wide range of topics, various void-related observables have been considered. This includes cross-correlations with the CMB, which provide detections of the integrated Sachs-Wolfe effect (ISW, e.g., Granett et al. 2008; Ilić et al. 2013; Cai et al. 2014; Planck Collaboration et al. 2014a; Nadathur and Crittenden 2016; Kovács et al. 2019, 2022) and of the Sunyaev-Zeldovich (SZ) effect (Alonso et al. 2018), or correlations with the distorted shapes of galaxies, revealing the matter content of voids via the gravitational lensing effect (e.g., Melchior et al. 2014; Clampitt and Jain 2015; Gruen et al. 2016; Sánchez et al. 2017; Cai et al. 2017; Brouwer et al. 2018; Fang et al. 2019; Vielzeuf et al. 2021; Jeffrey et al. 2021). However, voids may also serve as cosmological probes themselves, because their dynamics are governed by the same physical laws that describe the evolution of the Universe as a whole. This enables us to predict their properties from first principles, and to compare these predictions with observations in order to constrain cosmological models.

3.7.1 Basic idea and equations

In this section, we discuss two of the most studied observables that have been investigated for cosmological applications with voids so far: the void size function and the void density profile (or void-galaxy cross-correlation function). These two observables are affected by the so-called Alcock and Paczynski (1979a) (AP) effect (e.g., Ryden 1995; Sutter et al. 2012a, 2014d; Hamaus et al. 2014a, 2016; Mao et al. 2017a; Correa et al. 2019; Endo et al. 2020; Nadathur et al. 2020; Paillas et al. 2021) and by redshift-space distortions (RSD) (e.g., Ryden and Melott 1996; Padilla et al. 2005; Paz et al. 2013; Pisani et al. 2015b; Hamaus et al. 2015, 2017; Cai et al. 2016; Chuang et al. 2017; Hawken et al. 2017, 2020; Achitouv 2019; Aubert et al. 2022; Correa et al. 2021, 2022), which themselves carry cosmologically relevant information. For other methods that employ voids as cosmological probes, such as their pairwise clustering statistics on large scales (e.g., Hamaus et al. 2014c, a; Chan et al. 2014; Zhao et al. 2016; Chuang et al. 2017; Lares et al. 2017b; Voivodic et al. 2020), the associated baryon acoustic oscillation (BAO) feature (e.g., Kitaura et al. 2016; Liang et al. 2016; Chan and Hamaus 2021), the velocity statistics of voids (e.g., Sutter et al. 2014a; Ruiz et al. 2015; Lambas et al. 2016; Ceccarelli et al. 2016; Wojtak et al. 2016; Lares et al. 2017a), or marked tracer statistics that up-weight underdense regions (e.g., Beisbart and Kerscher 2000; Sheth 2005; White 2016; Philcox et al. 2020; Massara et al. 2021, 2022), we refer the reader to the provided references.

3.7.1.1 Void size function The void size function \(\mathrm {d}n(R,z)/\mathrm {d}R\) specifies the number density of voids of a given size R at redshift z. It is also known as void abundance. One can think of it in analogy to the cluster mass function \(\mathrm {d}n(M,z)/\mathrm {d}M\), with the advantage of the void size being a directly observable quantity. In contrast, the cluster mass M can in practice only be related to other observables, such as richness or X-ray luminosity. The void size function has already been measured in current data (see Fig. 28), but has not yet been used to extract cosmological constraints (however, see Sahlén et al. 2016, for constraints from extreme-value statistics of voids). The increase in expected void numbers from upcoming surveys in the next decade and the strong modeling activity performed on simulations will soon allow first applications to observational data (Pisani et al. 2019). Theoretical models for the void size function allow us to predict void numbers in the dark matter distribution from first principles (e.g., Sheth and van de Weygaert 2004; Furlanetto et al. 2006; Platen et al. 2007; Paranjape et al. 2012; Jennings et al. 2013; Pisani et al. 2015a). By accounting for tracer bias, it is possible to relate those predictions to observable voids in the tracer distribution (Pollina et al. 2016; Ronconi and Marulli 2017; Ronconi et al. 2019; Contarini et al. 2019), thereby providing estimates of expected void numbers in large-scale structure surveys. First and foremost, predicting void numbers is an important task, necessary to perform accurate forecasts for other probes relying on the statistics of voids. However, it turns out that the void size function is an extremely sensitive probe of cosmology in itself: by counting voids of different size in surveys one can obtain constraints on the dark energy EoS (Pisani et al. 2015a; Verza et al. 2019), the presence of massive neutrinos (Sahlén 2019; Kreisch et al. 2019; Schuster et al. 2019; Kreisch et al. 2022), and modified gravity (Clampitt et al. 2013; Lam et al. 2015; Cai et al. 2015; Zivick et al. 2015; Sahlén et al. 2016; Contarini et al. 2021).

The most common setup to obtain predictions relies on the excursion-set formalism (Bond et al. 1991), applied to the hierarchical evolution of cosmic voids. It was first developed by Sheth and van de Weygaert (2004) and was later extended by Jennings et al. (2013). Excursion-set theory provides predictions for void numbers based on spherical fluctuations in the initial (Lagrangian) density field. It calculates their conditional first-crossing distribution \(f_{\ln \sigma }(\sigma )\) as a function of the root mean square matter fluctuations \(\sigma \), smoothed on a scale \(R_\mathrm {L}\). A fluctuation becomes a void when its Lagrangian density contrast \(\delta ^\mathrm {L}\), filtered on the scale \(R_\mathrm {L}\), reaches the void formation threshold \(\delta _\mathrm {v}^\mathrm {L}\) without crossing the collapse threshold \(\delta _\mathrm {c}^\mathrm {L}\) on a scale larger than \(R_\mathrm {L}\). The thresholds are determined via the nonlinear evolution of a spherically symmetric top-hat fluctuation (Icke 1984); the moment of shell crossing conventionally defines the formation of a void (Bertschinger 1985; Blumenthal et al. 1992). The void size function in Lagrangian space is then given by:

$$\begin{aligned} \frac{\mathrm {d} n_\mathrm {L}}{\mathrm {d} \ln R_\mathrm {L}} = \frac{f_{\ln \sigma }(\sigma )}{V(R_\mathrm {L})} \, \frac{\mathrm {d} \ln \sigma ^{-1}}{\mathrm {d} \ln R_\mathrm {L}} , \end{aligned}$$
(87)

where \(V(R_\mathrm {L})=4\pi R_\mathrm {L}^3/3\), and the first-crossing distribution is (Sheth and van de Weygaert 2004):

$$\begin{aligned} f_{\ln \sigma }(\sigma ) = 2 \sum _{j=1}^{\infty } \, e^{-\frac{(j \pi x)^2}{2}} \, j \pi x^2 \, \sin {\left( j \pi {\mathcal {D}} \right) } , \end{aligned}$$
(88)

with:

$$\begin{aligned} {\mathcal {D}} \equiv \frac{|\delta _\mathrm {v}^\mathrm {L}|}{\delta _\mathrm {c}^\mathrm {L} + |\delta _\mathrm {v}^\mathrm {L}|}, \qquad x \equiv \frac{{\mathcal {D}}}{|\delta _\mathrm {v}^\mathrm {L}|} \sigma . \end{aligned}$$
(89)

The label \(\mathrm {L}\) indicates all quantities that are evaluated following linear theory in Lagrangian space. To ensure volume conservation between the linear and nonlinear density field, Jennings et al. (2013) impose:

$$\begin{aligned} V(R)\mathrm {d}n = V(R_\mathrm {L})\mathrm {d}n_\mathrm {L} |_{R_\mathrm {L}(R)} . \end{aligned}$$
(90)

Together with the equality \(\mathrm {d} \ln R=\mathrm {d} \ln R_\mathrm {L}\), which applies for the spherical top-hat model, one obtains the final expression for the so-called Vdn model, an extension of the original Sheth and van de Weygaert model:

$$\begin{aligned} \frac{\mathrm {d} n}{\mathrm {d} \ln R} = \frac{f_{\ln \sigma }(\sigma )}{V(R)} \, \frac{\mathrm {d} \ln \sigma ^{-1}}{\mathrm {d} \ln R} . \end{aligned}$$
(91)
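A minimal numerical sketch of Eqs. 88–91 is given below. The quantities \(\sigma (R)\) and \(\mathrm {d}\ln \sigma ^{-1}/\mathrm {d}\ln R\) depend on the linear matter power spectrum and are left as inputs; the linear thresholds are set to commonly adopted shell-crossing and collapse values, and the function names are illustrative:

```python
import numpy as np

DELTA_V_L = -2.717   # linear void-formation (shell-crossing) threshold
DELTA_C_L = 1.686    # linear collapse threshold

def first_crossing(sigma, j_max=500):
    """First-crossing distribution f_ln_sigma of Eq. 88."""
    D = abs(DELTA_V_L) / (DELTA_C_L + abs(DELTA_V_L))   # Eq. 89
    x = D / abs(DELTA_V_L) * sigma                      # Eq. 89
    j = np.arange(1, j_max + 1)
    return 2.0 * np.sum(np.exp(-(j * np.pi * x)**2 / 2)
                        * j * np.pi * x**2 * np.sin(j * np.pi * D))

def vdn_size_function(R, sigma_R, dln_sigma_inv_dlnR):
    """Vdn void size function dn/dlnR of Eq. 91 at Eulerian radius R."""
    V = 4.0 / 3.0 * np.pi * R**3
    return first_crossing(sigma_R) / V * dln_sigma_inv_dlnR

# Toy call with illustrative values; a real application would tabulate
# sigma(R) and its logarithmic slope from the linear matter power spectrum
print(vdn_size_function(R=10.0, sigma_R=0.8, dln_sigma_inv_dlnR=0.5))
```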
Fig. 28

Images reproduced with permission from Hamaus et al. (2020), copyright by IOP & SISSA

Left: Void size function from the final BOSS data in the redshift range \(0.20<z<0.75\). Right: Projected void density profile (void-galaxy cross-correlation function, red wedges) from the final BOSS data, and its real-space counterpart after deprojection (green triangles). The redshift-space monopole of the density profile (blue dots) is shown along with its best-fit model (blue solid line). The upturn towards the void center is due to residual noise in the deprojected profile, which is used in the model.

However, in order to apply this model to data, it is necessary to consider the complicating fact that, in practice, voids are found in the distribution of tracers of the matter density field, which are typically galaxies. Moreover, the structures identified by a shape-agnostic void finding algorithm are not the idealistic spherically symmetric and isolated objects assumed in the theoretical model (e.g., Platen et al. 2007; Neyrinck 2008; Sutter et al. 2015, see Sect. 3.7.2). To align the theory with observations, two important steps need to be taken into account. Firstly, the measured properties of real voids need to be linked with the idealistic top-hat model, such that their size and depth agree. For example, this can be achieved by identifying a sphere of radius R around the void center, which yields a given density threshold \(\delta _\mathrm {v}\). The spherical top-hat model suggests using \(\delta _\mathrm {v}\simeq -0.8\) at the moment of shell crossing as a natural choice (Bertschinger 1985; Blumenthal et al. 1992), but in principle the model keeps its validity with any other value (Jennings et al. 2013; Verza et al. 2019). Secondly, density fluctuations in the tracer distribution are biased with respect to the matter density field (see Sect. 3.7.4). Therefore, a model for tracer bias needs to be incorporated in the theoretical formalism to predict the observable void size function (Ronconi et al. 2019; Contarini et al. 2019, 2021).

3.7.1.2 Void density profile Apart from their size, voids are characterized by their unique composition and geometry. While these properties may vary significantly from one void to another, they become well-defined in an ensemble-average sense. For example, in a statistically homogeneous and isotropic universe the average density profile of voids exhibits some universal characteristics: an extended under-dense core and a steep density run towards the void boundary (e.g., see Ricciardelli et al. 2014b; Hamaus et al. 2014b, and Fig. 28). The boundary itself features an over-dense ridge whose amplitude diminishes for increasingly large voids (e.g., Sheth and van de Weygaert 2004; Ceccarelli et al. 2013). These characteristics can be parameterized by analytical fitting formulae for the isotropic void density profile. For example, one well-explored expression is given by:

$$\begin{aligned} \delta (r) = \delta _\mathrm {c}\frac{1-(r/r_\mathrm {s})^\alpha }{1+(r/R)^\beta } , \end{aligned}$$
(92)

where \(\delta \equiv \rho /{\bar{\rho }}-1\) is the density contrast with respect to the background density of the Universe \({\bar{\rho }}\) (Hamaus et al. 2014b). For voids with an effective radius R, it expresses the average density fluctuation as a function of comoving distance r from the void center and contains four parameters: a scale radius \(r_\mathrm {s}\) that determines where the density equals the background value, a central under-density \(\delta _\mathrm {c}\), and two power-law indices \(\alpha \) and \(\beta \) that control its inner and outer slopes. It has further been shown that the latter two parameters scale linearly with \(r_\mathrm {s}\), which can be exploited to reduce the parameter space of the density profile to two dimensions. While the form of Eq. (92) has been motivated and tested by simulation studies (e.g., Sutter et al. 2014b; Barreira et al. 2015; Pollina et al. 2017; Falck et al. 2018; Baker et al. 2018; Perico et al. 2019; Stopyra et al. 2021; Shim et al. 2021; Tavasoli 2021), it is also in good agreement with observations (e.g., Sánchez et al. 2017; Chantavat et al. 2017; Pollina et al. 2019; Fang et al. 2019). Typical values for the parameters in Eq. (92) are \(r_\mathrm {s}\simeq R\), \(\delta _\mathrm {c}\simeq -0.8\), \(\alpha \simeq 2\), and \(\beta \simeq 9\) (Hamaus et al. 2014b).
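As a brief illustration, the profile of Eq. (92) is straightforward to implement; the sketch below uses the typical parameter values quoted above as defaults, and the void radius and distance grid are arbitrary example values in Mpc/h.

```python
import numpy as np

def hsw_profile(r, R, r_s=None, delta_c=-0.8, alpha=2.0, beta=9.0):
    # void density profile of Eq. (92); defaults are the typical values
    # from Hamaus et al. (2014b), with r_s ~= R
    if r_s is None:
        r_s = R
    return delta_c * (1.0 - (r / r_s)**alpha) / (1.0 + (r / R)**beta)

r = np.linspace(0.01, 60.0, 200)   # comoving distance from the void center
delta = hsw_profile(r, R=25.0)     # an illustrative 25 Mpc/h void
```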

However, in redshift surveys the assumption of spherical symmetry is violated due to RSD. They arise as a consequence of the peculiar motions of galaxies on top of the Hubble flow, causing a Doppler shift in their emitted spectrum. This affects the distance-redshift relation, which only accounts for a Hubble redshift \(z_h\). As a result, the comoving location \({\mathbf {x}}\) of a galaxy with observed redshift z is given by:

$$\begin{aligned} {\mathbf {x}}(z) = {\mathbf {x}}(z_h) + \frac{1+z_h}{H(z_h)}{\mathbf {v}}_\parallel , \end{aligned}$$
(93)

where \({\mathbf {v}}_\parallel \) is the component of the galaxy velocity vector \({\mathbf {v}}\) along the line of sight, relative to the observer. The same argument applies to the location of a void center \({\mathbf {X}}\) at redshift Z (we use capitals to designate void properties), i.e. for the separation vector \({\mathbf {s}}\) between galaxy and void center in redshift space we obtain:

$$\begin{aligned} {\mathbf {s}} \equiv {\mathbf {x}}(z)-{\mathbf {X}}(Z) \simeq {\mathbf {x}}(z_h)-{\mathbf {X}}(Z_h) + \frac{1+z_h}{H(z_h)}\left( {\mathbf {v}}_\parallel -{\mathbf {V}}_\parallel \right) = {\mathbf {r}} + \frac{1+z_h}{H(z_h)}{\mathbf {u}}_\parallel , \end{aligned}$$
(94)

where \({\mathbf {r}}\) is their comoving separation in real space and \({\mathbf {u}}_\parallel ={\mathbf {v}}_\parallel -{\mathbf {V}}_\parallel \) their relative velocity along the line of sight. Thus, a description of the mapping between real and redshift space requires a model for the dynamics of voids. It has been shown that the assumptions of average spherical symmetry and local mass conservation at linear order in the density contrast provide an accurate relation for the relative velocity field \({\mathbf {u}}\) (Peebles 1980; Hamaus et al. 2014b):

$$\begin{aligned} {\mathbf {u}}({\mathbf {r}}) = -\frac{f(z_h)}{3}\frac{H(z_h)}{1+z_h}\varDelta (r)\,{\mathbf {r}} , \end{aligned}$$
(95)

where f is the linear growth rate of density perturbations and \(\varDelta (r)\) the average density contrast within a radius \(r\equiv |{\mathbf {r}}|\) from the void center:

$$\begin{aligned} \varDelta (r) = \frac{3}{r^3}\int _0^r\delta (r')r'^2\,\mathrm {d}r' \;. \end{aligned}$$
(96)

The vector \({\mathbf {u}}\) is directed along the radial direction \({\mathbf {r}}\) from the void center in real space, so the coordinate mapping from Eq. (94) can now be written in terms of \({\mathbf {r}}\) and its component along the line of sight \({\mathbf {r}}_\parallel \):

$$\begin{aligned} {\mathbf {s}} = {\mathbf {r}} - \frac{f(z_h)}{3}\varDelta (r)\,{\mathbf {r}}_\parallel . \end{aligned}$$
(97)

From this equation the coordinate transformation between real and redshift space is fully determined by the void density profile in real space, e.g. via the fitting formula of Eq. (92). The linear growth rate f only depends on the cosmological model, but within the realm of General Relativity it is well approximated by a power law of the matter content \(\varOmega _{\mathrm{m}}(z)\) at redshift z with a growth index of \(\gamma \simeq 0.55\), \(f(z)=\varOmega _\mathrm{m}(z)^\gamma \) (Lahav et al. 1991; Linder 2005).

Fig. 29

Image reproduced with permission from Hamaus et al. (2020), copyright by IOP & SISSA

Schematic representation of a void in real (left) and in redshift space (right). The separation vector \({\mathbf {r}}\) between its center at \({\mathbf {X}}\) and a galaxy at \({\mathbf {x}}\) in real space is transformed via \({\mathbf {s}}={\mathbf {r}}+{\mathbf {u}}_\parallel \) to redshift space, where \({\mathbf {u}}_\parallel ={\mathbf {v}}_\parallel -{\mathbf {V}}_\parallel \) is the relative line-of-sight velocity between them. For simplicity the illustration displays \(\mu \) instead of \(\arccos (\mu )\) to indicate angles to the line of sight and uses velocity displacements in units of \((1+z_h)/H(z_h)\).

The coordinate mapping in Eq. (97) leads to an anisotropic distortion of voids along the line of sight, as illustrated schematically in Fig. 29. Therefore, an isotropic density profile is no longer sufficient to describe the average geometry and composition of voids. Instead, the corresponding observable quantity is the void-galaxy cross-correlation function \(\xi ^s({\mathbf {s}})\) in redshift space, which not only depends on the magnitude s of the separation vector, but also on the cosine of its angle to the line of sight \(\mu _s=s_\parallel /s\). Because the number of galaxies around every void is conserved in the mapping from real to redshift space, the Jacobian \(\partial {\mathbf {s}}/\partial {\mathbf {r}}\) relates \(\delta (r)\) to \(\xi ^s({\mathbf {s}})\) via:

$$\begin{aligned} \int [1+b\delta (r)]\mathrm {d}^3r = \int [1+\xi ^s({\mathbf {s}})]\det \!\left( \frac{\partial {\mathbf {s}}}{\partial {\mathbf {r}}}\right) \mathrm {d}^3r . \end{aligned}$$
(98)

Here we have additionally assumed a linear bias relation of the form \(\xi (r)=b\delta (r)\) between galaxy and matter over-densities in real space, with a scale-independent bias parameter b (see Sect. 3.7.4). This assumption has been investigated with the help of N-body simulations (Sutter et al. 2014b; Pollina et al. 2017; Contarini et al. 2019; Ronconi et al. 2019), but also with galaxy-clustering and weak-lensing observations (Pollina et al. 2019; Fang et al. 2019), and was found to be remarkably accurate. Inserting Eq. (97) into Eq. (98) and expanding to linear order in \(\delta (r)\), one finally arrives at (Cai et al. 2016; Hamaus et al. 2017):

$$\begin{aligned} \xi ^s({\mathbf {s}}) = b\delta (r) + \frac{f}{3}\varDelta (r) + f\mu _r^2\left[ \delta (r)-\varDelta (r)\right] , \end{aligned}$$
(99)

where \(\mu _r=r_\parallel /r\). Given a density profile \(\delta (r)\) and the mapping between \({\mathbf {s}}\) and \({\mathbf {r}}\) from Eq. (97), one can now evaluate the void-galaxy cross-correlation function \(\xi ^s\) for any observed separation vector \({\mathbf {s}}\). Since \({\mathbf {r}}\) is unknown, one may initially evaluate \(\varDelta (r)\) at \(r=s\) and then obtain r by iteratively applying the following set of equations (Hamaus et al. 2020):

$$\begin{aligned} r = \sqrt{r_\perp ^2+r_\parallel ^2},\qquad r_\perp = s_\perp ,\qquad r_\parallel = s_\parallel \left[ 1-\frac{f}{3}\varDelta (r)\right] ^{-1} , \end{aligned}$$
(100)

where \(s_\perp \) is the component of \({\mathbf {s}}\) perpendicular to the line of sight, and hence unaffected by RSD. Equation (99) can also be expanded in terms of Legendre polynomials, with monopole and quadrupole as the only non-vanishing multipoles at linear order (Cai et al. 2016).
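The pieces above combine into a compact numerical recipe: given any real-space profile \(\delta (r)\) (for instance the `hsw_profile` sketched earlier), compute \(\varDelta (r)\) from Eq. (96), recover the real-space coordinates by iterating Eq. (100), and evaluate the linear model of Eq. (99). The growth rate and bias values below are arbitrary placeholders.

```python
import numpy as np
from scipy.integrate import quad

f_growth = 0.8   # assumed growth rate f(z_h), for illustration only

def Delta(r, delta):
    # cumulative average density contrast within radius r, Eq. (96)
    return 3.0 / r**3 * quad(lambda rp: delta(rp) * rp**2, 0.0, r)[0]

def real_space_coords(s_perp, s_para, delta, n_iter=5):
    # iterative inversion of the redshift-space mapping, Eq. (100);
    # the first pass effectively evaluates Delta(r) at r = s
    r_perp, r_para = s_perp, s_para
    for _ in range(n_iter):
        r = np.hypot(r_perp, r_para)
        r_para = s_para / (1.0 - f_growth / 3.0 * Delta(r, delta))
    return r_perp, r_para

def xi_s_linear(s_perp, s_para, delta, b=2.0):
    # linear model of Eq. (99) for the void-galaxy correlation in redshift space
    r_perp, r_para = real_space_coords(s_perp, s_para, delta)
    r = np.hypot(r_perp, r_para)
    mu_r = r_para / r
    d, D = delta(r), Delta(r, delta)
    return b * d + f_growth / 3.0 * D + f_growth * mu_r**2 * (d - D)
```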

It remains to determine the real-space density profile \(\delta (r)\) to be used in the previous equations. Various approaches have been followed in the literature: they either make use of analytic fitting formulae like Eq. (92) (Paz et al. 2013; Hamaus et al. 2015, 2016; Correa et al. 2019), of calibrated measurements from simulations (Achitouv et al. 2017; Nadathur et al. 2020), or of a deprojection technique that determines it from the observed data directly (Pisani et al. 2014; Hawken et al. 2017; Hamaus et al. 2020). The latter approach is based on the inverse Abel transform (Abel 1842; Bracewell 1999):

$$\begin{aligned} \xi (r) = -\frac{1}{\pi }\int _r^\infty \frac{\mathrm {d}\xi ^s_\mathrm {p}(s_\perp )}{\mathrm {d}s_\perp }\frac{\mathrm {d}s_\perp }{\sqrt{s_\perp ^2-r^2}} , \end{aligned}$$
(101)

exploiting the fact that the projected void-galaxy cross-correlation function in redshift space, \(\xi ^s_\mathrm {p}(s_\perp )=\int \xi ^s({\mathbf {s}})\,\mathrm {d}s_\parallel \), is insensitive to RSD, which only act along the line-of-sight component \(s_\parallel \) of the separation vector \({\mathbf {s}}\) (see Fig. 28). Assuming linear bias, this provides the real-space density profile via \(\delta (r)=\xi (r)/b\).
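A minimal numerical version of the deprojection, assuming \(\xi ^s_\mathrm {p}\) has already been measured on a grid of projected separations; the singular endpoint of the integrand at \(s_\perp = r\) is simply excluded, which is crude but sufficient for a sketch (production analyses typically smooth the measured profile first).

```python
import numpy as np

def inverse_abel(s_perp_grid, xi_p, r):
    # numerical inverse Abel transform, Eq. (101)
    dxi = np.gradient(xi_p, s_perp_grid)        # d xi_p / d s_perp
    r = np.atleast_1d(r)
    out = np.empty(r.shape)
    for i, ri in enumerate(r):
        mask = s_perp_grid > ri                 # drop the singular endpoint
        out[i] = -np.trapz(dxi[mask] / np.sqrt(s_perp_grid[mask]**2 - ri**2),
                           s_perp_grid[mask]) / np.pi
    return out
```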

Moreover, it is possible to extend this dynamical model to the quasi-linear regime via the so-called Gaussian Streaming Model (GSM). Assuming the pairwise line-of-sight velocity \(u_\parallel \) between void centers and galaxies to follow a Gaussian distribution, the void-galaxy cross-correlation function in redshift space is given by (e.g., Paz et al. 2013; Hamaus et al. 2015; Cai et al. 2016):

$$\begin{aligned} 1+\xi ^s({\mathbf {s}}) = \int \left[ 1+\xi (r)\right] \frac{1}{\sqrt{2\pi }\;\sigma _\parallel (r,\mu _r)}\exp \left\{ -\frac{\left[ u_\parallel -u(r)\mu _r\right] ^2}{2\sigma _\parallel ^2(r,\mu _r)}\right\} \mathrm {d}u_\parallel , \end{aligned}$$
(102)

which additionally requires the pairwise velocity dispersion along the line of sight \(\sigma _\parallel (r,\mu _r)\) as a model ingredient. We refer to Hamaus et al. (2020) for a discussion on the advantages and disadvantages of the various modeling choices.
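A direct numerical transcription of Eq. (102) for a single separation, with \(\xi (r)\), u(r), and \(\sigma _\parallel (r,\mu _r)\) supplied as callables; the velocity integration range and the conversion factor `H_over_1pz` \(= H(z_h)/(1+z_h)\) (velocities in km/s) are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad

def xi_s_gsm(s_perp, s_para, xi, u, sigma_para, H_over_1pz, u_max=2000.0):
    # Gaussian streaming model, Eq. (102), at one separation (s_perp, s_para)
    def integrand(u_para):
        r_para = s_para - u_para / H_over_1pz   # undo the mapping of Eq. (94)
        r = np.hypot(s_perp, r_para)
        mu_r = r_para / r
        sig = sigma_para(r, mu_r)
        gauss = np.exp(-0.5 * ((u_para - u(r) * mu_r) / sig)**2) \
                / (np.sqrt(2.0 * np.pi) * sig)
        return (1.0 + xi(r)) * gauss
    return quad(integrand, -u_max, u_max)[0] - 1.0
```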

3.7.2 Sample selection

The observational identification of voids requires a distribution of tracers of the large-scale structure, as obtained via redshift surveys. Typically these tracers are galaxies with either spectroscopic or photometric redshift estimates, but other tracer types, such as galaxy clusters (Pollina et al. 2019), the Ly-\(\alpha \) forest (Stark et al. 2015; Krolewski et al. 2018; Porqueres et al. 2019), or the 21 cm emission from neutral Hydrogen (White and Padmanabhan 2017; Endo et al. 2020), have been considered for void finding as well. These observations commonly optimize the target selection based on their individual science cases, but voids can be extracted as a byproduct without additional expense. Therefore, the sample selection for voids usually derives from the target tracer selection and is rarely optimized specifically for void detection (however, see van de Weygaert et al. 2011; Pisani et al. 2019, for more details on the optimization of surveys for void detection). Nevertheless, previous survey data have proven very valuable in providing void catalogs of high quality and significant sample size (e.g., Sutter et al. 2012b; Mao et al. 2017b; Fang et al. 2019; Hamaus et al. 2020; Aubert et al. 2022; Nadathur et al. 2020).

Various techniques for the identification of voids have been presented in the literature (see Colberg et al. 2008; Cautun et al. 2018, for an overview of different methods). They either consider the full distribution of tracers in 3D, or 2D projections along the line-of-sight direction. The former approach is typically applied to spectroscopic, the latter to photometric data, although both techniques can be used in either case. Moreover, some void finders search for spherical domains with tracer densities below a given threshold, while others locate void boundaries of arbitrary geometry in a non-parametric fashion. The latter can be achieved with a so-called watershed algorithm (Platen et al. 2007), which requires the definition of a density field from the distribution of tracer particles. The density field itself can be estimated in various ways, for example via grid interpolation or adaptive methods, such as Delaunay or Voronoi tessellation. As a result, one obtains a nearly space-filling distribution of voids in the large-scale structure with individual properties, such as their size, shape, density, or center location, which can be considered as cosmological observables. Among the most popular software implementations is the public Void IDentification and Examination toolkit VIDE (Sutter et al. 2015). It is based on the code ZOBOV (Neyrinck 2008), which performs a Voronoi tessellation and the watershed transform on a set of tracer particles. VIDE additionally handles the complexities arising from the survey geometry, which typically represents a masked light cone within a given redshift range. Voids intersecting with the boundary of the survey mask are usually excluded from the final void catalog. Furthermore, a cut on minimum void size based on the mean tracer separation is often applied to mitigate the contamination from spurious voids that may arise via random density fluctuations (see Sect. 3.7.4).

3.7.3 Measurements

The location of an astronomical object at cosmological distance is determined via its observed redshift z and its position on the sky, expressed in angular coordinates \(\vartheta \) (right ascension) and \(\varphi \) (declination). In order to identify voids in the 3D distribution of tracers, we first need to perform a transformation to Cartesian coordinates \({\mathbf {x}}\) in comoving space:

$$\begin{aligned} {\mathbf {x}}(z,\vartheta ,\varphi ) = (1+z) D_\mathrm {A}(z)\begin{pmatrix}\cos \vartheta \cos \varphi \\ \sin \vartheta \cos \varphi \\ \sin \varphi \end{pmatrix} , \end{aligned}$$
(103)

where \(D_\mathrm {A}(z)\) is the angular diameter distance to a tracer at redshift z. It depends on the expansion history of the Universe via the Hubble function H(z) and on the curvature of space via the parameter \(\varOmega _k\) as expressed in Eq. (14). That equation can be also written as:

$$\begin{aligned} D_\mathrm {A}(z) = \frac{c}{(1+z)H_0\sqrt{-\varOmega _k}}\sin \left( \sqrt{-\varOmega _k}\int _0^z\frac{H_0}{H(z')}\mathrm {d}z'\right) , \end{aligned}$$
(104)

where c is the speed of light and \(H_0\equiv H(z=0)\) the Hubble constant. Thus, in order to perform the coordinate transformation in Eq. (103) it is necessary to assume a particular cosmology. Within \(\Lambda \)CDM, for example, this requires values for the radiation, matter, and cosmological constant parameters \(\varOmega _\mathrm {r}\), \(\varOmega _{\mathrm{m}}\), and \(\varOmega _{\varLambda }\), which determine the curvature parameter as \(\varOmega _k=1-\varOmega _\mathrm {r}-\varOmega _\mathrm {m}-\varOmega _\Lambda \). The Hubble function is given by Eq. (8). Once the coordinate transformation is performed, voids can be identified in comoving space. It is then possible to ascribe a volume V and an effective radius R to every void. In particular, making use of a Voronoi tessellation, one can define these quantities via a sum over the cell volumes \({\mathcal {V}}_i\) of the individual tracer particles with index i that belong to each void:

$$\begin{aligned} V = \sum \nolimits _i{\mathcal {V}}_i,\qquad R = \left( \frac{3}{4\pi }V\right) ^{1/3} . \end{aligned}$$
(105)

Moreover, one can define a volume-weighted barycenter from the tracers at location \({\mathbf {x}}_i\), which serves as a good estimator for the geometric center of a void (e.g., Sutter et al. 2012b; Cautun et al. 2016; Stopyra et al. 2021):

$$\begin{aligned} {\mathbf {X}} = \frac{\sum _i{\mathbf {x}}_i{\mathcal {V}}_i}{\sum _i{\mathcal {V}}_i} . \end{aligned}$$
(106)

Further properties, such as the inertia tensor with its eigenvalues and eigenvectors, the ellipticity, the minimum density, the density contrast, or the average density can be defined for each void based on its defining tracers (Sutter et al. 2015).
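The coordinate transformation and the basic void properties translate into a few lines of code. The sketch below assumes a callable H(z) in km/s/Mpc and handles the curvature term of Eq. (104) through a complex square root, which covers both signs of \(\varOmega _k\); the default H0 is an arbitrary example value.

```python
import numpy as np
from scipy.integrate import quad

c = 299792.458   # speed of light [km/s]

def comoving_position(z, ra, dec, H, H0=67.36, Omega_k=0.0):
    # Cartesian comoving coordinates of Eq. (103); ra, dec in radians;
    # (1+z) D_A(z) from Eq. (104), with the flat limit taken explicitly
    D_C = quad(lambda zp: c / H(zp), 0.0, z)[0]
    if abs(Omega_k) > 1e-8:
        sq = np.sqrt(complex(-Omega_k))
        D_M = (c / (H0 * sq) * np.sin(sq * H0 * D_C / c)).real
    else:
        D_M = D_C
    return D_M * np.array([np.cos(ra) * np.cos(dec),
                           np.sin(ra) * np.cos(dec),
                           np.sin(dec)])

def void_properties(cell_volumes, positions):
    # volume, effective radius and volume-weighted barycenter,
    # Eqs. (105)-(106), from the Voronoi cells of the member tracers
    V = cell_volumes.sum()
    R = (3.0 * V / (4.0 * np.pi))**(1.0 / 3.0)
    X = (positions * cell_volumes[:, None]).sum(axis=0) / V
    return V, R, X
```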

The separations between void centers and tracers in comoving space can be calculated via their differences in the angle on the sky \(\delta \theta \) and in redshift \(\delta z\) following Eq. (103):

$$\begin{aligned} s_\perp = (1+z)D_\mathrm {A}(z)\,\delta \theta ,\qquad s_\parallel = \frac{c}{H(z)}\,\delta z . \end{aligned}$$
(107)

However, as both \(D_\mathrm {A}(z)\) and H(z) depend on the assumed cosmological model, so do the separations. It is therefore common practice to introduce two AP parameters \(q_\perp \) and \(q_\parallel \) that inherit the dependence on cosmology via:

$$\begin{aligned} q_\perp = \frac{s_\perp ^*}{s_\perp } = \frac{D_\mathrm {A}^*(z)}{D_\mathrm {A}(z)},\qquad q_\parallel = \frac{s_\parallel ^*}{s_\parallel } = \frac{H(z)}{H^*(z)} , \end{aligned}$$
(108)

where the quantities with an asterisk are evaluated in the true underlying cosmology, which is unknown. In the special case where the assumed cosmology coincides with the true one, \(q_\perp =q_\parallel =1\). In turn, measuring \(q_\perp \) and \(q_\parallel \) provides a measurement of \(D_\mathrm {A}^*(z)\) and \(H^*(z)\), respectively. However, without an absolute calibration scale the two parameters remain degenerate in the AP test. Only their ratio, known as the AP parameter:

$$\begin{aligned} \varepsilon \equiv \frac{q_\perp }{q_\parallel } = \frac{D_\mathrm {A}^*(z)H^*(z)}{D_\mathrm {A}(z)H(z)} \end{aligned}$$
(109)

can be determined, which provides a measurement of the product \(D_\mathrm {A}^*(z)H^*(z)\) (Sutter et al. 2012a; Hamaus et al. 2016). Furthermore, the observed volume is proportional to \(s_\parallel s^2_\perp \), which implies \(R^*=q_\perp ^{2/3}q_\parallel ^{1/3}R\) for the true effective void radius (Hamaus et al. 2020; Correa et al. 2021).
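In code, the AP scalings reduce to elementary ratios; the helper below, a trivial but occasionally handy sketch, returns \(\varepsilon \) from Eq. (109) together with the void-radius rescaling factor.

```python
def ap_parameters(DA_true, H_true, DA_fid, H_fid):
    # Eqs. (108)-(109): q_perp = D_A*/D_A, q_para = H/H*;
    # the true effective radius follows as R* = q_perp^(2/3) q_para^(1/3) R
    q_perp = DA_true / DA_fid
    q_para = H_fid / H_true
    return q_perp / q_para, q_perp**(2.0 / 3.0) * q_para**(1.0 / 3.0)
```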

In practice, the AP test is applied to measurements of the void-galaxy cross-correlation function \(\xi ^s(s_\perp ,s_\parallel )\) in redshift space. It is customary to use the Landy and Szalay (1993) estimator for this purpose:

$$\begin{aligned} \hat{\xi }^s(s_\perp ,s_\parallel ) = \frac{\langle {{\mathcal {D}}_\mathrm {v} {\mathcal {D}}_\mathrm {g}}\rangle -\langle {{\mathcal {D}}_\mathrm {v} {\mathcal {R}}_\mathrm {g}}\rangle -\langle {{\mathcal {R}}_\mathrm {v} {\mathcal {D}}_\mathrm {g}}\rangle +\langle {{\mathcal {R}}_\mathrm {v} {\mathcal {R}}_\mathrm {g}}\rangle }{\langle {{\mathcal {R}}_\mathrm {v}{\mathcal {R}}_\mathrm {g}}\rangle } , \end{aligned}$$
(110)

where the angled brackets indicate normalized pair counts of void-center and galaxy positions in the data (\({\mathcal {D}}_\mathrm {v}\), \({\mathcal {D}}_\mathrm {g}\)) and in random catalogs (\({\mathcal {R}}_\mathrm {v}\), \({\mathcal {R}}_\mathrm {g}\)) without spatial correlations. The number of random objects has to be large enough to guarantee an unbiased estimate of \(\xi ^s\); it is typically set one to two orders of magnitude higher than the number of observed objects of each kind. From Eq. (110) it is then straightforward to estimate the projected correlation function via line-of-sight integration:

$$\begin{aligned} \hat{\xi }^s_\mathrm {p}(s_\perp ) = \int \hat{\xi }^s(s_\perp ,s_\parallel )\,\mathrm {d}s_\parallel . \end{aligned}$$
(111)
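The following sketch implements Eqs. (110) and (111) with brute-force pair counting on small catalogs (arrays of 3D positions), assuming a fixed line-of-sight axis as in the flat-sky limit; real analyses rely on tree-based pair counters and the true line of sight per pair.

```python
import numpy as np

def pair_counts(pos1, pos2, perp_edges, para_edges,
                los=np.array([0.0, 0.0, 1.0])):
    # normalized pair counts binned in (s_perp, s_para); O(N1*N2) memory
    sep = pos2[None, :, :] - pos1[:, None, :]
    s_para = np.abs(sep @ los)
    s_perp = np.sqrt(np.maximum((sep**2).sum(axis=-1) - s_para**2, 0.0))
    H, _, _ = np.histogram2d(s_perp.ravel(), s_para.ravel(),
                             bins=[perp_edges, para_edges])
    return H / (len(pos1) * len(pos2))

def landy_szalay(Dv, Dg, Rv, Rg, perp_edges, para_edges):
    # void-galaxy cross-correlation estimator of Eq. (110)
    DvDg = pair_counts(Dv, Dg, perp_edges, para_edges)
    DvRg = pair_counts(Dv, Rg, perp_edges, para_edges)
    RvDg = pair_counts(Rv, Dg, perp_edges, para_edges)
    RvRg = pair_counts(Rv, Rg, perp_edges, para_edges)
    return (DvDg - DvRg - RvDg + RvRg) / RvRg

def projected_correlation(xi_2d, para_edges):
    # line-of-sight integration of Eq. (111) on the binned grid
    return xi_2d @ np.diff(para_edges)
```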

Application of the inverse Abel transform from Eq. (101) then provides the real-space correlation function \(\xi (r)\) (see Fig. 28), which is needed as a model ingredient for \(\xi ^s({\mathbf {s}})\), as in Eqs. (99) or (102). For example, assuming linear bias, Eq. (99) can be written as:

$$\begin{aligned} \xi ^s(s_\perp ,s_\parallel ) = \xi (r) + \frac{1}{3}\frac{f}{b}{\overline{\xi }}(r) + \frac{f}{b}\mu ^2\left[ \xi (r)-{\overline{\xi }}(r)\right] , \end{aligned}$$
(112)

where \({\overline{\xi }}(r) = 3r^{-3}\!\int _0^r\xi (r')\,r'^2\,\mathrm {d}r'\). This can be compared to the measured \(\hat{\xi }^s(s_\perp ,s_\parallel )\) assuming a Gaussian likelihood:

$$\begin{aligned} {\mathcal {L}}(\hat{\xi }^s|{\varvec{\Theta }}) \propto \exp \left\{ -\frac{1}{2}\sum \limits _{i,j}\left[ \hat{\xi }^s({\mathbf {s}}_i)-\xi ^s({\mathbf {s}}_i|{\varvec{\Theta }})\right] \,\hat{{\mathsf {C}}}_{ij}^{-1}\left[ \hat{\xi }^s({\mathbf {s}}_j)-\xi ^s({\mathbf {s}}_j|{\varvec{\Theta }})\right] \right\} , \end{aligned}$$
(113)

with model parameter vector \({\varvec{\Theta }}\) and covariance matrix:

$$\begin{aligned} \hat{{\mathsf {C}}}_{ij} = \left\langle {\left[ {\hat{\xi }^s({\mathbf {s}}_i)-\langle \hat{\xi }^s({\mathbf {s}}_i)\rangle }\right] \left[ {\hat{\xi }^s({\mathbf {s}}_j)-\langle \hat{\xi }^s({\mathbf {s}}_j)\rangle }\right] }\right\rangle . \end{aligned}$$
(114)

Here, angled brackets imply averages over an ensemble of observations. Because voids are spatially exclusive, they represent independent regions of the large-scale structure and the covariance matrix can be estimated via jackknife resampling of the observed sample of voids (e.g., Paz et al. 2013; Hamaus et al. 2015; Cai et al. 2016; Correa et al. 2019).
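A compact sketch of Eqs. (113) and (114), with the covariance estimated from jackknife resamplings as described above; `xi_jk` is assumed to hold one leave-one-region-out data vector per row.

```python
import numpy as np

def jackknife_covariance(xi_jk):
    # Eq. (114) via jackknife: xi_jk has shape (N_samples, N_bins)
    N = xi_jk.shape[0]
    diff = xi_jk - xi_jk.mean(axis=0)
    return (N - 1.0) / N * diff.T @ diff

def log_likelihood(xi_data, xi_model, cov):
    # Gaussian log-likelihood of Eq. (113), up to an additive constant
    resid = xi_data - xi_model
    return -0.5 * resid @ np.linalg.solve(cov, resid)
```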

The likelihood can be used to determine the AP parameter \(\varepsilon \), which is equivalent to a measurement of the product of angular diameter distance \(D_\mathrm {A}(z)\) and Hubble expansion H(z) at redshift z. Because these two quantities depend on the cosmological model via Eqs. (104) and (8), measurements of \(\varepsilon \) can be converted to constraints on the cosmological parameters that enter these two equations. A variation of \(\varepsilon \) corresponds to a change in distance ratios along and perpendicular to the line of sight, which can be described as a geometric distortion of void shapes. However, voids are additionally affected by RSD due to the peculiar velocity flows in their immediate surroundings, as described in Sect. 3.7.1. The magnitude of these velocities, and hence the strength of dynamic distortions, is controlled by the growth rate f, which enters the model Eq. (112) via Eq. (95). In order to properly model the average shapes of voids, i.e., the void-galaxy cross-correlation function in redshift space, geometric and dynamic distortions must be accounted for simultaneously (Hamaus et al. 2015, 2016). Fortunately, the two types of distortions influence \(\xi ^s(s_\perp ,s_\parallel )\) in fundamentally different ways, such that there is no significant degeneracy between the parameters \(\varepsilon \) and f/b.

3.7.4 Systematic effects

The mass distribution on cosmological scales is dominated by invisible dark matter. Large-scale structure surveys merely allow us to infer this distribution via luminous tracers of the mass, but this inference is subject to bias, statistical noise, and other sources of error. As void identification relies on the spatial distribution of tracers, these complications necessarily propagate into the properties of voids as a source of systematic effects. The main known systematics are summarized below.

3.7.4.1 Clustering bias

The over-densities of tracers \(\delta _\mathrm {t}\) generally differ from the fluctuations in the matter density field \(\delta _\mathrm {m}\), a phenomenon referred to as tracer bias (Desjacques et al. 2018). At linear order in the perturbations this difference is quantified by a multiplicative constant b, denoted as linear bias, with \(\delta _\mathrm {t}=b\delta _\mathrm {m}\) (Kaiser 1984). For example, luminous red galaxies (LRGs) typically have \(b>1\), because they populate relatively massive halos that form in the most over-dense environments (e.g., Gil-Marín et al. 2015; Zhai et al. 2017). Therefore, voids identified in the distribution of LRGs exhibit deeper interiors and higher compensation ridges compared to voids identified in the dark matter density field (Sutter et al. 2014b; Pollina et al. 2017, 2019). As a consequence, basic void properties, such as their effective radius and density profile, depend on the bias of the tracer sample considered for their identification.

3.7.4.2 Stochasticity

While the distribution of dark matter can be seen as a collisionless fluid, tracers of the mass consist of discrete objects, such as galaxies. Therefore, the density field of tracers \(\delta _\mathrm {t}\) must be estimated from a finite number of objects per volume element, which is subject to discreteness noise, also referred to as shot noise. Typically, this shot noise is assumed to obey Poisson statistics, but corrections arise due to the finite extent of tracers and their nonlinear clustering (Hamaus et al. 2010; Baldauf et al. 2013; Paech et al. 2017; Ginzburg et al. 2017; Friedrich et al. 2022). Voids are necessarily affected by shot noise as well, if they are defined via tracer statistics. For example, even in a tracer distribution drawn from a homogeneous density field, chance fluctuations due to shot noise can result in spurious void detections (Neyrinck 2008). Therefore, not all voids identified in a real tracer distribution are expected to be genuine; there is a contamination of spurious voids that depends on the sparsity of the considered tracer sample. With the help of simulations and mock catalogs the contamination fraction can be assessed, exploiting the fact that various void properties distinguish genuine from spurious voids. Machine learning methods are particularly effective at minimizing this contamination (Cousinou et al. 2019).

3.7.4.3 Nonlinear RSD

By design, large-scale structure surveys infer distances via the measured redshifts of sources, which are distorted due to their peculiar motion along the line of sight (Kaiser 1984). While just a decade ago peculiar velocities were considered the strongest systematic limiting the information extractable from the void density profile (e.g., Lavaux and Wandelt 2012; Sutter et al. 2012a), RSD models for voids have now matured to the point of exploiting peculiar velocities as an independent source of information. At the linear level, which is most relevant for voids, these RSD can be modeled very accurately, as discussed in Sect. 3.7.1, but the nonlinear regime is more complex and difficult to understand from first principles. A well-known example of an extreme type of nonlinear RSD is the so-called Finger-of-God (FoG) effect (Jackson 1972). It arises around the most massive structures in the Universe observed in redshift space, galaxy clusters, and appears as an elongated feature along the line of sight, caused by the virial motion of the cluster member galaxies. While the occurrence of FoGs inside voids is less likely, they can disrupt the over-dense void boundaries. In turn, this can cause spurious mergers or segmentation of voids, preferentially along the line-of-sight direction, which results in an anisotropic selection effect (Pisani et al. 2015b; Correa et al. 2022).

3.7.4.4 Redshift error

Redshift measurements are themselves subject to error. While this error can be largely neglected for the high-resolution spectra obtained with spectroscopic redshift surveys, photometric surveys are subject to a relatively large photo-z scatter that often amounts to a few percent uncertainty in redshift. Translated to a distance scale, this typically corresponds to several tens of Mpc, and therefore strongly impacts the identification of voids, whose extent is of the same order. 2D void finders have specifically been designed to reduce the impact of this error in photometric surveys (Sánchez et al. 2017; Kovács et al. 2019; Vielzeuf et al. 2021). Another option is to rely on tracers with higher photo-z accuracy, such as galaxy clusters, for the identification of voids (Pollina et al. 2019).

3.7.4.5 Survey boundary

Redshift surveys typically only observe a fraction of the full sky. In addition, objects in the foreground, such as stars or the plane of the Galaxy, have to be masked out. Together with the finite redshift range of the survey, this creates a complex geometry of the observed cosmological volume. Voids that intersect with a survey boundary are only partially observed and hence cannot be used for further analysis. This constraint concerns the largest voids most severely, as they are the most likely to extend beyond the edges of the survey. Thus, survey boundaries impact the detectable distribution of void sizes in a systematic way, which is not straightforward to predict (Sutter et al. 2014c). To mitigate this effect, it is desirable to survey large contiguous fractions of the sky, and to discard voids that are too close to the survey boundary.

In reality, these systematic effects do not occur in isolation, but impact the identification of voids jointly. It is therefore difficult to address them on purely theoretical grounds. As an alternative, various empirical approaches to handling systematics have been adopted in the literature, in essentially two different ways. First, at the level of the data, such as performing a cleaning procedure to select voids based on their size and depth (e.g., Contarini et al. 2019; Ronconi et al. 2019), applying projections within redshift slices (e.g., Sánchez et al. 2017), or implementing a velocity reconstruction to control the void selection (Nadathur et al. 2020). Second, at the level of the model, which can be extended by additional nuisance parameters to allow for more flexibility (e.g., Hamaus et al. 2020; Paillas et al. 2021). Even though such extra parameters may not be uniquely associated with a given systematic effect, they can be marginalized over in the cosmological interpretation of the analysis.

3.7.5 Main results and forecasts

Voids have been considered for cosmological forecasts and constraints in various ways throughout the literature. Constraints from current data mainly rely on the void density profile; the theoretical modeling of the void size function has only recently reached maturity and will show its full power with the larger samples of voids expected from the next generation of surveys. Therefore, here we focus on one of the most established applications to probe cosmology with voids: the AP test with the void-galaxy cross-correlation function \(\xi ^s(s_\perp ,s_\parallel )\).

Fig. 30

Images reproduced with permission from Hamaus et al. (2020), copyright IOP & SISSA

Left: Measurement of the void-galaxy cross-correlation function \(\xi ^s(s_\perp ,s_\parallel )\) from voids in the final BOSS data (color scale with black contours) and the best-fit model (white contours). Right: Constraints on the parameters \(\varOmega _{\mathrm{m}}\) and \(f\sigma _8\) obtained from modeling the data in the left panel. A white cross indicates the best fit and dashed lines the mean parameter values obtained by Planck Collaboration et al. (2020).

Figure 30 shows the results obtained by performing an AP test with voids identified in the final data release of the Baryon Oscillation Spectroscopic Survey (BOSS, Dawson et al. 2013). The left panel contains the measured \(\hat{\xi }^s(s_\perp ,s_\parallel )\) in bins of void-centric separations along and perpendicular to the line of sight, with the best-fit model indicated by white contour lines. Application of an MCMC sampler allows one to retrieve the posterior distribution of the model parameters \(\varepsilon \) and f/b. Then, assuming a flat \(\Lambda \)CDM cosmology, \(\varepsilon \) can be converted to \(\varOmega _{\mathrm{m}}\), the only free parameter in the product \(D_\mathrm {A}H\) within that model (since \(\varOmega _\mathrm {r}\) can be neglected and \(\varOmega _k=0\)). Furthermore, with a measurement of the linear clustering amplitude of the tracer galaxies in BOSS, which is determined by the product of their bias b and the amplitude of linear matter fluctuations \(\sigma _8\), the ratio f/b can be converted to the more commonly quoted combination \(f\sigma _8\). Because \(\sigma _8\) is defined in terms of \(8\,h^{-1}\mathrm {Mpc}\), the posterior on \(f\sigma _8\) should be marginalized over the Hubble constant h, a step that is often neglected (Sánchez 2020).

The constraints on \(\varOmega _{\mathrm{m}}\) and \(f\sigma _8\) from the AP test with voids in the final BOSS data are shown in the right panel of Fig. 30. It demonstrates how competitive this relatively new method is, for example when compared to the more traditional approach that focuses on the pairwise clustering of galaxies (e.g., Alam et al. 2017). The latter is more challenging to model on small scales, due to the complex velocity statistics of galaxies in over-dense environments. However, on large scales it is imprinted with a characteristic scale of about \(105\,h^{-1}\)Mpc by the BAO feature that emerged during the radiation-dominated epoch of the early Universe, which can be used as a standard ruler to constrain \(D_\mathrm {A}(z)\) and H(z) individually. There are strong indications that the combination of such measurements with the AP test from voids can greatly improve the precision on cosmological parameters, thanks to their complementarity (e.g., Nadathur et al. 2020; Paillas et al. 2021; Kreisch et al. 2022).

Fig. 31

Comparison of constraints on growth via \(f\sigma _8\) and geometry via \(D_\mathrm {A} H\) (\(68\%\) confidence intervals) obtained from cosmic voids in the literature; references are ordered chronologically in the figure legend. Gray lines with shaded error bands show the Planck Collaboration et al. (2020) baseline result as a reference, with corresponding values of \(D_\mathrm {A}'H'\). Filled markers indicate growth rate measurements without consideration of the AP effect, while open markers include the AP test. The different line styles of error bars indicate various degrees of model assumptions made: model-independent (solid), calibrated on simulations (dashed), calibrated on mocks (dotted), calibrated on simulations and mocks (dash-dotted)

Figure 31 summarizes the cosmological constraints that have been obtained from cosmic voids as a stand-alone probe in the literature. Despite being a young field of research, it has been blossoming with applications of increasingly accurate techniques to very different surveys, including SDSS (Sutter et al. 2012a), BOSS (Sutter et al. 2014d; Hamaus et al. 2016; Mao et al. 2017a), eBOSS (Hawken et al. 2020; Aubert et al. 2022; Nadathur et al. 2020), VIPERS (Hawken et al. 2017), and 6dFGS (Achitouv et al. 2017). All measurements of \(D_\mathrm {A}H\) are based on the AP test, while some constraints on \(f\sigma _8\) are derived only from dynamic distortions of void shapes and assume a fiducial cosmology with a fixed \(D_\mathrm {A}H\). The method becomes particularly powerful towards higher redshift, where the observed volume, and hence the available sample of voids, grows larger. Moreover, the product \(D_\mathrm {A}(z)H(z)\) is an increasing function of redshift, so its measurement becomes more sensitive to changes in cosmological parameters at higher z. These two trends are eventually counteracted by the declining amplitude of nonlinear fluctuations in the matter density field and the scarcity of observable tracers at very high redshift. However, upcoming surveys of the next generation, such as DESI (DESI Collaboration et al. 2016), Euclid (Laureijs et al. 2011), PFS (Takada et al. 2014), the Nancy Grace Roman Space Telescope (Spergel et al. 2015), the Vera Rubin Observatory (LSST Science Collaboration et al. 2009), and SPHEREx (Doré et al. 2014), are expected to obtain void catalogs of unprecedented size, containing on the order of \(10^5\) objects each (Pisani et al. 2019; Hamaus et al. 2022). Compared to the current state of the art, this corresponds to an increase of about two orders of magnitude. Therefore, we expect the next generation of surveys to initiate an era of voids in the pursuit of precision cosmology.

3.8 Neutral hydrogen intensity mapping

Traditionally, large-scale structure surveys aim to detect individual galaxies in three dimensions. This involves measuring the redshift of each galaxy as well as its angular position on the sky, and then creating a catalog and a corresponding 3D map. This procedure has been routinely used by optical galaxy surveys like SDSS and has led to constraints on dark energy, gravity, and the initial conditions of the Universe (see, e.g., Beutler et al. 2012; Alam et al. 2021; Mueller et al. 2021). An alternative proposal is to map the large-scale structure of the Universe using the redshifted 21-cm line from the spin flip transition in neutral hydrogen (Hi) with radio telescopes (Battye et al. 2004; Chang et al. 2008; Loeb and Wyithe 2008; Mao et al. 2008; Peterson et al. 2009; Seo et al. 2010; Ansari et al. 2012).

3.8.1 Basic idea and equations

The Hi intensity mapping technique does not require the often difficult and expensive detection of individual galaxies. Instead, it maps the combined Hi flux from many galaxies in large 3D pixels, across the sky and along cosmic time (see Fig. 32). With radio telescope arrays, the Hi intensity mapping method has the potential to provide the largest map of the Universe, reaching back to \(\sim \)1 billion years after the Big Bang. The data can then be used for precision cosmology and galaxy evolution studies (Kovetz et al. 2020; Ahmed et al. 2019).

Fig. 32

From individual galaxies (left) to Hi intensity maps (right). Using the intensity mapping technique we can map the entire Hi flux from many galaxies together in large 3D pixels, and produce low angular resolution Hi brightness temperature maps that retain the large-scale statistical information. This figure was produced using the MultiDark simulations (Klypin et al. 2016; Knebe et al. 2018) and the methods in Cunnington et al. (2020b)

A number of Hi intensity mapping experiments are expected to launch in the next few years, with some of them already working with pathfinder data. These are the proposed MeerKLASS survey using the SKA Observatory’s MeerKAT precursor (Santos et al. 2017), FAST (Hu et al. 2020), BINGO (Battye et al. 2013; Wuensche 2019), CHIME (Bandura et al. 2014), HIRAX (Newburgh et al. 2016), Tianlai (Li et al. 2020; Wu et al. 2021), PUMA (Slosar et al. 2019), and CHORD (Vanderlinde et al. 2019). Existing experiments include Hi intensity mapping surveys performed with the Green Bank Telescope (GBT) (Chang et al. 2010; Switzer et al. 2013, 2015; Masui et al. 2013; Wolz et al. 2022) and Parkes (Anderson et al. 2018).

At cosmological distances, the 21-cm line is redshifted to very low frequencies, which alleviates the danger of line confusion that often plagues other lines. There is a one-to-one correspondence of observing frequency, \(\nu \), with redshift, z, given by:

$$\begin{aligned} \nu = \frac{1420.4}{1+z} \, \mathrm{MHz} . \end{aligned}$$
(115)

For this reason, there is no need to detect and catalog individual galaxies. Looking at some 3D region (voxel) on the sky, the radio telescope receives the total 21-cm intensity from that region, as demonstrated in Fig. 32. This is a proxy for the total amount of hydrogen in the voxel, which is in turn assumed to be a (biased) tracer of the total matter density. While the telescope beam can be quite large and erase the small-scale structure, the large-scale statistical information is retained. From Earth, the 21-cm line is measurable up to very high redshifts, \(z\sim 50\), and could reach \(z\sim 200\) with a lunar instrument (Furlanetto et al. 2006). This provides a unique opportunity for cosmology and astrophysics studies at high redshifts, where traditional galaxy surveys become shot noise limited.

It is important to consider the mode of operation of the telescope array. Purpose-built Hi intensity mapping experiments like CHIME and HIRAX are interferometers with elements that are closely packed together. Sparse arrays like MeerKAT and SKA-MID cannot provide enough short baselines to probe cosmological scales when used in interferometric mode. Instead, they need to operate in “single-dish” mode (Battye et al. 2013; Wang et al. 2021b), where the array is used as a collection of scanning auto-correlation dishes. This is necessary in order to map cosmological scales with sufficient sensitivity (Bull et al. 2015; Santos et al. 2015; SKA Cosmology SWG 2020).

A major challenge for the Hi intensity mapping method is the presence of strong astrophysical emission, or foreground contamination: 21-cm foregrounds such as Galactic synchrotron (Zheng et al. 2017), point sources, and free-free emission are bright in the relevant frequency ranges and can be orders of magnitude stronger than the cosmological Hi signal (see Fig. 33 and the top panel of Fig. 34). Hence, they have to be removed.

Fig. 33

Image reproduced with permission from Cunnington et al. (2019), copyright by the authors

Simulated full sky maps of different 21-cm foreground components at a frequency of 1136 MHz (\(z = 0.25\)). The frequency dependence of these foregrounds can be approximated by power laws with a running spectral index.

3.8.1.1 Modeling the observed Hi signal

We consider the 3D power spectrum as our main observable, and follow the formalism used in optical galaxy survey analyses. As for optical galaxies, redshift-space distortions (RSD) introduce anisotropies in the observed Hi power spectrum. To account for this, we consider the power spectrum as a function of redshift z, wavenumber k, and \(\mu \), where k is the amplitude of the wave vector and \(\mu \) the cosine of the angle between the wave vector and the line-of-sight (LoS) component. We model RSD via the Kaiser effect (Kaiser 1987), a large-scale effect dependent on the growth rate f. To linear order, the anisotropic Hi power spectrum can be written as:

$$\begin{aligned} P_\text {H}\textsc {i}(k, \mu ) = \left( {\overline{T}}_\text {H}\textsc {i}b_\text {H}\textsc {i}+ {\overline{T}}_\text {H}\textsc {i}f \mu ^{2}\right) ^{2} P_\text {M}(k) + P_\text {SN} , \end{aligned}$$
(116)

where \(P_\text {SN} = {\overline{T}}_\text {H}\textsc {i}^2 (1/{\overline{n}})\) is the shot noise, \({\overline{n}}\) is the number density of objects, \(P_\text {M}(k)\) is the underlying matter power spectrum, \(b_\text {H}\textsc {i}\) is the Hi bias, and \({\overline{T}}_\text {H}\textsc {i}\) is the mean Hi brightness temperature (Chang et al. 2010; Battye et al. 2013):

$$\begin{aligned} {\overline{T}}_\text {H}\textsc {i}=44 \left( \frac{\varOmega _{\mathrm {HI}}(z) h}{2.45 \times 10^{-4}}\right) \frac{(1+z)^{2}}{H(z)/H_0} \mathrm {\mu K} \,. \end{aligned}$$
(117)

The \(P_{\mathrm{SN}}\) contribution is expected to be subdominant (smaller than the thermal noise of the telescope) and is usually neglected (Villaescusa-Navarro et al. 2018). The Hi abundance and clustering properties have been studied using simulations and semi-analytical modeling (see, e.g., Villaescusa-Navarro et al. 2018; Spinelli et al. 2020). In general, the clustering of Hi can be accurately described by perturbative methods (Castorina and White 2019), and maps can be constructed with N-body and approximate methods based on the halo model (Alonso et al. 2014; Villaescusa-Navarro et al. 2014; Padmanabhan et al. 2016, 2017; Carucci et al. 2017; Spinelli et al. 2022; Avila et al. 2022). For interpreting current and forthcoming observations, there is a pressing need to work towards end-to-end simulations including observational effects (Spinelli et al. 2022).

3.8.1.2 The telescope beam effect

The telescope beam introduces one of the main instrumental effects in the case of single-dish intensity mapping experiments. We can model this effect using a damping term dependent on the physical smoothing scale of the beam (see, e.g., Battye et al. 2013; Villaescusa-Navarro et al. 2017). Assuming the telescope beam can be modeled as a Gaussian, this scale is \(R_\text {beam} = \sigma _\theta r(z)\), where \(\sigma _\theta = \theta _\text {FWHM} / ( 2\sqrt{2\ln (2)} )\), \(\theta _\text {FWHM} \sim \lambda /D_{\mathrm{dish}}\) is the full width at half maximum of the beam for a dish of diameter \(D_{\mathrm{dish}}\) at observing wavelength \(\lambda = 21(1+z)\) cm, and r(z) is the comoving distance to redshift z. We emphasize that the angular resolution of single-dish surveys is very low, of order \(\sim \)1 deg, while interferometers have a much better angular resolution.

The Fourier transform of the telescope beam damping term is:

$$\begin{aligned} {\widetilde{B}}_\perp (k,\mu ) = \exp \left( \frac{-k^2 R_{\mathrm {beam}}^2(1-\mu ^2)}{2}\right) , \end{aligned}$$
(118)

and the power spectrum becomes:

$$\begin{aligned} P_\text {H}\textsc {i}(k, \mu ) = {\widetilde{B}}_\perp ^2(k,\mu ) \times \left[ \left( {\overline{T}}_\text {H}\textsc {i}b_\text {H}\textsc {i}+ {\overline{T}}_\text {H}\textsc {i}f \mu ^{2}\right) ^{2} P_\text {M}(k) + P_\text {SN} \right] . \end{aligned}$$
(119)

For surveys with limited frequency resolution, a similar damping effect occurs on small radial scales. In cases where this is relevant, a way to account for it is described in Blake (2019).
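Equations (116)-(119) can be assembled as follows; the matter power spectrum \(P_\mathrm {M}(k)\) is passed in as a callable (e.g., from a Boltzmann code), and all distance-dependent inputs are left to the caller.

```python
import numpy as np

def Tbar_HI(z, Omega_HI, h, E):
    # mean HI brightness temperature of Eq. (117) in micro-K; E = H(z)/H0
    return 44.0 * (Omega_HI * h / 2.45e-4) * (1.0 + z)**2 / E

def R_beam(z, r_comoving, D_dish):
    # physical beam smoothing scale; lambda = 21(1+z) cm, D_dish in meters
    theta_fwhm = 0.21 * (1.0 + z) / D_dish
    return theta_fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0))) * r_comoving

def P_HI(k, mu, Tbar, b_HI, f, Pm, Rb, P_SN=0.0):
    # anisotropic HI power spectrum with Gaussian beam damping, Eqs. (118)-(119)
    B_perp = np.exp(-k**2 * Rb**2 * (1.0 - mu**2) / 2.0)
    return B_perp**2 * (Tbar**2 * (b_HI + f * mu**2)**2 * Pm(k) + P_SN)
```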

3.8.1.3 Thermal noise

Instrumental noise is determined by the telescope configuration and survey strategy (see, e.g., Battye et al. 2013; Bull et al. 2015; Pourtsidou et al. 2017, for detailed descriptions of representative cases). For a single-dish experiment, the pixel noise is assumed to be described by a Gaussian random field with spread given by:

$$\begin{aligned} \sigma _{\mathrm {pix}}=\frac{T_{\mathrm {sys}}(\nu )}{\sqrt{\delta _\nu t_{\mathrm {total}}\left( \varOmega _{\mathrm {pix}} / S_{\text{ area } }\right) N_{\mathrm {dishes}}}} . \end{aligned}$$
(120)

Here, \(T_{\mathrm{sys}}(\nu )\) is the system temperature (including receiver and sky components) at a given frequency, \(S_{\mathrm{area}}\) the sky area, \(\varOmega _{\mathrm{pix}} = 1.33 \theta ^2_{\mathrm{FWHM}}\) the pixel solid angle, \(N_\mathrm{dishes}\) the number of dishes (which can also include multiple feeds/beams per dish), \(\delta _\nu \) the frequency channel bandwidth, and \(t_{\mathrm{total}}\) the total observation time. The combination \(t_{\mathrm{total}} (\varOmega _{\mathrm{pix}} /S_{\mathrm{area}})\) represents the time spent at each pointing. It follows that the noise power spectrum is:

$$\begin{aligned} P_\text {N} = \sigma _{\mathrm {pix}}^2 V_{\mathrm {pix}} , \end{aligned}$$
(121)

where \(V_\text {pix}\) is the voxel volume. Typical values for a cosmological survey using MeerKAT in single-dish mode are \(T_\mathrm{sys} \sim 30 \, \mathrm{K}\), \(N_{\mathrm{dishes}} = 64\), \(S_{\mathrm{area}}= 5,000 \, \mathrm{deg}^2\) and \(t_{\mathrm{total}} = 5,000 \, \mathrm{hrs}\).
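Plugging the quoted MeerKAT-like values into Eqs. (120)-(121) looks as follows; the \(\sim \)1 deg beam and the voxel volume are illustrative assumptions, and units are converted by hand (steradians, seconds, Hz).

```python
import numpy as np

def single_dish_noise(T_sys, delta_nu, t_total, theta_fwhm, S_area,
                      N_dishes, V_pix):
    # pixel noise and noise power spectrum, Eqs. (120)-(121)
    Omega_pix = 1.33 * theta_fwhm**2
    sigma_pix = T_sys / np.sqrt(delta_nu * t_total * (Omega_pix / S_area)
                                * N_dishes)
    return sigma_pix, sigma_pix**2 * V_pix

deg = np.pi / 180.0
sigma_pix, P_N = single_dish_noise(T_sys=30.0, delta_nu=1e6,     # 1 MHz channels
                                   t_total=5000.0 * 3600.0,      # 5,000 hrs in s
                                   theta_fwhm=1.0 * deg,         # assumed beam
                                   S_area=5000.0 * deg**2,       # 5,000 deg^2
                                   N_dishes=64,
                                   V_pix=100.0)                  # assumed voxel volume
```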

For the simplest form of interferometer, a dual-polarization array with a uniform antenna distribution, the noise power spectrum is:

$$\begin{aligned} P^{\mathrm {N}}=T_{\mathrm {sys}}^{2} r^{2} y_{\nu }\left( \frac{\lambda ^{4}}{A_{\mathrm {e}}^{2}}\right) \frac{1}{2 n(u) t_{\mathrm {total}}}\left( \frac{S_{\mathrm {area}}}{\mathrm {FOV}}\right) . \end{aligned}$$
(122)

Here, \(A_e\) is the effective beam area, \(\mathrm{FOV} \approx (\lambda / D_{\mathrm{dish}})^2\), r is the comoving distance to the observation redshift z, and \(y_\nu = c(1+z)^2/(\nu _0H(z))\) with \(\nu _0 = 1420\) MHz. The antenna distribution function n(u) can be approximated as \(n(u) \simeq N^2_f / 2\pi u^2_{\mathrm{max}}\) for the uniform case, where \(N_f\) is the number of elements of the interferometer and \(u_{\mathrm{max}} \simeq D_{\mathrm{max}}/\lambda \), with \(D_{\mathrm{max}}\) the maximum baseline. Typical values for a compact instrument like HIRAX are \(T_{\mathrm{sys}}=50 \, \mathrm{K}\), \(N_f = 1024\), \(D_{\mathrm{dish}} = 6 \, \mathrm{m}\), \(D_{\mathrm{max}} = 250 \, \mathrm{m}\), \(S_{\mathrm{area}}= 15,000 \, \mathrm{deg}^2\), and \(t_{\mathrm{total}} = 10,000 \, \mathrm{hrs}\).
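The interferometric case of Eq. (122), with the uniform-distribution approximation for n(u) built in; inputs mirror the HIRAX-like values quoted above and must be supplied in consistent units.

```python
import numpy as np

def interferometer_noise(T_sys, r, y_nu, lam, A_e, t_total, S_area,
                         D_dish, D_max, N_f):
    # noise power spectrum of Eq. (122) for a uniform dual-polarization array
    FOV = (lam / D_dish)**2                          # field of view [sr]
    n_u = N_f**2 / (2.0 * np.pi * (D_max / lam)**2)  # n(u) ~ N_f^2/(2 pi u_max^2)
    return (T_sys**2 * r**2 * y_nu * lam**4 / A_e**2
            / (2.0 * n_u * t_total) * (S_area / FOV))
```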

3.8.2 Sample selection

Hi intensity mapping measures the combined Hi flux from many galaxies in large voxels, so there is no need to select individual galaxies. An advantage of the intensity mapping technique is that it is sensitive to all sources of Hi emission, regardless of how faint. This is in contrast to traditional galaxy surveys, which only detect sources above a flux cutoff. This makes Hi intensity mapping ideally suited to probe the global Hi content, a key quantity for galaxy formation and evolution studies.

Another fundamental choice is the bandwidth of observation. For example, MeerKAT can perform cosmological observations using its L-band (\(900-1420\) MHz) or UHF-band (\(580-1000\) MHz) receivers. The former corresponds to a redshift range \(0<z<0.58\), while the latter can probe \(0.4<z<1.45\). Band 1 of SKA-MID corresponds to a very wide redshift range, \(0.35<z<3\). Other examples are CHIME and HIRAX, with \(0.8<z<2.5\). Depending on the bandwidth of observation as well as the sky area coverage and total observing time, these Hi intensity mapping surveys can measure Baryon Acoustic Oscillations and Redshift Space Distortions, and search for signatures of primordial non-Gaussianity.

The selection of frequency bandwidth and patch of sky can also be tuned to try and mitigate known systematic effects:

  • Human-made Radio Frequency Interference (RFI) is a major source of contamination (Harper and Dickinson 2018). While methods for RFI flagging and removal do exist (Offringa et al. 2010; Akeret et al. 2017), it is important to perform observations in “radio-quiet” locations.

  • Foreground contamination from Galactic synchrotron, free-free emission, and point sources, can be orders of magnitude larger than the Hi cosmological signal (see, e.g., Oh and Mack 2003). Different regions of the sky are contaminated by the various foregrounds differently, and regions of the sky that are particularly complex (for example the Galactic plane) should be avoided.

  • At the time of writing, there has been no detection of the Hi auto-correlation signal due to residual foregrounds and other systematics (see, e.g., Switzer et al. 2013, 2015). The only available detections come from cross-correlating Hi maps with spectroscopic optical galaxies (Chang et al. 2010; Masui et al. 2013; Anderson et al. 2018; Wolz et al. 2022). While detecting the Hi auto-correlation is the primary aim, it is currently desirable that the chosen patch of sky overlaps with optical galaxy surveys.

3.8.3 Measurements

In this section, we summarize the main steps of a typical Hi intensity mapping data analysis procedure, based on the pioneering works by the GBT, Parkes, and MeerKLASS teams (Chang et al. 2010; Switzer et al. 2013, 2015; Masui et al. 2013; Anderson et al. 2018; Wang et al. 2021b). In general, single-dish observations require a scanning strategy where the dishes move rapidly across the sky. The goal is to keep the instrument gains, whose fluctuations are described by the so-called 1/f noise (Harper et al. 2018), as stable as possible, so that the maps are limited only by thermal noise fluctuations while covering the relevant angular scales. The scanning strategy is also tuned to avoid spurious signals, for example from ground spill and atmospheric emission.

Fig. 34

Images reproduced with permission from Cunnington et al. (2019), copyright by the authors

Top: Angular power spectra for different simulated foregrounds, and the Hi cosmological signal. The black solid line represents the combined signal. All are at a frequency of 1136 MHz (\(z = 0.25\)). Bottom: Observed brightness temperatures along a chosen LoS through frequency (redshift).

3.8.3.1 From raw data to maps

The raw data are stored in time-stream blocks. The first step of the data analysis is to mitigate RFI contamination. This is facilitated by the high spectral resolution of the data (e.g., 4096 channels across 200 MHz of bandwidth for the GBT). Individual frequency channels are flagged and removed based on their variance. Any RFI in a block whose variance is not prominent enough to be flagged is identified as increased noise later on and down-weighted at the map-making stage. Some low-level RFI can be masked after map-making. In addition, aliasing issues and high variance often lead to the removal of channels within a few MHz of the band edges, as well as channels at the receiver's resonances. Before map-making, the data are re-binned (to \(\sim \)1 MHz bins). The time-stream data can then be converted to sky maps with an inverse-noise weighted chi-squared minimization. This is a known method from CMB map-making (Tegmark 1997), and it produces the maximum likelihood (unbiased and optimal) estimate of the sky map assuming the noise is Gaussian. The algorithm also produces an inverse noise covariance matrix, useful for applying inverse-noise weights.

3.8.3.2 Foreground subtraction

Strong astrophysical foregrounds have to be separated from the cosmological Hi signal. Fortunately, these are expected to be spectrally smooth, following power-laws in frequency (Oh and Mack 2003; Santos et al. 2005; Seo et al. 2010), and can be removed if the calibration of the instrument is well controlled. The top panel of Fig. 34 demonstrates the differences in amplitude of the various foregrounds compared to the cosmological Hi signal, while the bottom panel demonstrates the differences in spectral smoothness.

Since the cosmological Hi signal fluctuates with frequency in a near-Gaussian fashion, in contrast to the smooth, slowly evolving foregrounds that are also orders of magnitude larger, the two can be separated (Liu and Tegmark 2011; Wolz et al. 2014; Shaw et al. 2015; Alonso et al. 2015b). Blind component separation methods aim to identify a set of smooth functions (the dominant foreground components) and subtract them from the observed maps to uncover the cosmological Hi signal. There is a wealth of different foreground removal algorithms, including parameterized fitting, non-parametric fitting, and mode projection (see Liu and Shaw 2020, for a comprehensive review). Principal Component Analysis (PCA) is a popular method that uses mode projection and exploits the fact that foregrounds are much larger in amplitude than the signal. PCA works by estimating the frequency-frequency covariance matrix of the data and then performing an eigenvalue decomposition. The strongest modes in frequency (the foregrounds) can then be identified and projected out. An advantage of this “blind” approach is that it can take into account a distortion of the smoothness of the foregrounds by the instrument, as it works by determining which modes are dominant in the observed data. However, the price to pay is that inevitably a part of the cosmological Hi signal will also be removed (Switzer et al. 2015). Other methods include Independent Component Analysis (ICA) (Chapman et al. 2012; Wolz et al. 2017b) and Generalized Morphological Component Analysis (GMCA) (Chapman et al. 2013; Carucci et al. 2020). The former works by maximizing non-Gaussianity, while the latter is a sparsity-based algorithm that works with the spatial structure of the foregrounds in wavelet space. An example of a non-parametric fitting method is Gaussian Process Regression (Mertens et al. 2018; Soares et al. 2021). Further methods include the Generalized Needlet Internal Linear Combination (GNILC) (Olivari et al. 2018) and Kernel PCA (Irfan and Bull 2021). For recent comparisons of different foreground removal methods using real data and simulations, see Hothi et al. (2020), Cunnington et al. (2021a), Spinelli et al. (2022).
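The core of the PCA approach fits in a few lines; the sketch below assumes the data are arranged as one map per frequency channel and removes the N_FG strongest eigenmodes of the frequency-frequency covariance, in the spirit (though not the detail) of the pipelines cited above.

```python
import numpy as np

def pca_clean(maps, N_FG=3):
    # maps: array of shape (N_nu, N_pix), one sky map per frequency channel
    X = maps - maps.mean(axis=1, keepdims=True)
    C = X @ X.T / X.shape[1]              # frequency-frequency covariance
    eigval, eigvec = np.linalg.eigh(C)    # eigenvalues in ascending order
    A = eigvec[:, -N_FG:]                 # the N_FG dominant (foreground) modes
    return X - A @ (A.T @ X)              # project the foreground modes out
```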

Fig. 35

Simulated Hi maps before and after foreground cleaning with PCA. From left to right: a map with simulated Hi signal with added thermal (instrumental) noise; the same map with added 21-cm foregrounds; the “cleaned” map after performing foreground removal with PCA. This figure was produced using the publicly available code gpr4im (Soares et al. 2021) and MeerKAT-like simulated data products at a frequency of 1136 MHz (\(z = 0.25\))

In Fig. 35 we have taken simulated Hi intensity maps, and added thermal (instrumental) noise and 21-cm foregrounds. We have then performed a PCA foreground cleaning. An important choice made by hand is how many principal components, \(N_{\mathrm{FG}}\), to remove. In this case, we show results with \(N_\mathrm{FG}=3\), which is expected to be near optimal for idealized simulated cases like the ones we have considered here (Alonso et al. 2015b; Wolz et al. 2017b). However, a much higher \(N_{\mathrm{FG}} \sim 30\) has been required for real data analyses (Masui et al. 2013; Wolz et al. 2022).

3.8.3.3 Power spectrum estimator

When performing an Hi intensity mapping survey, it is useful to separately analyze sub-datasets taken at different times (seasons) so that the thermal noise of the instrument is independent in each map. This way the Hi power spectrum can be constructed by cross-correlating (and then averaging over) different sub-datasets; this procedure has the advantage that the final power spectrum is free of the additive thermal noise bias (Switzer et al. 2013; Masui et al. 2013). The method can also suppress systematics like time-dependent RFI.

Intensity maps are over-temperatures measured as a discrete function of position, \(\delta (\mathbf {x}_i)=T(\mathbf {x}_i)-\bar{T}\), where \(\bar{T}\) is the mean temperature at each frequency slice. The total number of pixels, \(N_{\mathrm{pix}}=N_x \cdot N_y \cdot N_z\), is defined by the angular grid and the number of frequency bins. It follows that the Fourier transform of the temperature field is a function of wavevector \(\mathbf {k}_\ell \). We can write:

$$\begin{aligned} {\tilde{\delta }}(\mathbf {k}_\ell ) = \sum ^{N_{\mathrm{pix}}}_{j=1} \delta (\mathbf {x}_j)w(\mathbf {x}_j)\mathrm{exp}(i\mathbf {k}_\ell \cdot \mathbf {x}_j) , \end{aligned}$$
(123)

where \(w(\mathbf {x}_j)\) is a weighting function normalized to unity.

Fig. 36

Measured Hi power spectra demonstrating the Hi signal loss effect after foreground cleaning with PCA (\(N_{\mathrm{FG}}=3\)). This figure was produced using the publicly available code IntensityTools (Cunnington et al. 2019; Blake 2019; Soares et al. 2021) and MeerKAT-like simulated data products at \(0.2< z < 0.58\)

Let us now introduce the inverse-noise weighted power spectrum estimator in the flat-sky approximation, as used in the GBT and Parkes analyses (Masui et al. 2013; Wolz et al. 2017b; Anderson et al. 2018; Wolz et al. 2022). For the cross-correlation of two sub-dataset maps A and B, we have:

$$\begin{aligned} {{\hat{P}}}^{AB}(\mathbf {k}_l)=\frac{V_{\mathrm{cell}} \mathrm {Re}\{ \tilde{\delta }^A(\mathbf {k}_l)\cdot {{\tilde{\delta }}}^B(\mathbf {k}_l)^*\} }{\sum _{j=1}^{N_{\mathrm{pix}}} w^A(\mathbf {x}_j)\cdot w^B(\mathbf {x}_j)} , \end{aligned}$$
(124)

with \(V_{\mathrm{cell}} = V_s / N_{\mathrm{pix}}\), where \(V_s = L_x \cdot L_y \cdot L_z\) is the comoving physical volume of the data cube. For Hi intensity maps, \(w(\mathbf {x}_j)\) is given by the inverse noise map of each season. The estimator can be straightforwardly recast for the cross-correlation of intensity maps with optical galaxies, denoted with subscript “opt”. The total weighting factor is then \(w(\mathbf {x}_j)=W(\mathbf {x}_j)w_{\mathrm{opt}}(\mathbf {x}_j)\), where \(w_{\mathrm{opt}}(\mathbf {x}_j)=1/(1+W(\mathbf {x}_j){\bar{N}} P_0)\) is the optimal weighting function, with \(P_0=10^3 \, h^{-3}\mathrm {Mpc}^3\) and \(W(\mathbf {x}_j)\) the selection function (Feldman et al. 1994). The 1D power spectrum, \({{\hat{P}}}(k)\), is determined by averaging all modes with \(k = |\mathbf {k}|\) within the k bin width; this is the well-known power spectrum monopole. We can also compute higher order multipoles like the quadrupole and hexadecapole following the multipole expansion formalism (Blake 2019; Cunnington et al. 2020b).
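A minimal sketch of this estimator (in Python, assuming the maps are already gridded onto a regular comoving cube; the names are illustrative):

import numpy as np

def cross_power_3d(delta_A, delta_B, w_A, w_B, V_cell):
    # Weighted Fourier transforms of the two over-temperature cubes (Eq. 123)
    dk_A = np.fft.fftn(delta_A * w_A)
    dk_B = np.fft.fftn(delta_B * w_B)
    # Inverse-noise weighted cross power spectrum estimator (Eq. 124)
    return V_cell * np.real(dk_A * np.conj(dk_B)) / np.sum(w_A * w_B)

def monopole(P3d, k_mag, k_edges):
    # Average all modes with |k| inside each bin: the power spectrum monopole
    return np.array([P3d[(k_mag >= lo) & (k_mag < hi)].mean()
                     for lo, hi in zip(k_edges[:-1], k_edges[1:])])

Here k_mag holds \(|\mathbf {k}|\) for every grid mode (it can be built from np.fft.fftfreq); higher multipoles additionally weight each mode by Legendre polynomials in \(\mu =k_\parallel /|\mathbf {k}|\).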

In Fig. 36 we show the measured power spectra from simulations with an input (fiducial) Hi signal, to which foregrounds are added and then removed using PCA. We can immediately see that the process of foreground cleaning results in large-scale Hi signal loss. Accounting for this effect is crucial in order to get unbiased Hi and cosmological constraints (Masui et al. 2013; Bernal et al. 2019; Cunnington et al. 2020a; Soares et al. 2021). More details on how this can be done will be presented in Sect. 3.8.5, where we will also describe ways to estimate the statistical uncertainties in the measurements.

3.8.4 Systematic effects

The main known sources of systematic uncertainty affecting the Hi measurements are foreground contamination, 1/f noise, Radio Frequency Interference, calibration errors, and primary beam effects. We summarize each of these below.

3.8.4.1 Foregrounds and polarization leakage

We have already discussed that spectrally smooth foregrounds can be many orders of magnitude larger than the Hi cosmological signal (see Fig. 34), and how their removal with methods like PCA or FastICA results in large-scale signal loss (Fig. 36). In addition, the interplay between polarized foregrounds and the instrument leads to polarization leakage, a non-smooth component that further complicates the cosmological analysis. For detailed studies on this subject in the context of Hi intensity mapping, see Shaw et al. (2015), Alonso et al. (2014), Alonso et al. (2015b), Carucci et al. (2020), Cunnington et al. (2021a). The auto-correlation of intensity maps is biased by residual foregrounds. However, these residuals and other survey-specific systematics are expected to drop out in cross-correlation with optical galaxy surveys, and that is why detections to date have only been achieved with cross-correlations (Masui et al. 2013; Anderson et al. 2018; Wolz et al. 2022; Cunnington et al. 2022). This means that significant advances in calibration, simulations, and data analysis techniques are needed for the 21-cm foreground removal to work at the level required for precision cosmology. Working with pathfinder data (see, e.g., Cunnington et al. 2022) will show whether this is possible with current and forthcoming instruments, or whether we need a future generation of purpose-built radio telescopes (Ahmed et al. 2019).

3.8.4.2 1/f noise

This is a time-correlated noise component that manifests itself as gain fluctuations and leads to stripes in the Hi intensity maps. It can be mitigated with a fast enough scanning strategy, reduced by applying conservative PCA cleaning to the time-ordered data, and/or calibrated out. For detailed studies of this subject in the context of Hi intensity mapping, see Bigot-Sazy et al. (2015), Harper et al. (2018), Li et al. (2021b).

3.8.4.3 RFI

We have already mentioned Radio Frequency Interference (RFI), which can originate from terrestrial telecommunications as well as navigation satellites (Harper and Dickinson 2018), and is a major problem for all radio observations. Even if the experiment employs RFI mitigation systems, it has been shown that RFI can still dominate thermal noise in several channels within the band (Switzer et al. 2013; Masui et al. 2013; Wang et al. 2021b), resulting in significant signal loss (\(\sim \)11% for the GBT). In Sect. 3.8.3 we discussed how RFI flagging and removal is performed, although this is likely not optimal considering the requirements of forthcoming Hi intensity mapping experiments. For example, missing frequency channels as a result of RFI flagging can compromise the performance of foreground removal methods (Carucci et al. 2020; Soares et al. 2021).

3.8.4.4 Calibration

Bandpass and flux calibration errors can have a large impact on the Hi signal recovery. A successful calibration process must calibrate the receiver gain fluctuations, account for the bandpass spectrum that multiplies the true sky signal, and calibrate the total power. The main calibration procedures use periodic noise-diode injections as relative calibration references and track known astronomical sources, each of which has its own limitations and uncertainties (Masui 2013; Newburgh et al. 2014; Anderson et al. 2018; Wang et al. 2021b). For the GBT observations used in Masui et al. (2013), uncertainties in the calibration of the reference flux scale, in the measurements of calibration sources with respect to this reference, receiver non-linearity, beam shape irregularities, and other variations led to a 9% total calibration systematic error. This translates into systematic errors in the derived Hi constraints below the statistical errors for the GBT levels of thermal noise, but calibration must improve for future experiments aiming at high-precision cosmological measurements.

3.8.4.5 Primary beam effects

In the vast majority of the Hi intensity mapping literature, the telescope beam is approximated by a perfect Gaussian smoothing, as in Eq. (118). In reality, there are side-lobes in the beam profile, and the primary beam can also distort the frequency structure of the foregrounds through its own frequency dependence. A way to mitigate this and other issues related to the instrumental response is to convolve all maps to a common resolution, coarser than that of the largest beam in the frequency band (see, e.g., Switzer et al. 2015; Wolz et al. 2017b). However, this convolution is usually based on the Gaussian beam model. Side-lobes further complicate foreground removal, and end-to-end simulations will be necessary to address this challenge (Matshawule et al. 2021; Spinelli et al. 2022).
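Under the Gaussian-beam assumption, the reconvolution step can be sketched as follows (in Python; names and unit conventions are illustrative):

import numpy as np
from scipy.ndimage import gaussian_filter

def to_common_resolution(maps, fwhm_nu, fwhm_common, pix_size):
    # maps: (N_nu, N_x, N_y); fwhm_nu: beam FWHM per channel; all angles in the same units
    out = np.empty_like(maps)
    for i, m in enumerate(maps):
        # For Gaussian beams, the reconvolving kernel follows from quadrature subtraction
        fwhm_kernel = np.sqrt(fwhm_common**2 - fwhm_nu[i]**2)
        sigma_pix = fwhm_kernel / (2.0 * np.sqrt(2.0 * np.log(2.0))) / pix_size
        out[i] = gaussian_filter(m, sigma_pix)
    return out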

3.8.5 Main results and forecasts

Here we summarize the main data analysis results and forecasts. We begin by describing how the GBT measurements and power spectrum analyses have been performed. Then we present current constraints as well as forecasts on Hi parameters. We end this section by listing some of the cosmological forecasts that have been performed to demonstrate the ability of Hi intensity mapping surveys to constrain dark energy, gravity, and the initial conditions of the Universe.

3.8.5.1 Data analyses

The most comprehensive Hi intensity mapping analyses to date have been performed using GBT observations, alone and in combination with optical galaxy surveys (Chang et al. 2010; Switzer et al. 2013, 2015; Masui et al. 2013; Wolz et al. 2017b, 2022). In this section, we will describe how these analyses were performed, and present the detections and constraints they achieved. For a comprehensive description of the GBT pipeline and analysis software, we refer the reader to Masui (2013).

The GBT data we work with cover \(100 \, {\mathrm{deg}}^2\) on the sky and a redshift range \(0.6<z<1\). The data are contaminated by RFI and by two telescope resonance frequencies; to suppress these effects, several frequency (redshift) channels were removed. The GBT also uses 4 Sections (sub-seasons) \(\{A,B,C,D\}\) to suppress the thermal noise bias as described in Sect. 3.8.3. The noise is large and highly anisotropic towards the edges of the map, so 15 pixels per side are masked from the analysis. Foreground removal on the GBT data has been performed using PCA (Switzer et al. 2013; Masui et al. 2013) and FastICA (Wolz et al. 2017b, 2022). The GBT beam can be approximated by a Gaussian with a frequency-dependent FWHM, and in order to mitigate some systematic effects as explained in Sect. 3.8.3, the maps are convolved to a common resolution of \(0.44 \, \mathrm{deg}\).

Here, we concentrate on the most recent analysis, presented in Wolz et al. (2022). In the top panel of Fig. 37 we show the (masked) Sections A and D after using \(N_{\mathrm{FG}}=36\) in the FastICA foreground removal process. In principle, the cross-correlation of Sections (e.g. \(A \times B\), \(A \times D\), etc.) should be a proxy for the Hi auto power spectrum. For this, the GBT analysis uses the estimator of Eq. (124). A correction for the telescope beam effect is applied to the power spectrum estimate using the discretized, Fourier-transformed Gaussian beam of Eq. (118).

Fig. 37

Top: GBT data Sections A and D used in the analysis of Masui et al. (2013), Wolz et al. (2022) with the masking choices detailed in the main text. The different Sections correspond to different data seasons and help mitigate thermal noise bias and other systematic effects. Bottom: GBT-WiggleZ cross-correlation power spectrum (top) and a null diagnostic test (bottom) using data from the analysis of Wolz et al. (2022)

The GBT data are noise dominated. Therefore, for the cross-correlation of the different Sections we can estimate the measurement errors as:

$$\begin{aligned} \sigma ({\hat{P}}^{AB}(k_i)) = P_{\mathrm{noise}}(k_i) / \sqrt{2 N(k_i)} , \end{aligned}$$
(125)

with \(N(k_i)\) the number of independently measured modes in the k bin; the factor \(1/\sqrt{2}\) accounts for the fact that the thermal noise in the two maps is independent. There are various approaches for estimating \(P_{\mathrm{noise}}\), such as using the power spectrum of each sub-dataset after the foreground removal as a proxy for the noise (for more details, see, e.g., Wolz et al. 2017b).
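In code, Eq. (125) amounts to counting the independent modes in each k bin (a minimal sketch in Python, with illustrative names; for a real-valued field only half of the FFT modes are independent):

import numpy as np

def sigma_cross(P_noise, k_mag, k_edges):
    # Thermal-noise error on the cross power spectrum in each k bin (Eq. 125)
    sig = np.zeros(len(k_edges) - 1)
    for i, (lo, hi) in enumerate(zip(k_edges[:-1], k_edges[1:])):
        N_modes = ((k_mag >= lo) & (k_mag < hi)).sum() / 2   # conjugate pairs
        sig[i] = P_noise[i] / np.sqrt(2.0 * N_modes)
    return sig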

Despite the thermal noise bias mitigation, the Hi auto power spectrum result is an order of magnitude higher than what is expected from theory (Switzer et al. 2013). This is because the data suffer from systematic effects, and we have to resort to cross-correlations with optical galaxies to mitigate them and achieve a detection.

The first detection of the cross-correlation between LSS and Hi intensity maps at \(z \sim 1\) was reported by Chang et al. (2010), using data from the GBT and the DEEP2 galaxy survey. A more significant detection using GBT intensity maps and overlapping WiggleZ galaxies was achieved in Masui et al. (2013), and again in Wolz et al. (2022) at the level of \(\sim 5 \sigma \). The latter study also achieved \(\sim 5 \sigma \) detections using the LRG and ELG samples from SDSS-eBOSS. In the bottom panel of Fig. 37 we show the measured GBT-WiggleZ cross-correlation power spectrum with \(N_{\mathrm{FG}}=36\) used in FastICA for the foreground cleaning of the GBT Hi maps. We also show a null diagnostic test, plotting the ratio of the data to its error. The error in the galaxy-Hi cross-correlation is estimated as:

$$\begin{aligned} \sigma \left( {\hat{P}}_{\mathrm {g,HI}}\left( k_{i}\right) \right) =\sqrt{\frac{1}{2 \cdot N\left( k_{i}\right) }} \sqrt{{\hat{P}}_{\mathrm {g,HI}}\left( k_{i}\right) ^{2}+{\hat{P}}_{\mathrm {g}}\left( k_{i}\right) {\hat{P}}^{A B}\left( k_{i}\right) } . \end{aligned}$$
(126)

An important component of the GBT data analysis is the use of the transfer function formalism to quantify and correct for the Hi signal loss due to foreground removal (see Fig. 36). We give a brief description of how this works here, and refer the interested reader to Switzer et al. (2013), Wolz et al. (2022) for the details. Suppose we have a set of mock (simulated) Hi signal data, denoted by m, and our real (observed) data, d. Let P(a, b) denote the cross power spectrum of two data cubes a and b, with P(m) shorthand for P(m, m). Since the data and the mock are different realizations, \(P(d,m)=0\); hence, if our real data were completely free of foreground contamination, injecting the mock into the data and cross-correlating with the mock would simply return the power spectrum of the mock, \(P(d+m,m) = P(m)\). But foreground removal distorts this picture, and we therefore introduce a transfer function, T(k), to compensate for the unavoidable signal loss. We then have:

$$\begin{aligned} P(\mathrm {FG}(d+m),m) = P(m) \cdot T , \end{aligned}$$
(127)

where \(\mathrm {FG}(d+m)\) corresponds to foreground cleaning of the \((d+m)\) combined data cube, which takes into account the real data effects and systematics. The above formula defines the transfer function (in reality this is constructed in 2D, (\(k_\parallel , k_\perp \))). The signal loss can reach very high levels (\(\sim \)50%) depending on scale (Wolz et al. 2022), and therefore the transfer function correction is necessary in order to recover the true Hi power spectrum. Assuming the chosen fiducial cosmology (which is kept fixed in the GBT analyses) is correct, we can then proceed to perform a best-fit analysis for constraining Hi quantities.
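In practice, T is estimated by Monte Carlo over many mock realizations. A minimal sketch (in Python; clean, power, and cross_power stand for the foreground removal and power spectrum routines sketched above, and the binning in \((k_\parallel , k_\perp )\) is left implicit):

import numpy as np

def transfer_function(d, mocks, clean, power, cross_power):
    # Monte-Carlo estimate of the signal-loss transfer function (Eq. 127)
    T = []
    for m in mocks:
        cleaned = clean(d + m)                    # clean data + injected mock together
        T.append(cross_power(cleaned, m) / power(m))
    return np.mean(T, axis=0)                     # average over mock realizations

# The measured spectrum is then debiased as P_true = P_measured / T.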

3.8.5.2 Hi measurements and forecasts

In the post-reionization era, Hi intensity mapping provides an excellent probe of galaxy evolution. We will first review the main findings of the GBT (Masui et al. 2013; Wolz et al. 2022) and Parkes (Anderson et al. 2018) cross-correlation analyses. The model for \(P_{\mathrm{g},\text {H}\textsc {i}}\) is given by:

$$\begin{aligned} P_{\mathrm{g}, \text {H}\textsc {i}}(k) = \bar{T}_\text {H}\textsc {i}b_\text {H}\textsc {i}b_\mathrm{g}r_{\text {H}\textsc {i},\mathrm{opt}} P_{\delta \delta }(k) , \end{aligned}$$
(128)

with \(b_\text {H}\textsc {i}\) the Hi bias, \(b_{\mathrm{g}}\) the optical sample bias (WiggleZ, eBOSS ELGs, eBOSS LRGs), \(r_{\text {H}\textsc {i},\mathrm{opt}}\) the galaxy-hydrogen correlation coefficient, and \(P_{\delta \delta }(k)\) the nonlinear matter power spectrum including a linear RSD boost (for more details and a discussion on the assumptions and limitations of this empirical model, see Wolz et al. 2022). The coefficient \(r_{\text {H}\textsc {i},\mathrm{opt}}\) is dependent on the Hi content of the galaxy sample. The model is run through the same pipeline as the data to include weighting, beam, and window function effects. With the cosmology and optical bias values kept fixed, and using Eq. (117), we can fit the unknown pre-factor \(\varOmega _\text {H}\textsc {i}b_\text {H}\textsc {i}r_{\text {H}\textsc {i},\mathrm{opt}}\) to the data.
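Since the cosmology and the optical bias are held fixed, this is a one-parameter amplitude fit, which has a closed-form weighted least-squares solution. A minimal sketch (in Python; P_template stands for the model of Eq. (128) with the pre-factor divided out and run through the pipeline, an assumption made here for illustration):

import numpy as np

def fit_amplitude(P_data, P_template, sigma):
    # Weighted least squares for P_data = A * P_template with errors sigma;
    # minimizing chi^2 over the single amplitude A gives a closed-form solution
    w = 1.0 / sigma**2
    A = np.sum(w * P_data * P_template) / np.sum(w * P_template**2)
    sigma_A = 1.0 / np.sqrt(np.sum(w * P_template**2))
    return A, sigma_A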

Following this procedure, Masui et al. (2013) measured the GBT maps cross-correlation with the WiggleZ 15hr and 1hr fields. Fitting data in the range of scales \(0.05 \, h\mathrm{Mpc}^{-1}< k < 0.8 \, h\mathrm{Mpc}^{-1}\), they found \(10^3\varOmega _\text {H}\textsc {i}b_\text {H}\textsc {i}r = 0.40 \pm 0.05\) for the combined fields, \(10^3\varOmega _\text {H}\textsc {i}b_\text {H}\textsc {i}r = 0.46 \pm 0.08\) for the 15hr field, and \(10^3\varOmega _\text {H}\textsc {i}b_\text {H}\textsc {i}r = 0.34 \pm 0.07\) for the 1hr field. For a more restrictive range of scales, their combined measurement was \(10^3\varOmega _\text {H}\textsc {i}b_\text {H}\textsc {i}r = 0.44 \pm 0.07\). The quoted errors are statistical; Masui et al. (2013) also estimated a \(\pm 0.04\) systematic error.

With similar methodology and considering three different ranges of scales, Wolz et al. (2022) found \(\varOmega _{\mathrm{HI}} b_{\mathrm{HI}} r_{\mathrm{HI},\mathrm{Wig}} = [0.58 \pm 0.09 \, \mathrm{(stat)} \pm 0.05 \, \mathrm{(sys)}] \times 10^{-3}\) for GBT-WiggleZ, \(\varOmega _{\mathrm{HI}} b_{\mathrm{HI}} r_{\mathrm{HI},\mathrm{ELG}} = [0.40 \pm 0.09 \, \mathrm{(stat)} \pm 0.04 \, \mathrm{(sys)}] \times 10^{-3}\) for GBT-ELG, and \(\varOmega _{\mathrm{HI}} b_{\mathrm{HI}} r_{\mathrm{HI},\mathrm{LRG}} = [0.35 \pm 0.08 \, \mathrm{(stat)} \pm 0.03 \, \mathrm{(sys)}] \times 10^{-3}\) for GBT-LRG, at \(z\simeq 0.8\) and an effective scale \(k_{\mathrm{eff}}=0.31 \, h/\mathrm{Mpc}\). Results were also reported at \(k_{\mathrm{eff}}=0.24 \, h/\mathrm{Mpc}\) and \(k_{\mathrm{eff}}=0.48 \, h/\mathrm{Mpc}\); the latter corresponds to the same range of scales considered in Masui et al. (2013), who found \(10^3\varOmega _{\mathrm{HI}} b_{\mathrm{HI}} r_{\mathrm{HI},\mathrm{Wig}} = 0.34 \pm 0.07\) for the same field. These results imply that red galaxies are more weakly correlated with Hi on the scales under consideration, suggesting that Hi is more associated with blue star-forming galaxies and tends to avoid red galaxies. This is in qualitative agreement with what Anderson et al. (2018) found at a lower redshift, \(z=0.08\), by cross-correlating Parkes Hi intensity maps with red and blue galaxies from the 2dF survey sample (it is also expected from galaxy evolution studies). Making some further assumptions, Wolz et al. (2022) also derived constraints on \(\varOmega _{\mathrm{HI}}(z\simeq 0.8)\), which are shown in the left panel of Fig. 38. Given how little information exists on Hi parameters beyond the local Universe, these are amongst the most precise \(\varOmega _{\mathrm{HI}}\) constraints in this under-explored redshift range.

Forecasts using the proposed SKA-MID and SKA-LOW surveys are shown in the right panel of Fig. 38. These use the anisotropic Hi power spectrum (Eq. (116)) to break the degeneracy between \(\varOmega _{\mathrm{HI}}\) and \(b_{\mathrm{HI}}\) (Wyithe 2008; Masui et al. 2010; Pourtsidou et al. 2017). Similar measurements can be achieved with instruments like CHIME and HIRAX, the MIGHTEE survey (Paul et al. 2021; Chen et al. 2021b), and ASKAP (Wolz et al. 2017a).

Fig. 38

Left: Estimates for \(\varOmega _\text {H}\textsc {i}\) from Wolz et al. (2022) compared to other measurements in the literature (see Crighton et al. (2015) and references therein). All estimates are at a central redshift \(z=0.78\) but they have been staggered for clarity. Right: Forecasts for \(\varOmega _\text {H}\textsc {i}\). This figure was produced using the publicly available code IM-Fish (Pourtsidou et al. 2017) and SKA-like specifications (SKA Cosmology SWG 2020)

3.8.5.3 Cosmological forecasts

The literature on cosmological forecasts for Hi intensity mapping surveys is extensive. The main result is that, assuming excellent calibration and mitigation of systematic and foreground contamination effects, Hi intensity mapping experiments can complement and compete with the largest and best Stage-IV optical galaxy surveys. Both single-dish and interferometric Hi intensity mapping surveys can probe dark energy, gravity, and the initial conditions of the Universe at a level comparable to optical surveys like Euclid (Laureijs et al. 2011; Blanchard et al. 2020) and VRO/LSST (LSST Dark Energy Science Collaboration et al. 2018). Here, we summarize the main findings and caveats. Unless otherwise stated, we quote \(1\sigma \) forecast marginal errors for the various parameters, and give a few representative references for the interested reader.

  • Large sky Hi intensity mapping surveys with radio telescopes like MeerKAT and SKA-MID (in single-dish mode), Tianlai, CHIME, HIRAX, PUMA, and SKA-LOW can use the Hi power spectrum to probe galaxy evolution and cosmology over a very wide redshift range (\(0< z < 6\)). Using the CPL parameterization for the dark energy EoS (Eq. (9)), the forecasts give \(\sigma (w_0) \sim 0.05\), \(\sigma (w_a) \sim 0.15\), with the fiducial values \((w_0,w_a)= (-1,0)\) for the \(\Lambda \)CDM model. Parameterizing the growth of structure as \(f(z)=\varOmega _m(z)^\gamma \) (Lahav et al. 1991; Linder 2003), the forecasts give \(\sigma (\gamma ) \sim 0.03\), with the fiducial value \(\gamma = 0.55\) for GR (a minimal sketch of how such Fisher forecasts are computed is given after this list). For more details, see e.g. Chang et al. (2008), Masui et al. (2010), Battye et al. (2013), Hall et al. (2013), Bull et al. (2015), Cosmic Visions 21 cm Collaboration et al. (2018), Heneka and Amendola (2018), Weltman et al. (2020), Liu et al. (2020). The neutrino mass can also be constrained, \(\sigma (M_\nu ) \sim 0.3\) eV (95% CL) (Villaescusa-Navarro et al. 2015). Most forecasts are very optimistic, assuming perfect instrument calibration and foreground removal. In addition, there are degeneracies between the Hi parameters and the cosmological parameters: for example, without prior assumptions, Hi intensity mapping surveys constrain \(\bar{T}_{\mathrm{HI}}f\sigma _8\) rather than \(f\sigma _8\) as optical galaxy surveys do. Forecasts taking into account some of these caveats are presented in Padmanabhan et al. (2019), Bernal et al. (2019), Camera and Padmanabhan (2020), Soares et al. (2021).

  • The aforementioned surveys can probe ultra-large scales and constrain the primordial non-Gaussianity parameter, \(f_{\mathrm{NL}}\), to a level \(\sigma (f_{\mathrm{NL}}) \sim 1\) (Camera et al. 2013; Alonso et al. 2015a; Karagiannis et al. 2020; Barreira 2022). However, foreground removal effects can lead to large degeneracies and biased estimates (Cunnington et al. 2020a). These need to be controlled for Hi intensity mapping to reach its full potential.

  • Joint analyses of Hi intensity maps, optical galaxy surveys (galaxy clustering and cosmic shear), and CMB experiments, can be a powerful way to mitigate systematic effects and constrain Hi and cosmological parameters (Wyithe 2008; Masui et al. 2010; Pourtsidou et al. 2016; Villaescusa-Navarro et al. 2015; Pourtsidou et al. 2017; SKA Cosmology SWG 2020; Viljoen et al. 2020). Using the multiple tracers method can suppress cosmic variance on large scales and provide the most precise measurements of primordial non-Gaussianity and general relativistic effects (see, e.g., Alonso and Ferreira 2015; Fonseca et al. 2015, 2017; Witzemann et al. 2019).

  • More futuristic prospects include Hi intensity mapping lensing (Pourtsidou and Metcalf 2014; Jalilvand et al. 2019), exploiting higher order statistics such as the bispectrum (Karagiannis et al. 2021; Cunnington et al. 2021b), and cross-correlations between gravitational wave detections and Hi intensity maps (Scelfo et al. 2022).
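As anticipated in the first item above, forecast errors like those quoted in this list are typically obtained from a Fisher matrix built out of derivatives of the model power spectrum with respect to the parameters. A minimal sketch (in Python, for an unspecified model function and Gaussian, uncorrelated bin errors; real forecasts also sum over redshift bins and angular modes):

import numpy as np

def fisher_matrix(model, theta0, sigma_P, step=1e-3):
    # model: theta -> model power spectrum in the k bins
    # theta0: fiducial parameters (e.g. [w0, wa, gamma]); sigma_P: errors per bin
    theta0 = np.asarray(theta0, dtype=float)
    n = len(theta0)
    dP = []
    for a in range(n):
        h = step * max(abs(theta0[a]), 1.0)
        tp = theta0.copy(); tp[a] += h
        tm = theta0.copy(); tm[a] -= h
        dP.append((model(tp) - model(tm)) / (2.0 * h))   # central differences
    F = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            F[a, b] = np.sum(dP[a] * dP[b] / sigma_P**2)
    return F

# Marginal 1-sigma errors are the square roots of the diagonal of the inverse:
# sigma_theta = np.sqrt(np.diag(np.linalg.inv(F)))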

3.9 Surface brightness fluctuations

Tonry and Schneider (1988) introduced the method of surface brightness fluctuations (SBF hereafter) as a way to obtain distances to stellar systems based on the discrete nature of star counts. As refined in later papers (e.g., Tonry et al. 1990; Jensen et al. 1998; Blakeslee et al. 1999a; Cantiello et al. 2005; Mei et al. 2005a), the SBF method uses the stochastic nature of star counts and luminosities to measure a quantity that is closely linked to the mean brightness of the red giant branch (RGB) star population in a galaxy or other stellar system.

3.9.1 Basic idea and equations

Qualitatively, the idea behind the method is simple, as illustrated in Fig. 39 (panels a–c). Stars that can be individually identified in nearby stellar systems gradually blend into a smooth brightness distribution as the distance increases, but the discrete nature of the stars can be discerned through statistical fluctuations in the integrated flux of the stars per resolution element. These fluctuations are lower relative to the mean surface brightness (i.e., the galaxy appears smoother) at larger distances.

Observationally, SBF is the ratio of the intrinsic variance (correcting for the blurring by the point spread function, PSF) of the stellar light distribution of a region of a galaxy to the mean surface brightness within the same region. In the nearby Universe, galaxy surface brightness is independent of distance, but the variance per unit solid angle decreases as distance squared. The ratio of the variance to the mean has units of flux and constitutes the SBF observable. Although it may be harder to visualize than other standard candles, such as supernovae or Cepheids, it is as rigorously defined, and scales in the same way with distance. Physically, the SBF is related to the ratio of the first and second moments of the stellar luminosity function within the region analyzed.

For example, consider a galaxy that projects a stellar population of \(n_i\) stars of flux \(f_i\) (where \(i=1,..., N\) covers the entire flux interval, i.e. all evolutionary phases, of the stellar population) on a particular pixel k in an image. Along an isophote (the locus of pixels of equal surface brightness within the galaxy) there are many, say M, independent realizations of the population [\(n_i\), \(f_i\)]. Each pixel can be considered a realization. Ignoring, for the moment, the PSF blurring, these realizations obey Poisson statistics, and the first two moments of the stellar intensity distribution can be written as:

  • \((N\times M)^{-1}\times \sum _{k=1}^{M} \sum _{i=1}^{N} (n_{k,i} \times f_{k,i}) \) is the average surface brightness per realization (or pixel);

  • \((N\times M)^{-1}\times \sum _{k=1}^{M} \sum _{i=1}^{N} (n_{k,i} \times f_{k,i}^2)\) is the mean-squared flux of the realizations.

The index k runs over the pixel realizations, and i runs over the stellar luminosity function bins. If we assume that the same form of the underlying luminosity function applies to all the pixels in the region being analyzed, then the mean flux per stellar bin is independent of the pixel: \(f_{k,i} = f_i\). The mean SBF flux, which is defined by Tonry and Schneider (1988) as the ratio of the second to the first moment of the flux along the isophote, then becomes:

$$\begin{aligned} \bar{f} = \frac{\,\sum _{i} \sum _{k} n_{k,i} \times f_{i} ^ 2}{\,\sum _{i} \sum _{k} n_{k,i} \times f_{i}} \, = \, \frac{\,\sum _{i} n_{i} \,f_{i} ^ 2}{\,\sum _{i} n_{i} \,f_{i}\,} . \end{aligned}$$
(129)

Thus, \(\bar{f}\) is the flux-weighted mean stellar flux in the region of the isophote being analyzed. The corresponding luminosity is \(\bar{L}\), which is equal to the ratio of the first two moments of the stellar luminosity function and can be readily calculated from stellar population models. Because of the squared weighting in the numerator, the SBF signal is dominated by the brightest stars in the population.
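The following toy Monte Carlo (in Python, with an invented two-bin luminosity function) illustrates that the variance-to-mean ratio of the pixel fluxes recovers the flux-weighted mean stellar flux of Eq. (129):

import numpy as np

rng = np.random.default_rng(1)

# Invented two-bin luminosity function: many faint stars, few bright giants
f = np.array([1.0, 100.0])      # mean flux per star in each bin
nbar = np.array([500.0, 0.5])   # mean number of stars per pixel in each bin

M = 200000                      # independent pixel "realizations"
counts = rng.poisson(nbar, size=(M, 2))
pixel_flux = counts @ f

# SBF observable: intrinsic variance over mean of the pixel fluxes
fbar_measured = pixel_flux.var() / pixel_flux.mean()
# Eq. (129): ratio of the second to the first moment of the luminosity function
fbar_expected = (nbar * f**2).sum() / (nbar * f).sum()
print(fbar_measured, fbar_expected)   # both ~10, dominated by the rare bright stars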

Fig. 39

Illustration of SBF observations and measurements. (a) Simulation of the stellar population in a spheroidal galaxy at the distance of the Virgo cluster (\(D_\mathrm{Virgo}\simeq 16.5\, \mathrm {Mpc}\), Blakeslee et al. 2009) as observed with the E-ELT in \(\sim \)1 h (Cantiello et al., 2021, in prep.). (b) Same as in panel (a), but for a galaxy ten times more distant. (c) Same as in panel (a), but for a galaxy fifty times more distant. Stars, which appear marginally resolved in panel (a), blend together into a smooth brightness profile at larger distances. (d) Near-infrared image of NGC 1399 from the HST WFC3 camera. (e) Model of NGC 1399’s surface brightness distribution derived from the WFC3/IR image. (f) Residual frame, obtained from the galaxy image (d) minus the model (e). (g) Typical luminosity function analysis for estimating the “residual variance” \(P_r\) due to contaminating sources: green squares show the data, the blue curve and red line show the fits to the globular cluster and background galaxy luminosity functions, respectively, and the solid black line is the combined model luminosity function (data and fits are from Cantiello et al. 2011). The vertical gray dashed line indicates the GCLF turnover magnitude and the shaded area shows the magnitude interval where the detection is incomplete. (h) Color-magnitude diagram of an old stellar population (data for the MW globular cluster NGC 1851 from Piotto et al. 2002); the RGB/AGB population is highlighted with red dots. (i) A schematic illustration of the SBF power spectrum analysis

In practice, the SBF measurement is done over finite regions of the galaxy. It is not necessary for the surface brightness to be constant, but the stellar luminosity function should not vary significantly over the region. One deals with the varying surface brightness by subtracting a smooth model for the light distribution and measuring the fluctuations in the residual image. In this case, the numerator of \(\bar{f}\) becomes the variance with respect to the mean surface brightness. For a fully rigorous statistical treatment of the SBF, see Cerviño et al. (2008).

The SBF apparent magnitude is defined as \({\overline{m}}=-2.5\,\log (\bar{f})+ m_\text {ZP}\), where \(m_\text {ZP}\) is the magnitude zero-point of the system. Although \({\overline{m}}\) can be measured for any galaxy, this does not mean that a useful distance can be derived. One also needs a reliable calibration of \({\overline{M}}\), the absolute magnitude that gives the correct distance modulus \((m{-}M)\) for a galaxy with the measured \({\overline{m}}\). \({\overline{M}}\) depends only on the photometric bandpass and the stellar population in the galaxy. Thus, unlike other galaxy-based distance indicators, SBF does not depend on the mass, effective radius, dynamics, or environment of the galaxy, although these properties may influence the stellar population.

The measured SBF magnitude \({\overline{m}}\) and a proper calibration of \({\overline{M}}\) are presently used to determine accurate distance moduli to galaxies within \(D\sim 150\) Mpc, enabling robust constraints on the Hubble parameter in the Local Volume via the Hubble-Lemaître law:

$$\begin{aligned} H_0=\langle v/D \rangle , \end{aligned}$$
(130)

where v is the flow-corrected recessional velocity of the target galaxies.
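In its simplest form, Eq. (130) is an inverse-variance weighted mean over the sample (a sketch with illustrative names; real analyses fit the Hubble diagram with explicit treatments of peculiar velocities, see Sect. 3.9.5):

import numpy as np

def hubble_constant(v, D, sigma_D):
    # H0 = <v/D> over the galaxy sample (v in km/s, D in Mpc)
    H = v / D
    sigma_H = H * sigma_D / D            # error propagation; velocity errors ignored here
    w = 1.0 / sigma_H**2
    H0 = np.sum(w * H) / np.sum(w)
    return H0, 1.0 / np.sqrt(np.sum(w))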

Of course, not all stellar populations are created equal. In particular, galaxies that have undergone recent star formation have poorly calibrated values of \({\overline{M}}\), which causes systematic uncertainty in their SBF distance. For this and other reasons, elliptical galaxies are the preferred targets for SBF studies. Until recently, challenges with data depth and quality have prevented precise distance measurements for significant samples of galaxies reaching into the Hubble flow, but datasets and observing strategies have improved (Cantiello et al. 2018a; Blakeslee et al. 2021; Jensen et al. 2021), making SBF a powerful cosmological probe with a bright future.

3.9.2 Sample selection

Measuring SBF magnitudes requires careful modeling and subtraction of the galaxy light distribution. As described later (Sect. 3.9.3.1), any small-scale residual features with structure on the scale of the PSF will complicate the analysis. Such features may be associated with dust, bars, shells, or other irregularities. In severe cases, the SBF signal may be entirely overwhelmed. As a result, the smoothest, most featureless galaxies are the prime targets for the SBF method; that is to say, \({\overline{m}}\) is easiest to measure in giant ellipticals and other early-type galaxies with substantial bulge components.

Of course, the stellar fluctuations must be sufficiently bright in order to detect the SBF signal; for distances in the Hubble flow, this requires the reddest optical bands or observing in the near-IR. The contamination from dust is also reduced at these wavelengths. However, beyond \(\sim 2\) \(\upmu \)m, the uncertainties in the calibration become too large for precise distances. Below we discuss these issues of sample and bandpass selection in more detail.

3.9.2.1 Choosing the galaxies

In addition to simplifying the galaxy subtraction for the \({\overline{m}}\) measurement, early-type galaxies tend to be dominated by old stellar populations (Fig. 39, panels e and h), which simplifies the \({\overline{M}}\) calibration. This can be seen from empirical plots of the \({\overline{M}}\) versus color relations (e.g., Blakeslee et al. 2009, 2010; Jensen et al. 2015; Cantiello et al. 2018a; Carlsten et al. 2019) for galaxies at a common distance. The empirical relations show that for red early-type galaxies, the correlations between the absolute SBF magnitude in red or near-IR bands and broad-baseline optical color have lower scatter than for fainter, bluer galaxies. In selected passbands (see below), the small intrinsic scatter in the \({\overline{M}}\)-color calibration relations for red galaxies in principle allows distance precision as low as \(\sim 3\)%. In practice, the errors are larger because of measurement uncertainties, as discussed in Sect. 3.9.3.1.

Consistent with observations, stellar population models predict less scatter in \({\overline{M}}\) at a given color for metallicities similar to those found in massive galaxies (e.g., Blakeslee et al. 2001b; Mei et al. 2005b). At the blue end, galaxies may have low metallicities, younger ages, or a combination of both. At these blue colors, the SBF is more affected by age than metallicity; thus two galaxies with similar optical colors may have significantly different SBF magnitudes if they have had different star formation histories, as discussed by Greco et al. (2021). As a result, the observed scatter in the SBF calibration can be quite large at the blue end, and it becomes difficult to measure individual distances with a precision better than \(\sim 10\%\) due to the calibration effects alone.

Thus, for both measurement and calibration reasons, the ideal target galaxies for the SBF method are red early-type galaxies with no recent star formation and little or no dust. In spite of this, SBF measurements cover practically the entire mass range of galaxies (e.g., Blakeslee et al. 2009; Jensen et al. 2021; van Dokkum et al. 2018; Carlsten et al. 2019) and a wide range of morphologies, including the bulges of spirals (Tonry et al. 2000) and ultra-diffuse galaxies (Blakeslee and Cantiello 2018). As long as there is a clean area of the galaxy without recent star formation, it is possible to derive an SBF distance.

Another consideration in defining an observational sample is that the SBF must be detected to high signal-to-noise (S/N), including the effect of correcting for contaminating sources. This puts a practical limit on the distance to which the SBF measurements can be made. Of course, for cosmologically interesting measurements, the galaxies must be distant enough to be in the Hubble flow (i.e., \(d\gtrsim 50\) Mpc). The depth requirement and the related distance limit depend sensitively on the bandpass, which we discuss next.

3.9.2.2 Choosing the bandpass

The most common photometric bands used for SBF distance measurements in recent years have been i, I, z, J, H, and K, spanning the wavelength range from \(\sim 0.8\) to \(\sim 2.2\) \(\upmu \)m (Tonry et al. 2001; Jensen et al. 2003, 2021; Mei et al. 2007; Blakeslee et al. 2009, 2010; Cantiello et al. 2007, 2018a; Biscardi et al. 2008). At shorter wavelengths, the SBF signal is much fainter, and the slope of the \({\overline{M}}\) relation with color tends to be steeper because of increased sensitivity to stellar population effects (e.g. Worthey 1993; Blakeslee et al. 2001b; Cantiello et al. 2003). Of course, dust is also a bigger problem in bluer bands.

The intrinsic scatter of \({\overline{M}}\) as a function of color is as low as \(\sim 0.05\) mag for red galaxies in passbands near 1 \(\upmu \)m (Blakeslee et al. 2009, 2021). The intrinsic dispersion is less well constrained at longer wavelengths, but appears to be closer to 0.1 mag in the H and K bands (Jensen et al. 2003, 2015), likely because of the increased stochastic effect of small numbers of luminous red asymptotic giant branch stars (AGB), the properties of which depend sensitively on population age (Raimondo et al. 2005; Raimondo 2009).

Another issue that is worse in bluer bands (like B or V) is the contamination of the SBF signal by the globular clusters hosted by the galaxy, point-like sources that produce extra variance in the image. In bands where the SBF is fainter, the globular clusters must be identified and removed to fainter magnitudes. Even in the I band, for elliptical galaxies with typical globular cluster frequencies, sources must be detected and masked to within \(\lesssim 0.3\) mag of the peak, or “turnover,” of the globular cluster luminosity function (GCLF) in order to decrease the contamination to the \(\sim 20\)% level (Blakeslee and Tonry 1995), which reduces the uncertainty in the correction to \(\sim \)5% (Tonry et al. 1990).

In contrast, with the much brighter fluctuations in the K band, it is only necessary to reach within \(\sim 2\) mag of the GCLF turnover to reduce the contamination to the same level (Jensen et al. 1998). Thus, the stellar population scatter in \({\overline{M}}\) is a much bigger issue than globular cluster contamination for SBF measurements near 2 \(\upmu \)m.

Currently, the most efficient instrumental system available for SBF distances is the F110W (broad J) filter of the WFC3/IR on the Hubble Space Telescope (HST). Using this setup, it is possible to measure distances for early-type galaxies out to 80 Mpc with a median statistical uncertainty of 4% (Blakeslee et al. 2021; Jensen et al. 2021) in only one HST orbit. One of the reasons this system works well is that the near-IR sky background is much fainter from space. For future ground-based surveys, such as the one provided by the Vera Rubin Observatory, we expect that the y band, which covers the red end of the optical spectrum similar to F110W but extends less far into the near-IR, may prove the best choice for SBF measurements. At present, however, we lack data in this band for testing. The following section describes SBF measurements for the preferred targets of early-type galaxies in bands suitable for accurate distance determination.

3.9.3 Measurements

SBF distance determination consists of two parts: measuring the fully corrected apparent SBF magnitude \({\overline{m}}\)  and determining the best value for the absolute \({\overline{M}}\) from a calibration based on distance-independent stellar population properties, typically broadband color.
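Schematically, the two parts combine into a distance as follows (a minimal sketch assuming a linear \({\overline{M}}\)-color calibration; the coefficient names are placeholders for values taken from an empirical calibration such as those in Sect. 3.9.3.2):

import numpy as np

def sbf_distance(mbar, color, Mbar0, slope, color0):
    # Linear empirical calibration: Mbar(color) = Mbar0 + slope * (color - color0)
    Mbar = Mbar0 + slope * (color - color0)
    mu = mbar - Mbar                     # distance modulus (m - M)
    return 10.0 ** ((mu - 25.0) / 5.0)   # distance in Mpc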

3.9.3.1 Measuring SBF magnitudes

In the absence of any atmospheric and instrumental blurring and external sources of fluctuation, the SBF signal of a stellar system would simply be the statistical variance due to the varying numbers and luminosities of the stars in each pixel, normalized by the local mean flux. In reality, PSF blurring creates a correlation between adjacent pixels; therefore, the SBF signal is measured in Fourier space by determining the amplitude of the component on the scale of the PSF in the image power spectrum. If the large-scale light distribution of the galaxy is well-subtracted, then the power spectrum will consist mainly of a white noise component and a component convolved with the PSF. There may be additional power at lower wavenumbers (larger scales) due to imperfect galaxy subtraction, or at higher wavenumbers (smaller scales) due to correction of geometric distortion of the image (Mei et al. 2005a; Cantiello et al. 2005), but these wavenumbers can be omitted from the analysis.

The detailed process of measuring SBF magnitudes is described in numerous papers with some variations based on the bandpass and other properties of the data (Blakeslee et al. 1999a, 2009; Cantiello et al. 2005, 2007; Jensen et al. 1998, 2021; Mei et al. 2005a, b). See these papers for details on putting the method into practice. Here, we highlight the main steps of the procedure in order of execution, and the products of each step.

  i)

    Galaxy model (Fig. 39, panel (e)): a smooth isophotal model of the galaxy surface brightness after sky subtraction; the resulting model frame corresponds to the first moment of the light distribution.

  ii)

    Residual frame (Fig. 39, panel (f)): difference image obtained by subtracting the galaxy surface brightness model (and a low-order fit to the background) from the original sky-subtracted image.

  iii)

    Mask frame: mask made by identifying all sources of non-SBF variance (dust, globular clusters, foreground stars, background galaxies, bright satellite galaxies, tidal features, bars, etc.) down to a specified S/N threshold and masking them out.

  iv)

    Fluctuation frame: the masked residual frame normalized by the square root of the model frame, used to measure the normalized stellar fluctuations; also contains contaminating fluctuations from unexcised sources fainter than the detection limit, plus white noise resulting from photon counting statistics and detector read noise.

  v)

    Power spectrum frame: 2-D Fourier power spectrum of the fluctuation frame, used to derive the SBF amplitude after azimuthal averaging (Fig. 39, panel (i)). Because the stellar fluctuations are convolved with the PSF of the image, in the Fourier domain they are multiplied by the Fourier transform of the PSF (convolved with the window function of the mask, see below).

    Once an accurate PSF template is created from stars in the field and normalized, the fluctuation amplitude is derived as the constant \(P_0\) in Eq. (131) below, obtained by fitting the azimuthally averaged power spectrum of the fluctuation frame, P(k), with the expectation power spectrum E(k); a minimal sketch of this fit is given after the list. Here E(k) is a convolution of the PSF power spectrum with the window function of the mask by which the fluctuation frame was multiplied. In addition, the power spectrum includes a constant white-noise component; thus, the full power spectrum is modeled as:

    $$\begin{aligned} P(k)=P_0{\,\times \,}E(k)+P_1 . \end{aligned}$$
    (131)
  vi)

    Correction for background fluctuations (Fig. 39, panel (g)): globular clusters and background galaxies that are too faint for direct detection will remain in the image after masking, and their flux will contribute to the \(P_0\) component of the power spectrum. To correct for this contamination, we calculate the “residual power” \(P_r\) from contaminating sources by extrapolating a fit to the combined GCLF and background galaxy luminosity function. The ability to detect and remove the globular clusters is often the limiting factor in how far SBF distances can be measured. Contamination due to background galaxies is normally much less for giant ellipticals.

  vii)

    SBF magnitude: Using the measured fluctuation amplitude \(P_0\) and the estimated contribution from contaminating sources \(P_r\), the stellar fluctuation signal is \(P_f=P_0-P_r\). This corresponds to \(\bar{f}\) in Eq. (129). Thus, converting to the SBF magnitude: \({\overline{m}}=-2.5\log (P_f)+m_\mathrm {ZP}\), where \(m_\mathrm {ZP}\) is the appropriate photometric zero-point magnitude.
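The fit of Eq. (131) referenced in step v) reduces to linear least squares in the two parameters \(P_0\) and \(P_1\) (a minimal sketch in Python; Pk and Ek are the azimuthally averaged measured and expectation power spectra, with the unusable low and high wavenumbers already excluded):

import numpy as np

def fit_sbf_power(Pk, Ek, weights=None):
    # Fit P(k) = P0 * E(k) + P1 by (optionally weighted) linear least squares
    if weights is None:
        weights = np.ones_like(Pk)
    A = np.vstack([Ek, np.ones_like(Ek)]).T        # design matrix [E(k), 1]
    W = np.sqrt(weights)
    coeff, *_ = np.linalg.lstsq(A * W[:, None], Pk * W, rcond=None)
    return coeff[0], coeff[1]                      # P0 (fluctuations), P1 (white noise)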

3.9.3.2 SBF calibration

To obtain a distance from the measured \({\overline{m}}\), one must adopt an absolute SBF magnitude \({\overline{M}}\) for the stellar population. This can be done using either an empirical calibration or theoretical predictions from stellar population synthesis models. With some exceptions (e.g., Biscardi et al. 2008), the vast majority of published SBF distances rely on empirical calibrations (Tonry et al. 2001; Blakeslee et al. 2001a, 2009; Cantiello et al. 2018a; Jensen et al. 2003, 2021).

The ground-based SBF survey by Tonry and collaborators (Tonry 1997; Tonry et al. 2001; Blakeslee et al. 1999b) measured I-band SBF magnitudes \({\overline{m}}_I\) and \(V{-}I\) colors for 300 galaxies out to about 40 Mpc and derived the first high-quality empirical SBF calibration. To do this, Tonry (1997) plotted \({\overline{m}}_I\) as a function of \(V{-}I\) for nearby groups and clusters, determining a single linear slope for the color dependence of \({\overline{m}}_I\). The zero-point of the calibration was then determined from SBF measurements in the bulges of six spiral galaxies that also had distances measured from Cepheids (Tonry et al. 2000). This was revised slightly by Blakeslee et al. (2002) using a recalibrated set of Cepheid distances from Freedman et al. (2001). The resulting linear calibration fully specified \({\overline{M}}_I\) as a function of \(V{-}I\). The intrinsic scatter about this relation was estimated to be of order 0.05 mag, although it was fairly uncertain because the median statistical error on the distances was roughly four times larger.

The same basic approach, with some variations, has been used to derive empirical \({\overline{m}}\)-color calibrations for the SBF method in V (Blakeslee et al. 2001b), K (Jensen et al. 2003), ACS/F850LP (Mei et al. 2007; Blakeslee et al. 2009), WFC3/F110W (Jensen et al. 2015), and i (Cantiello et al. 2018a). Higher-order polynomials were used for the ACS and WFC3 calibrations, while Cantiello et al. (2018a) presented calibrations that combined two color indices, rather than just one as in previous cases.

In general, the empirical approach works well and the resulting calibrations agree with theoretical predictions within the uncertainties. The weak point remains the distance zero-point, which is tied to the Cepheid distance scale via measurements in spiral galaxies, which are not ideal targets for the SBF method. However, alternative zero-point calibrations based on the tip of the red giant branch (TRGB) have also been presented (Mould and Sakai 2009; Blakeslee et al. 2021), and these agree well with the Cepheid-based calibration. Fully theoretical calibrations of \({\overline{M}}\) versus color do not rely on other distance indicators, and thus do not carry systematic uncertainties from Cepheids or other primary distance indicators. However, the often poor agreement among different sets of models shows that theoretical calibrations still carry substantial systematic uncertainties, especially in the near-IR bands (e.g., Jensen et al. 2015), which are observationally most promising for future SBF studies.

As a final remark, we note that since the empirical \({\overline{M}}\) calibrations are parameterized by photometric color, precise measurements of the galaxy colors are required for high-quality distance estimates. Thus, great care must be dedicated to observational details such as photometric calibration, flat-fielding, sky subtraction, etc.

3.9.3.3 Statistical uncertainties

Before moving on to systematic effects, we summarize the statistical uncertainties in SBF distance measurements. These can be grouped into three categories: i) random errors in the photometric calibration, ii) errors in the measurement of the fluctuations themselves, including the corrections for background contamination, and iii) random uncertainty in \({\overline{M}}\) resulting from stellar population effects and errors in the galaxy color estimate.

The first category includes effects such as flat-fielding, background estimation, uncertainty in the Galactic extinction, and uncertainties in the photometric zero-point, which are not specific to the SBF method. These effects are typically at the 1% level, but care must be taken to account for them in a consistent way, as they may contribute to various parts of the SBF measurement process. For example, the extinction uncertainty affects both \({\overline{m}}\) and the color estimate used for determining \({\overline{M}}\).

Factors contributing to statistical uncertainties in the SBF measurement include: the accuracy of the galaxy surface brightness model, the fit to the image power spectrum to determine \(P_0\), the extrapolation of the luminosity function fit to estimate the \(P_r\) term from contaminating sources (the error is typically 20–25% of the \(P_r\) correction itself), and the match of the PSF template to the data being analyzed. These errors can be minimized by optimizing the observing strategy (including instrument, exposure time, and bandpass) and the data analysis. As shown in several works (e.g., Blakeslee et al. 2009, 2010; Cantiello et al. 2018a; Jensen et al. 2021), the total statistical uncertainty on \({\overline{m}}\) can be kept as low as 0.04 to 0.05 mag.

Concerning random errors in \({\overline{M}}\), if the images have high S/N and are in the same bandpasses used for the SBF calibration so that no photometric transformation is needed, then the error in \({\overline{M}}\) due to the color uncertainty can be kept to the \(\sim 0.01\) mag level. In this case, the random error in \({\overline{M}}\) is dominated by intrinsic scatter in the calibration due to stellar population effects. In the I, z, and J bands, this scatter is estimated to be 0.05 to 0.06 mag (e.g., Tonry 1997; Blakeslee et al. 2009; Cantiello et al. 2018b).

In summary, measurement uncertainties in well-designed SBF observations can be reduced to the \(\sim \,\)0.05 mag level. If these observations are of red galaxies in a well-suited bandpass near 1 \(\upmu \)m, the intrinsic scatter in the calibration relation will be at a similar level. Combining these two sources of error gives a total statistical error as low as \(\sim 0.07\) mag, or about 3.3% in distance per galaxy, although \(\sim \,\)4% is more typical for the median statistical error in well-designed SBF distance samples (e.g., Blakeslee et al. 2021; Jensen et al. 2021).
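The arithmetic behind these numbers is a quadrature sum followed by the standard conversion from a magnitude error to a fractional distance error:

import numpy as np

# Measurement and calibration scatter combined in quadrature (values from the text)
sigma_m = np.hypot(0.05, 0.05)                 # ~0.071 mag total
# An error on the distance modulus maps to a fractional distance error via
# sigma_D / D = (ln 10 / 5) * sigma_(m-M) ~ 0.461 * sigma_(m-M)
frac_D = np.log(10.0) / 5.0 * sigma_m
print(f"{sigma_m:.3f} mag -> {100 * frac_D:.1f}% in distance")   # ~3.3%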

3.9.4 Systematic effects

The dominant systematic uncertainty affecting all SBF distances is the zero-point of the absolute \({\overline{M}}\) calibration. This zero-point was determined by comparing ground-based I-band SBF magnitudes for the bulges of spiral galaxies with measured Cepheid distances (Tonry 1997; Tonry et al. 2000; Ajhar et al. 2001; Blakeslee et al. 2002). In most cases, the SBF zero-points in other bands have been set by tying the measurements to the I-band SBF distances (e.g., Mei et al. 2007; Blakeslee et al. 2009; Jensen et al. 2015; Cantiello et al. 2018a).

The most recent analysis of the systematic uncertainty in SBF distances was by Blakeslee et al. (2021), who revised the zero point to account for the improved LMC distance determined to \(\sim \,\)1% precision by Pietrzyński et al. (2019). They concluded that the zero-point uncertainty in the Cepheid-calibrated SBF distances in the WFC3/F110W band (the most useful for constraining \(H_0\)) is 0.09 mag, or 4.2% in distance. This is larger than the typical HST SBF measurement error.

Since the SBF method works best for early-type galaxies with old stellar populations, and these do not contain the young Cepheid stars (see Sect. 3.9.2 above), it is worth exploring other means for calibrating SBF. The TRGB method is ideal for measuring distances of early-type galaxies and obtaining an independent calibration of SBF. Like Cepheids, it is possible to calibrate the TRGB method with geometric distances from Gaia (e.g., Soltis et al. 2021), but unlike with Cepheids, the stellar population underlying both SBF and the TRGB is the same, i.e., old low-mass stars.

A first attempt to calibrate SBF with TRGB (Mould and Sakai 2009) used a sample of 16 galaxies within 10 Mpc dominated by relatively blue dwarf galaxies and found negligible change with respect to the Cepheid calibration, but the distance uncertainties were large and the colors did not extend to the range occupied by massive red ellipticals, the preferred SBF targets at large distances. More recently, Blakeslee et al. (2021) rederived the SBF zero-point using the few TRGB distances available for massive early-type galaxies. They concluded that the mean offset between the Cepheid and TRGB calibrations of SBF was \(0.01\pm 0.10\) mag. Because these two calibrations were independent and consistent, they could be combined to improve the precision on the SBF zero-point; this reduces the systematic error in the SBF distances to just over 3%.

Another potential systematic effect comes from SBF k-corrections for galaxies at larger distances. These must be estimated from stellar population models. Based on the model calculations by Liu et al. (2000), Jensen et al. (2021) estimated the SBF k-corrections in F110W to be less than 0.01 mag at 100 Mpc, the limit of their sample. Thus, k-corrections are not currently a significant problem, but they could become more important for future studies. We come back to this issue in Sect. 3.9.5.2.

In conclusion, the systematic uncertainty on SBF distances is slightly larger than 4% when based solely on Cepheids. Combining the best current Cepheid and TRGB calibrations for SBF, the systematic error in distance drops to about 3%. Ultimately, the TRGB method should provide much better precision because it can be used in the same type of galaxies, giant ellipticals, which are best for SBF measurements, while Cepheids only occur in galaxies that are inherently problematic for the SBF method. Blakeslee et al. (2021) estimated that with a sample of \(\sim \,\)15 giant ellipticals having both high-quality SBF and TRGB distances, it would be possible to reduce the systematic uncertainty in the SBF zero-point to the 2% level, including the uncertainty in the TRGB absolute magnitude calibration, which should soon approach the 1% level, thanks to Gaia. Such an overlapping sample of SBF and TRGB distances to giant ellipticals becomes feasible with the advent of JWST.

3.9.5 Main results and forecasts

3.9.5.1 Main results

The SBF method has been in use for several decades. About 600 independent SBF distances (for \(\sim \,\)400 distinct galaxies) have been measured from the Local Group to \(\sim \,\)130 Mpc. Samples with at least 20 galaxies include: Tonry et al. (2001), Jensen et al. (2003), Jensen et al. (2021), Mieske et al. (2005), Mieske et al. (2006), Blakeslee et al. (2009), Cantiello et al. (2018a), Cohen et al. (2018), Carlsten et al. (2019). Soon, there will be another \(\gtrsim 200\) from the Next Generation Virgo Survey (Cantiello et al., 2022, in prep.). Although the method is capable of high precision, the quality of published SBF distances is quite heterogeneous, with errors typically 4–5% for HST measurements, 10% for ground-based data on giant ellipticals, and up to 30% for some dwarf galaxies.

SBF distances have been used to map the velocity field of the local Universe (Tonry et al. 2000) and constrain the cosmic mass density (Blakeslee et al. 1999b). Although the ground-based samples used for these studies were variable in quality and only extended to 40 Mpc, the results agree well with modern analyses. More recently, SBF measurements have been used to probe the structure of nearby clusters (Mei et al. 2007; Blakeslee et al. 2009; Cantiello et al. 2018a), estimate supermassive black hole masses (Event Horizon Telescope Collaboration et al. 2019; Nguyen et al. 2020; Liepold et al. 2020), investigate satellite galaxy systems (Cohen et al. 2018; Carlsten et al. 2019), confirm the lack of dark matter in some ultra-diffuse galaxies (van Dokkum et al. 2018; Blakeslee and Cantiello 2018), and measure the most precise distance to the host galaxy of the binary neutron star merger event GW170817 (Cantiello et al. 2018b). SBF has also been used for various determinations of the Hubble constant, \(H_0\) (Tonry et al. 2000; Blakeslee et al. 1999b, 2002; Jensen et al. 2001; Biscardi et al. 2008). Here we focus on two recent \(H_0\) studies.

Khetan et al. (2021) presented a recalibration of the peak magnitudes of 24 local SNe Ia using a heterogeneous sample of ground- and space-based SBF distances from the literature. Adopting a hierarchical Bayesian approach, the authors then extended the calibration to a sample of 96 SNe Ia at redshifts \(0.02< z< 0.08\) and derived \(H_0 = 70.5\pm 2.4\,(\textit{stat}) \pm 3.4\,(\textit{sys})\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\). Note that in this case, SBF is used as an intermediate rung in the distance ladder, between Cepheids and SNe Ia, rather than constraining \(H_0\) directly. When updated for consistency with the improved LMC distance from Pietrzyński et al. (2019), the result becomes \(H_0 = 71.2\pm 2.4\pm 3.4\) \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\).

Blakeslee et al. (2021), using the homogeneous sample of 63 SBF distances measured by Jensen et al. (2021) for bright, mainly early-type, galaxies out to 100 Mpc observed with the F110W filter of HST’s WFC3/IR, derived \(H_0 = 73.3 \pm 0.7\pm 2.4\) km s\(^{-1}\) Mpc\(^{-1}\). The systematic (second) error mainly represents the SBF zero-point uncertainty after combining the Cepheid and TRGB calibrations. Because peculiar velocities can have an important impact over this distance range, Blakeslee et al. (2021) tested four different treatments of the galaxy velocities, including two large-scale flow models, and included this effect in the systematic error estimate. Figure 40 shows example Hubble diagrams from the study. The observed scatter in the Hubble diagram is consistent with the combined uncertainties from the SBF distances and the corrected recessional velocities.

The \(H_0\) result by Blakeslee et al. (2021) agrees well with most other local measurements and with Khetan et al. (2021) to within \(1\sigma \). It disagrees by more than \(2\sigma \) with the value of \(H_0\) based on the cosmic microwave background, assuming the standard \(\Lambda \)CDM model (Planck Collaboration et al. 2020), reinforcing the tension. More WFC3/IR SBF distances are being obtained by ongoing HST programs; these will improve the constraints on the velocity model and further reduce the uncertainties on \(H_0\).

3.9.5.2 Forecasts

The outlook for SBF is bright for several reasons: the next generation of wide-field survey telescopes will produce imaging data suitable for SBF measurements; JWST and the AO-assisted ELT facilities will allow the method to be pushed to unprecedented distances; and new samples of TRGB distances, tied to Gaia parallaxes, will drastically reduce the systematic uncertainty in the SBF zero-point calibration. Section 3.9.4 already discussed the expected zero-point improvement from the TRGB calibration. Here we comment on the other two anticipated opportunities for SBF studies.

Wide-field surveys. Forthcoming large sky surveys, such as the Vera Rubin Observatory (LSST Science Collaboration et al. 2009) and the Euclid Wide Survey (Laureijs et al. 2011), will produce breakthroughs in many fields of astronomy, including excellent opportunities to use SBF to map the spatial distribution of galaxies in the low-redshift Universe. The detailed simulations by Greco et al. (2021) indicate that Rubin will produce an unprecedented dataset for SBF studies. The multi-band ugrizy Rubin dataset, with typical seeing of 0.7\(^{\prime \prime }\), and final \(5\sigma \) point source depth of \(i_{5\sigma }\sim 26.8\) mag, will make it possible to measure SBF distances with 10% accuracy out to at least 70 Mpc, twice as far as the limit of the ground-based SBF survey of Tonry et al. (2001).

The Euclid satellite has one-fourth the collecting area of HST but, compared to Rubin, it has the advantage of near-IR coverage and a sharp (\(\sim \)0.2\(^{\prime \prime }\)), stable PSF. Taking as reference the Euclid/NISP H band, with a predicted \(5\sigma \) point source depth of \(H_{5\sigma }\sim 24\) mag, Euclid should enable SBF distances for all suitable galaxies out to at least half the distance of Rubin (\(\sim \)30–40 Mpc), and possibly more. Another future wide-field mission of enormous interest for SBF is the Nancy Grace Roman Space Telescope (Spergel et al. 2015). It will have the same aperture as HST and similar resolution, but \(\sim 100\) times the field of view and better IR sensitivity. With a \(5\sigma \) point-source depth of 28 mag in 1 hr in the J and H bands, Roman will deliver phenomenal survey depth and breadth, making it the ultimate machine for producing SBF distances. More detailed simulations are needed to quantitatively refine the expectations for both Euclid and Roman.

Fig. 40

Image reproduced with permission from Blakeslee et al. (2021), copyright by the authors

Hubble diagrams and residuals from Blakeslee et al. (2021) based on Cepheid-calibrated WFC3/IR SBF distances tabulated by Jensen et al. (2021). The velocities are group-averaged values in the cosmic microwave background rest frame, without correction for peculiar motions (left) and corrected using the 2M++ flow model derived from the redshift-space density field analysis of Carrick et al. (2015) (right). Solid symbols indicate “clean” galaxies, for which no dust or spiral structure is evident; open symbols are for galaxies with obvious dust and/or spiral structure. The best-fit Hubble constants are indicated, and the statistical and systematic error ranges are shown in dark and light gray, respectively. The reduced \(\chi ^2\) improves from 0.97 for the fit on the left to 0.89 for the fit using the flow model, but the value of \(H_0\) depends on the overall velocity scale factor, and the study adopted the model-independent version for this. \(H_0\) would increase by 0.3% for the TRGB-based SBF calibration.

Going deeper. JWST (Gardner et al. 2006) and the forthcoming 30–40 m class of Extremely Large Telescopes will have near-IR imaging capabilities far exceeding those of HST. As discussed above, JWST should greatly improve the SBF zero-point calibration by enabling much more extensive direct comparisons of SBF and Gaia-calibrated TRGB distances in giant ellipticals. This will significantly reduce the systematic uncertainty in \(H_0\), making SBF competitive with SNe Ia in this area.

In addition, these new facilities will make it feasible to go far beyond the previous limit of 100–150 Mpc achieved with HST (Jensen et al. 2001, 2021; Biscardi et al. 2008). With its sharper (FWHM\(\sim \)0.1\(^{\prime \prime }\)), better sampled PSF in the near-IR and \(\sim 7\) times the collecting area of HST, JWST should enable SBF distance measurements out to \(\sim \)300 Mpc. As always, the limiting factors will be contamination from globular clusters, and a newly significant consideration will be the quality of the k-corrections derived from stellar population models (Sect. 3.9.4). Further work is needed on this issue.

The ELTs hold even greater promise for pushing SBF to unprecedented depths, potentially out to \(z\sim 0.1\), and perhaps even directly probing cosmic acceleration and dark energy as a complement to SNe Ia. However, this depends critically on the ability to measure precise and reliable SBF magnitudes using adaptive optics (AO). Although some studies have been made of this topic (Gouliermis et al. 2005; Jensen 2012), quantitative demonstrations of AO-assisted SBF measurements are lacking. Further work, using actual AO data, is much needed, and again k-corrections will be an important ingredient in deriving accurate calibrations. We appeal to the stellar population modelers of the world to dedicate some effort to this important problem.

3.10 Stellar ages

The expansion rate of the Universe determines the look-back time. This opens up the possibility of using time (or age) measurements to constrain the background parameters of the cosmological model. The cosmic chronometers method (see Sect. 3.1) uses relative ages to determine H(z), but absolute ages can also be used in a complementary way. In fact, absolute ages were already used in the 1950s, and more extensively in the 1990s (see below), to impose (then) competitive constraints on the cosmological model. While the age of any old astrophysical object could in principle serve to constrain the age of the Universe, historically stellar ages have been the most promising avenue, as they can be determined with precision and accuracy much superior, to date, to those of any other type of object.

3.10.1 Basic idea and equations

The look-back time t as function of redshift is given by:

$$\begin{aligned} t(z) =\frac{977.8}{H_0}\int _0^z \frac{\mathrm{d}z^\prime }{(1+z^\prime )E(z')}\,\mathrm{Gyr} , \end{aligned}$$
(132)

with \(E(z)\equiv H(z)/H_0\) and H(z) in \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\). Following Eq. (132), the age of the Universe is \(t_{\mathrm{U}}\equiv t(\infty )\). We show the dependence of \(t_{\mathrm{U}}\) on \(H_0\), \(\varOmega _{\mathrm{m}}\) and a constant EoS parameter w for dark energy in a wCDM model in Fig. 41. It is evident that the strongest dependence is on \(H_0\), while \(\varOmega _{\mathrm{m}}\) and w have less influence.
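
For concreteness, Eq. (132) can be evaluated with a few lines of code (a minimal sketch for a flat wCDM model; the E(z) below assumes flatness and a constant w, with Planck-like illustrative parameter values):

```python
import numpy as np
from scipy.integrate import quad

def E(z, Om=0.3138, w=-1.0):
    """Unitless Hubble rate E(z) = H(z)/H0 for flat wCDM."""
    return np.sqrt(Om * (1 + z)**3 + (1 - Om) * (1 + z)**(3 * (1 + w)))

def lookback_time(z, H0=67.36, Om=0.3138, w=-1.0):
    """Look-back time in Gyr, Eq. (132); H0 in km/s/Mpc."""
    integrand = lambda zp: 1.0 / ((1 + zp) * E(zp, Om, w))
    result, _ = quad(integrand, 0.0, z)
    return 977.8 / H0 * result

# Age of the Universe: t_U = t(z -> infinity); ~13.8 Gyr for these values
print(f"t_U = {lookback_time(np.inf):.2f} Gyr")

# The dominant sensitivity is to H0 (t_U scales as 1/H0 at fixed Om, w)
for H0 in (64.0, 67.36, 74.0):
    print(f"H0 = {H0}: t_U = {lookback_time(np.inf, H0=H0):.2f} Gyr")
```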

The integral in Eq. (132) is dominated by contributions from redshifts below a few tens, decreasing as z grows. Therefore, any exotic pre-recombination physics does not significantly affect the age of the Universe. On the other hand, E(z) is bound to be very close to that of a CMB-calibrated \(\Lambda \)CDM model at \(z\lesssim 2.4\), as shown in Bernal et al. (2021). Hence, a precise and robust determination of \(t_{\mathrm{U}}\) which does not significantly rely on a cosmological model may, in combination with BAO and SNe Ia, weigh in on proposed solutions to the \(H_0\) tension. If an independent (and model-agnostic) determination of \(t_{\mathrm{U}}\) were to coincide with Planck’s inferred value assuming \(\Lambda \)CDM, alternative models involving exotic physics relevant only in the early Universe would need to invoke additional modifications of the late-Universe expansion history to reproduce all observations, as their prediction for \(t_{\mathrm{U}}\) would be too low.

Fig. 41

Image reproduced with permission from Bernal et al. (2021), copyright by APS

Age of the Universe (in Gyr) as function of \(H_0\) and \(\varOmega _{\mathrm{m}}\) for \(w=-1\) (left panel), \(H_0\) and w for \(\varOmega _{\mathrm{m}}\)=0.3138 (central panel), and \(\varOmega _{\mathrm{m}}\) and w for \(h=0.6736\) (right panel). When a parameter is not varied, it is fixed to Planck18 \(\Lambda \)CDM best-fit value (Planck Collaboration et al. 2020). White lines mark contours with constant value of \(t_{\mathrm{U}}\).

The color-magnitude diagram (CMD) of co-eval stellar populations in the Milky Way, or in any other nearby galaxy where this is observationally possible, can be used to infer the age of their oldest stars. The age can also be estimated for individual stars if their metallicity and distance are known. For resolved stellar populations, however, an independent measurement of the distance is not strictly necessary, as the full morphology of the color-magnitude diagram can, in principle, provide a determination of the absolute age. There is extensive literature on the dating of stellar populations; reviews can be found in, e.g., Catelan (2018), Soderblom (2010), Vandenberg et al. (1996). In this section, we will focus on the most recent developments in the field. Ages of stars can also be computed via nucleo-cosmochronology (see, e.g., Christlieb 2016), which consists in measuring global abundances of radioactive elements like Uranium and Thorium to estimate the age of the parent star. Another method is to use the cooling luminosity function of white dwarfs (see, e.g., Catelan 2018, and references therein for the current status of these methods); while useful, these methods are still not at the accuracy level of stellar ages measured via the observed color-magnitude diagram, and we will not discuss them further. We will instead focus on the use of the color-magnitude diagram of Globular Clusters (GCs), as new developments are providing stellar ages with accuracy at the few-percent level.

The first quantitative attempt to compute the age of the globular cluster M3 was made by Haselgrove and Hoyle more than 60 years ago (Haselgrove and Hoyle 1956). In this work, stellar models were computed on the early Cambridge mainframe computer and their results compared “by eye” to the observed color-magnitude diagram. A few stellar phases were computed by solving the equations of stellar structure, and this output was compared to observations. Their estimated age for M3 is only 50% off from its current value. This was the first true attempt to use computer models to fit resolved stellar populations and thus obtain cosmological parameters: the age of the Universe in this case. Previous estimates of the ages of GCs involved just analytic calculations, which significantly impacted the accuracy of the results, given the complexity of the stellar structure equations (see e.g., Sandage and Schwarzschild 1952).

Historically, the age of the oldest stellar populations in the Milky Way has been measured using the luminosity of the Main-Sequence Turn-Off Point (MSTOP) in the color-magnitude diagram of GCs. In this way, however, the full richness of information contained in the whole color-magnitude diagram is discarded, and only one point is kept. While it is true that the MSTOP contains significant information about the age of the stellar population, other parts of the CMD do as well, especially around the sub-giant branch and the main sequence below the MSTOP; this is crucial to break degeneracies with distance and metallicity content (see Fig. 42).

Fig. 42

Image reproduced with permission from Valcin et al. (2020), copyright by IOP & SISSA

Dependence of the stellar isochrone on variations of age, metallicity and [\(\alpha \)/Fe] of the GC with all other parameters fixed. Right panels show the relative difference in color.

Globular clusters are (almost, more on this below) single stellar populations (see, e.g., Vandenberg et al. 1996). It has long been recognized that they are among the most metal-poor (\(\sim \)1% of the solar metallicity) stellar systems in the Milky Way, and exhibit color-magnitude diagrams characteristic of old (> 10 Gyr) stellar populations (O’Malley et al. 2017; Catelan 2018; Vandenberg et al. 1996).

Of great interest is the fact that determinations of stellar ages in the 1990s provided one of the first hints that the dominant cosmological model at the time (an Einstein-de-Sitter Universe) needed revision (see, e.g., Ostriker and Steinhardt 1995; Jimenez et al. 1996; Spinrad et al. 1997). Old stellar populations were determined to be older than \(1/H_0\), the age of the Universe in that model (see, e.g., Jimenez et al. 1996). Of course, the age of stellar objects at \(z=0\) is just a lower limit to the age of the Universe and, by itself, does not constrain the cosmological model, as changes in \(H_0\) and \(\varOmega _{\mathrm{m}}\) can accommodate an Einstein-de-Sitter Universe.

In the past, in order to break this degeneracy, a determination of the stellar ages of the oldest galaxies at \(z \gg 0\) proved crucial. This was first achieved by Dunlop et al. (1996). It is revealing to see Fig. 18 in Spinrad et al. (1997), which shows the exclusion of the Einstein-de-Sitter Universe once the ages of GCs are taken into account. This philosophy has been further developed in the cosmic chronometer method, with the first cosmological-model-independent determination of the redshift evolution of the Hubble parameter, H(z) (see Sect. 3.1 and references therein).

The determination of the absolute age of a GC inferred using only the MSTOP luminosity is degenerate with other properties of the GC. As already shown in the pioneering work of Haselgrove and Hoyle (1956), the distance uncertainty to the GC contributes the largest share of the error budget: a given percentage of relative uncertainty in the distance determination translates into roughly the same percentage uncertainty in the inferred age. Other sources of uncertainty are: the metallicity content, the Helium fraction, the dust absorption (Vandenberg et al. 1996), and theoretical systematics regarding the physics and modeling of stellar evolution.

However, there is more information enclosed in the full color-magnitude diagram of a GC than in its MSTOP alone. As first pointed out in Jimenez and Padoan (1996), Padoan and Jimenez (1997), the full color-magnitude diagram has features that allow for a joint fit of the distance scale and the age (see Fig. 42 for a visual rendering of this). On the one hand, Fig. 2 in Jimenez and Padoan (1998) shows how the different portions of the color-magnitude diagram constrain the corresponding physical quantities. Figure 1 in Padoan and Jimenez (1997) and Figure 3 in Jimenez and Padoan (1998) show how the luminosity function is not a pure power-law, but has features that contain information about the different physical parameters of the GC. This technique enabled the estimation of the ages of the GCs M68 (Jimenez and Padoan 1996), M5 and M55 (Jimenez and Padoan 1998). Moreover, in principle, exploiting the morphology of the horizontal branch makes it possible to determine the ages of GCs independently of the distance (Jimenez et al. 1996).

Further, on the observational front, the gathering of Hubble Space Telescope (HST) photometry for a significant sample of galactic GCs has been a game changer. HST has provided very accurate photometry with a very compact point spread function, thus easing the problems of crowding when attempting to extract the color-magnitude diagram for a GC and making it much easier to control contamination from foreground and background field stars.

For these reasons, a precise and robust determination of the age of a GC requires a global fit of all these quantities from the full color-magnitude diagram of the cluster. In order to exploit this information, and due to degeneracies among GC parameters, a suitable statistical approach is needed. Bayesian techniques, which have recently become the workhorse of cosmological parameter inference, are of particular interest. With a view to using the estimated age of the oldest stellar populations in a cosmological context, as a route to constrain the age of the Universe, it is valuable to adopt Bayesian techniques in this context too.

There are only a few recent attempts at using Bayesian techniques to fit GCs’ color-magnitude diagrams, albeit only using some of their features (see, e.g., Wagner-Kaiser et al. 2017). Other attempts to use Bayesian techniques to age-date individual stars from the Gaia catalog can be found in Sahlholdt et al. (2019). A limitation of the methodology presented in Wagner-Kaiser et al. (2017) is the large number of parameters needed in the likelihood. In fact, for a GC of \(N_{\mathrm{stars}}\) stars there are, in principle, \(4 \times N_{\mathrm{stars}} + 5\) model parameters (effectively \(3 \times N_{\mathrm{stars}} + 5\)), where the variables for each star are: initial stellar mass, photometry, ratio of secondary to primary initial stellar masses (fixed to 0 in Wagner-Kaiser et al. 2017), and a cluster membership indicator. There are also 5 (4) cluster-level variables, namely: age, metallicity (fixed in the analysis of Wagner-Kaiser et al. 2017), distance modulus, absorption, and Helium fraction. For a cluster of 10,000 or more stars, the computational cost of this approach is very high. To overcome this issue, Wagner-Kaiser et al. (2017) randomly selected a sub-sample of 3,000 stars, half above and half below the MSTOP of the cluster, “to ensure a reasonable sample of stars on the sub-giant and red-giant branches”. Another difficulty arises from the fact that the cluster membership indicator variable can take only the value of 0 or 1 (i.e., whether a star belongs to the cluster or not). This creates a sample of two populations, referred to as a finite mixture distribution (Wagner-Kaiser et al. 2017).
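
The finite-mixture structure can be made explicit with a schematic example (a toy sketch with one-dimensional Gaussian stand-ins for the cluster and field populations; the actual likelihood of Wagner-Kaiser et al. (2017) is far richer): marginalizing over the binary membership indicator turns each star’s contribution into a weighted sum of a cluster term and a field term.

```python
import numpy as np
from scipy.stats import norm

def log_like_mixture(color, f_member, mu_cl, sig_cl, mu_fd, sig_fd):
    """Schematic finite-mixture log-likelihood: each star's color is drawn
    either from the cluster sequence (probability f_member) or from a broad
    field population, with the 0/1 membership flag marginalized out."""
    like_cluster = norm.pdf(color, mu_cl, sig_cl)
    like_field = norm.pdf(color, mu_fd, sig_fd)
    return np.sum(np.log(f_member * like_cluster + (1 - f_member) * like_field))

# Toy data: 900 "cluster" stars on a tight sequence plus 100 "field" stars
rng = np.random.default_rng(0)
colors = np.concatenate([rng.normal(0.60, 0.03, 900),
                         rng.normal(0.80, 0.30, 100)])
print(log_like_mixture(colors, f_member=0.9,
                       mu_cl=0.60, sig_cl=0.03, mu_fd=0.80, sig_fd=0.30))
```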

Recently, a Bayesian analysis of the GC CMD using all of its features has been carried out by Valcin et al. (2020), Valcin et al. (2021). This has resulted in the joint determination of ages, metallicities and distances for 68 GCs observed by the HST/ACS project. The main advantage of the Valcin et al. (2020), Valcin et al. (2021) approach is that, by using all features in the CMD, it is possible not only to obtain ages with smaller uncertainties, but also to remove some of the systematic uncertainties (Valcin et al. 2021).

3.10.2 Sample selection

To obtain a lower limit to the age of the Universe one needs to select the objects hosting the oldest stars. This is in itself a circular argument, as we will only know which stars are the oldest after having measured their ages. The most useful approach is to select those GCs with the lowest metallicity, as they were likely among the first formed in the Universe. In practice, since the Milky Way only contains a couple of hundred GCs, the most natural approach is to compute the age for all of them and then select.

This procedure can also be applied to other stellar clusters, like open clusters, but these tend to be significantly younger than GCs.

To measure the ages of stars in GCs, the sample selection is fairly straightforward: one selects the stars that belong to the globular cluster. The best procedure is to plot individual stars in the color-magnitude diagram to identify the locus of cluster members. While care needs to be taken with the technicalities involved in computing photometry in crowded fields and in identifying cluster members, these issues are well under control. Indeed, we may already have all the data needed, as almost all globular clusters in the Milky Way are known. It would be useful to obtain resolved stellar populations of GCs in other galaxies, like Andromeda; this is something that JWST may achieve in the near future. The most important revolution will come from using full-sky surveys to measure ages of stars systematically. For now, suitable observations for a representative sub-sample of 68 GCs are available (Valcin et al. 2020, 2021).

3.10.3 Measurements

Accurate photometry is the main requirement for obtaining color-magnitude diagrams of GCs. In addition, it would be desirable to obtain as much spectroscopy as possible from the resolved stars, as this would help reduce the reliance on Bayesian priors on metallicity.

Of course, good data require outstanding analysis tools. Simply fitting the MSTOP does not do justice to the data, as this discards precious information on other parameters besides the age of the GC. The recent use of fully Bayesian techniques (like, e.g., in Valcin et al. 2020) shows that there is more information in the CMD. Future use of likelihood-free inference could extract even more of the information contained in the CMD.

3.10.4 Systematic effects

Systematics are the main source of uncertainty when obtaining the absolute age of GCs; note, however, that relative ages are less prone to systematic uncertainties. Systematic uncertainties are dominated by uncertainties in stellar evolution and in the distance determination to the GC. The best study of systematic uncertainties in the age determination of GCs is the work by Chaboyer and collaborators (see, e.g., O’Malley et al. 2017). They arise mostly from three sources: uncertainties i) in nuclear reaction rates, ii) in the modeling of convection in the outer layers of low-mass stars, and iii) in the estimation of the distance to the GC. These are the same systematic uncertainties that affect the age determination obtained using only the MSTOP, but they are ameliorated when using the full morphology of the color-magnitude diagram (see Valcin et al. 2021). In particular, uncertainties in both the distance and the convection of the stars’ outer layers can be significantly reduced when using the full CMD. In addition, independent measurements obtained by the Gaia space mission will drastically reduce the uncertainty on the distance. It is worth noting that the uncertainties concerning stellar nuclear rates could be greatly reduced by producing better theoretical computations (see also Boylan-Kolchin and Weisz 2021).

Another source of (systematic) error is the uncertainty in \(z_{\mathrm{form}}\) when inferring the age of the Universe from the age of the star; this was addressed in detail in Jimenez et al. (2019). The determination of \(z_{\mathrm{form}}\) will be improved dramatically by the upcoming observations from JWST (Gardner et al. 2006), which will conclusively map the mass function of objects at \(10<z<20\).

3.10.5 Main results and forecasts

The most recent determinations of the ages of GCs using the MSTOP and the full-CMD Bayesian method are shown in Figs. 43 and 44. In Fig. 43, taken from Jimenez et al. (2019), age determinations from the literature were used, including the ages of individual stars and the CMB-derived age. In Fig. 44 the age of the Universe is computed using the method of Valcin et al. (2020). Despite the very different observables and approaches, there is good agreement among all the age determinations.

Fig. 43

Image reproduced with permission from Jimenez et al. (2019), copyright by IOP & SISSA

Probability distribution for the age of the Universe obtained using stellar ages (thin set of lines) and derived by Planck18 from the CMB assuming the \(\Lambda \)CDM model (thick solid, Planck Collaboration et al. 2020). All determinations are in good agreement. Just as an example of what kind of accuracy could be obtained if systematic uncertainties were all under control, the inset shows the age of the Universe for the formal determination and formal uncertainty of J18082002-5104378, which is fully compatible with Planck18. The formal GCs ages for 69 ACS clusters from O’Malley et al. (2017) would look similar to the J18082002-5104378 line.

Fig. 44

Image reproduced with permission from Valcin et al. (2020), copyright by IOP & SISSA

Age distribution for globular clusters using the Bayesian method of Valcin et al. (2020) when using the full CMD with different metallicity cuts. The behavior is consistent with the expected age-metallicity relation. Only the statistical uncertainty is displayed. An additional uncertainty of 0.25 Gyr (Valcin et al. 2021) at 68% confidence level needs to be added to account for the systematic uncertainty.

Stellar ages have also proven extremely useful to unveil the nature of the Hubble tension, as done in Vagnozzi et al. (2022), Krishnan et al. (2021). A summary of how the absolute age \(t_U\) determination can weigh in on the current “debate” on the expansion rate of the Universe is shown in Fig. 45, and further elaborated in Bernal et al. (2021). There are two independent physical quantities (\(H_0\), \(t_U\)), but three quantities are measured independently: \(t_U\) from absolute ages, \(H_0\) from the cosmic distance ladder, and, in a \(\Lambda \)CDM model, \(H_0t_U\) from standard rulers and candles (BAO and SNe Ia), plus a combination of constraints on \(t_U\) and \(H_0\) from the CMB. This is an over-constrained system which can be represented on a triad plot (see Bernal et al. 2021, referred to as the “new cosmic triangle”) such that \(\log t_U+\log H_0\equiv \log (H_0 t_U)\), Fig. 45. While the BAO, SNe Ia and CMB inferences depend on the cosmological model, the stellar-age ones (and the distance-ladder ones) are cosmology-independent. It is interesting to note that the GC ages, CMB, and BAO+SNe determinations agree, indicating that a \(\Lambda \)CDM-like expansion history is a good fit to the data in the redshift window heavily weighted by these data.
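
The identity behind the triad can be verified in one line, using the unit conversion factor of Eq. (132) (Planck-like illustrative numbers):

```python
H0 = 67.36                # km/s/Mpc
t_U = 13.8                # Gyr
H0_inv_Gyr = H0 / 977.8   # 977.8 converts km/s/Mpc to 1/Gyr
# The dimensionless product closing the triad: log t_U + log H0 = log(H0*t_U)
print(f"H0 * t_U = {H0_inv_Gyr * t_U:.3f}")   # ~0.95 for these values
```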

Fig. 45

Image reproduced with permission from Bernal et al. (2021), copyright by APS

Triad plot (“new cosmic triangle”): 68% confidence level marginalized constraints on the triad corresponding to the age of the Universe and the Hubble constant (upper left). Note that all points in the figure sum up to 0, while the ticks in the axes determine the direction of equal values for each axis. Note that the absolute ages of GCs are consistent with the model-dependent Planck18 value.

The future for more accurate GC ages lies on two fronts: reducing the systematic uncertainty in stellar modeling by constructing improved models, and narrowing the priors used for metallicity and distances by resorting to additional, complementary observations. Direct distances from Gaia (Gaia Collaboration et al. 2016) are particularly promising. Especially useful will be the final Gaia data release, which will provide percent or sub-percent direct (parallax-based) distances to GCs. This will tremendously narrow the adopted distance prior range. Another important ingredient will be the direct spectroscopic determination of chemical abundances in individual stars in GCs, especially below the MSTOP. The JWST telescope (Gardner et al. 2006) will enable enormous progress in these two directions. If these two priors are constrained at the percent level from direct observations, then the only remaining systematic uncertainty will be that from constructing the stellar models. As shown in Valcin et al. (2021), when using the full CMD, the dominant uncertainty left in stellar models is the one due to nuclear reaction rates, which can in principle be improved by a combination of laboratory and theory efforts.

3.11 Secular redshift drift

Any non-empty universe will exhibit an accelerating or decelerating Hubble expansion, which can be observed as a secular redshift drift. Sandage (1962) first proposed observing this effect in the optical spectra of galaxies to measure the cosmic deceleration. Loeb (1998) later suggested using the neutral hydrogen Lyman \(\alpha \) forest of absorption lines toward quasars, and this concept has been developed as a key science case for large optical telescopes (e.g., Corasaniti et al. 2007; Liske et al. 2008). Large radio telescopes may likewise probe the redshift drift using neutral hydrogen via the 21-cm emission line from galaxy surveys or using Hi 21-cm absorption toward quasars (e.g., Darling 2012; Yu et al. 2014; Kloeckner et al. 2015). Measurements require exquisite, repeatable, long-term wavelength calibration that will most likely rely on a stable local oscillator in both the optical and radio wavelength regimes.

The secular redshift drift is a means to directly observe the cosmic acceleration that does not rely on models, standard candles, standard rulers, or the cosmological distance ladder. It is capable of directly testing standard dark energy cosmology and can be used as a probe of cosmological inhomogeneities and thus test the FLRW paradigm and general anisotropic models (e.g., Quartin and Amendola 2010). However, the signal is so small (of order \(H_0\) \(\varDelta t\), where \(\varDelta t\) is the duration of observation) that it is unlikely to provide competitive constraints on cosmological parameters in an era of precision cosmology. For example, Alves et al. (2019) predict that a 40-m class optical telescope Ly\(\alpha \) forest program combined with an Hi 21-cm emission line survey and Hi 21-cm absorption line monitoring can provide independent constraints on \(H_0\), \(\varOmega _{\mathrm{m}}\), and \(w_0\) of order 19%, 7%, and 13%, respectively, in a flat wCDM model (marginalized 1\(\sigma \) uncertainties).

Nevertheless, a measurement of \({\dot{z}}\) is a model-independent indication of the presence of dark energy (Heinesen 2021), and offers a means to directly determine the cosmic expansion history. It also offers some improvement on cosmological priors when combined with more traditional measurements (Alves et al. 2019), and notably tends to break parameter degeneracies in traditional cosmological probes (Martins et al. 2021).

In the following, we describe the expected secular redshift drift, its dependence on cosmological parameters, measurement methods including sample selection and systematic effects, and forecasts of the measurement precision and the resulting constraints on cosmological parameters.

3.11.1 Basic idea and equations

The observed secular redshift drift, the rate of change of redshift in the current epoch \(t_0\), is to first order the difference between the Hubble expansion of a coasting universe at redshift z and the true Hubble expansion at that redshift (e.g., Loeb 1998):

$$\begin{aligned} \frac{d z}{d t_0} \equiv {\dot{z}} = (1+z)\, H_0 - H(z) . \end{aligned}$$
(133)

The derivation of this relationship relies only on the null interval obeying \(c\,dt = a(t)\,dr\) and the definitions \(1+z = a(t_0)/a(t_e)\) and \(H = {\dot{a}}/a\): for redshifts measured at times \(t_0\) and \(t_0+\varDelta t_0\), the redshift change is

$$\begin{aligned} \varDelta z = \frac{a(t_0+\varDelta t_0)}{a(t_e+\varDelta t_e)} - \frac{a(t_0)}{a(t_e)} \simeq \left[ \frac{a(t_0)}{a(t_e)}\, \frac{{\dot{a}}(t_0)}{a(t_0)} - \frac{{\dot{a}}(t_e)}{a(t_e)}\right] \varDelta t_0 \end{aligned}$$
(134)

for \(\varDelta t_0 \ll t_0\). The redshift drift can be recast in terms of an observed acceleration:

$$\begin{aligned} \frac{d v}{d t_0} = \frac{c\, {\dot{z}}}{1+z} = c H_0\left( 1 - \frac{E(z)}{1+z}\right) , \end{aligned}$$
(135)

where E(z) is the unitless rescaled Hubble parameter (Eq. 11) that depends on the contents and curvature of the universe. Measurements of the secular redshift drift thus encode the Hubble constant, the matter density, the curvature, and the dark energy density and its equation of state. Alves et al. (2019) show that the redshift drift is most sensitive to \(H_0\) and \(\varOmega _{\mathrm{m}}\) (or \(\varOmega _{\varLambda }\)) in a canonical flat \(\Lambda \)CDM cosmology. In wCDM or \(w_0 w_a\)CDM models, the effect is less sensitive to \(w_0\) and least sensitive to \(w_a\) (but these broad statements vary somewhat as a function of redshift and the span of redshifts explored by a given probe).
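
To make these statements concrete, the following minimal sketch evaluates Eqs. (133) and (135) for a flat \(\Lambda \)CDM model (fiducial parameters as in Fig. 46) and locates the null and the peaks numerically:

```python
import numpy as np
from scipy.optimize import brentq

H0, Om, OL = 74.0, 0.27, 0.73          # fiducial flat LCDM (cf. Fig. 46)
H0_per_yr = H0 * 3.156e7 / 3.0857e19   # km/s/Mpc -> 1/yr

def E(z):
    return np.sqrt(Om * (1 + z)**3 + OL)

def zdot(z):
    """Secular redshift drift, Eq. (133), in 1/yr."""
    return H0_per_yr * (1 + z - E(z))

def vdot(z):
    """Apparent acceleration, Eq. (135), in cm/s/yr."""
    return 2.9979e10 * H0_per_yr * (1 - E(z) / (1 + z))

# Null between acceleration and deceleration (independent of H0)
z_null = brentq(zdot, 1.0, 5.0)
print(f"zdot = 0 at z = {z_null:.2f}")            # ~2.5 for Om = 0.27

zg = np.linspace(0.0, 5.0, 2001)
print(f"peak vdot = {vdot(zg).max():.2f} cm/s/yr at "
      f"z = {zg[np.argmax(vdot(zg))]:.2f}")       # ~0.4 cm/s/yr near z ~ 0.76
print(f"peak zdot = {zdot(zg).max():.2e} /yr at "
      f"z = {zg[np.argmax(zdot(zg))]:.2f}")       # ~2e-11 /yr near z ~ 1.1
```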

Figure 46 shows sample tracks of \({\dot{z}}\) and \({\dot{v}}\) versus redshift for a few cosmologies, as well as their differences from a fiducial model. There are a few noteworthy features of the redshift drift: i) the redshifts of the peak \({\dot{z}}\), the peak acceleration, and the null between acceleration and deceleration are all independent of \(H_0\); but ii) the amplitude of the peaks (and of the curves generally) scales with \(H_0\). The redshifts of the peaks and of the null depend sensitively on the energy densities, including the curvature, but are somewhat insensitive to \(w_0\) and \(w_a\) when these are close to the canonical values. For example, the \({\dot{z}} = 0\) redshift varies by roughly \(z=2.5 \mp 0.5\) for \(\varOmega _{\mathrm{m}}\) \(= 0.27 \pm 0.03\) in a flat \(\Lambda \)CDM cosmology.

Fig. 46

Secular redshift drift (left) and apparent acceleration (right) versus redshift. All loci assume \(H_0\)=74 \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\), \(\varOmega _{\mathrm{m}}\)=0.27, \(\varOmega _{\varLambda }\)=0.73, and \(w_0 = -1\) unless otherwise indicated. The top row shows the full signal, and the bottom row shows the difference between several models and the fiducial cosmology

Measurements of \({\dot{z}}\) at a variety of redshifts can thus probe epochs of acceleration caused by dark energy (\(z \lesssim 2.5\)) as well as epochs of deceleration caused by matter (\(z \gtrsim 2.5\)). This measurement is challenging because the size of the acceleration is small: it reaches a peak value of roughly 0.4 cm s\(^{-1}\) yr\(^{-1}\) at \(z\simeq 0.76\). The peak in \({\dot{z}}\) is \(\sim 2\times 10^{-11}\) yr\(^{-1}\) (or \(\sim H_0/3\)) at \(z\simeq 1.1\), as shown in Fig. 46. Provided one can achieve adequate precision and measurement stability over years to decades, nearly any redshift indicator can be used to measure the secular redshift drift, including spectral lines (emission or absorption) and spectral edges or continuum breaks.

Since the most accessible measurements rely on spectral line centroiding, high signal-to-noise observations of many narrow lines are required, and narrow lines tend to be absorption lines (we exclude astrophysical masers from consideration). The technique therefore favors reasonable optical depth (but unsaturated) absorption lines toward bright optical or radio continuum sources. The Lyman \(\alpha \) forest provides a high-N, high-\(\sigma \) per line regime, while radio absorption lines provide (for now) low-N, low-\(\sigma \) measurements. The two regimes are likely to be competitive in the long run, although the Ly\(\alpha \) forest method will likely be less susceptible to, and better able to average out, gravitational accelerations caused by the local environment and large-scale structure (see Sect. 3.11.4).

Hi 21-cm radio emission from galaxies has also been proposed as a means to measure \({\dot{z}}\) using large galaxy surveys (Kloeckner et al. 2015). This approach relies on large samples, \(\sim 10^7\) galaxies per measurement, in order to overcome the large line width that samples the full rotation curve of galaxies, the large expected internal accelerations, and the acceleration caused by large-scale structures (as will be discussed in Sect. 3.11.4).

3.11.2 Sample selection

There exist three main methods for detecting the secular redshift drift: (1) Lyman \(\alpha \) forest absorption lines toward bright quasars, (2) Hi 21-cm emission from galaxies, and (3) Hi 21-cm absorption toward radio sources. There are additional methods beyond these that have not been as well developed, such as molecular absorption lines toward bright (sub)mm continuum sources. It is also certain that additional clever ideas will arise (see especially Kim et al. 2015) as the notion of directly measuring the cosmic acceleration gains traction and becomes more realistic with new facilities.

Bright quasars are needed to maximize signal-to-noise in high-resolution spectra of the Ly\(\alpha \) forest. Quasars must also be redshifted to place Ly\(\alpha \) redward of the atmospheric UV cutoff. To maximize spectral coverage per observation, optimal quasars would have \(z\simeq 5\) and be as bright as possible. The number of monitored quasars does not need to be large because the large-N statistics arise from the hundreds of absorption lines seen along each sight-line (e.g., Liske et al. 2008).

Hi 21-cm emission line surveys rely on areal coverage and redshift selection. Redshift selection for a fiducial Square Kilometer Array (SKA) survey is flux-limited, and the ability to measure the redshift drift is limited by the number of detected galaxies and their signal-to-noise. Typically, \(\sim 10^7\) galaxies need to be observed within a redshift bin, and Kloeckner et al. (2015) predict that \({\dot{z}}\) can be measured up to \(z\sim 1\).

At present, there are only \(\sim \)140 Hi 21-cm absorption line systems known, which is a consequence of limited surveys, limited bandwidths, radio frequency interference (RFI), and flux sensitivity (absorption systems are generally only detected toward Jy-level continuum radio sources at \(\sim \)1 GHz). As areal coverage and sensitivity of surveys increase with SKA prototypes, the Five-hundred-meter Aperture Spherical radio Telescope (FAST), Canadian Hydrogen Intensity Mapping Experiment (CHIME), and ultimately the full SKA, the number of known systems is expected to increase by more than an order of magnitude.

Most planned or current surveys expect to detect at least hundreds of new Hi 21-cm absorption line systems. For example, the ASKAP FLASH survey expects to detect several hundred new 21-cm absorption line systems at \(z\lesssim 1\) (Sadler et al. 2020). Jiao et al. (2020) describe a commensal FAST survey that is predicted to detect roughly 800, 1900, and 2600 Hi 21-cm absorption systems with \(z<0.37\) in 1, 5, and 10-year surveys, whereas Zhang et al. (2021) predict more than 1500 absorbers would be detected at \(z<0.37\) by FAST. CHIME, however, will survey the northern sky continuously and is predicted to detect \(\sim \)10\(^5\) absorption lines in \(0.8< z < 2.5\) (Yu et al. 2014).

3.11.3 Measurements

Here we focus on the expected precision obtained by redshift drift measurements (forecasts for cosmological parameters based on the following predicted measurements are described in Sect. 3.11.5). Figure 47 depicts the following predictions:

  1. Following Liske et al. (2008) Eq. 16, we predict measurements based on a generic 42-m ELT (a numerical sketch of this scaling is given after this list). In the figure, we assume a two-epoch Ly\(\alpha \) forest monitoring program of 10 quasars with S/N of 3000 spanning 20 years. Such a program is expected to reach acceleration uncertainties of 0.22–0.08 cm s\(^{-1}\) yr\(^{-1}\) over redshifts \(z=\) 2–5. It may be possible to improve upon this prediction using absorption lines beyond Ly\(\alpha \), such as other Lyman series lines or metal lines that arise from higher column density clouds (Liske et al. 2008). Moreover, Cooke (2020) presented a “Ly\(\alpha \) cell” calibration technique that uses relative accelerations of metal and Ly\(\alpha \) forest lines to provide a larger lever arm on the signal and to allow internal wavelength calibration of spectra. Finally, Eikenberry et al. (2019b), Eikenberry et al. (2019a) proposed a dedicated non-ELT facility comprising many small telescopes that could reduce the detection time to 5 years.

  2. The full SKA, following Kloeckner et al. (2015) (see also Martins et al. 2016), is predicted to use 21-cm emission from galaxies to measure \({\dot{z}}\) with 1–10% uncertainty over redshifts \(z=\) 0.1–1.0 in two epochs spanning 12 years. Galaxy-scale emission line profiles are broad (hundreds of km s\(^{-1}\), modulo inclination), which translates into a factor of \(\sim \)1000 in sample size needed to roughly match absorption line centroiding, all else being equal. We suggest that emission line edges and object-by-object cross-correlation may improve the expected performance of this technique, but the sensitivity of this technique to \({\dot{z}}\) needs to be modeled in detail using observed 21-cm emission line profiles.

  3. Provided the expected populations of Hi 21-cm absorption line systems are detected by FAST and SKA precursors (as discussed in Sect. 3.11.2), we can modify the Darling (2012) predictions to make new estimates of the redshift drift measurement. A 20-year FAST monitoring program of 1000 absorption lines at \(z<0.37\) will obtain acceleration precision of roughly \(\pm 0.08\) cm s\(^{-1}\) yr\(^{-1}\). Likewise, a 10-year SKA program observing two redshift bins at \(z=0.55\) and \(z=0.85\) with 500 lines each can reach rms acceleration noise of \(\sim \)0.08 cm s\(^{-1}\) yr\(^{-1}\), which is similar to the expectation for 21-cm emission.
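
The scaling behind item 1 can be sketched numerically. We assume the commonly quoted power-law form of Liske et al. (2008) Eq. 16 (normalization of 1.35 cm s\(^{-1}\) at S/N = 2370, 30 quasars and \(z_{\mathrm{QSO}}=4\), with the redshift exponent flattening from \(-1.7\) to \(-0.9\) above \(z\simeq 4\)); the coefficients should be checked against the original paper before quantitative use, and the conversion from a velocity-shift uncertainty to an acceleration uncertainty over the program duration is our simplifying assumption:

```python
import numpy as np

def sigma_v_lyaf(snr, n_qso, z_qso):
    """Uncertainty (cm/s) on the Ly-alpha forest velocity shift, using the
    power-law scaling of Liske et al. (2008) Eq. 16 as quoted here
    (coefficients should be verified against the original paper)."""
    z = np.asarray(z_qso, dtype=float)
    expo = np.where(z <= 4.0, -1.7, -0.9)
    return 1.35 * (snr / 2370.0)**-1 * (n_qso / 30.0)**-0.5 \
                * ((1.0 + z) / 5.0)**expo

# Program assumed in the text: 10 QSOs, S/N = 3000, 20-year baseline;
# dividing the shift uncertainty by the baseline gives the acceleration
# uncertainty (our assumption about the two-epoch error budget).
z = np.array([2.0, 3.0, 4.0, 5.0])
sigma_acc = sigma_v_lyaf(3000, 10, z) / 20.0     # cm/s/yr
print(np.round(sigma_acc, 2))                    # ~[0.22 0.14 0.09 0.08]
```

For these inputs the sketch reproduces the 0.22–0.08 cm s\(^{-1}\) yr\(^{-1}\) range over \(z=\) 2–5 quoted in item 1.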

Yu et al. (2014) predict that CHIME can reach 0.08–0.14 cm s\(^{-1}\) yr\(^{-1}\) uncertainties spanning the range \(z=\) 0.8–2.5 in a 10-year survey. The key differences between CHIME and FAST or SKA programs are the 100-fold higher number of expected absorption line systems and the daily observation of every system over 10 years. If absorption line systems are detected at the predicted rate, this suggests that CHIME will be competitive with two- or few-epoch surveys of \(\sim 10^3\) systems that require much larger collecting areas.

Figure 47 shows the measurement forecasts and illustrates how the signal can be detected but cannot generally discriminate between cosmologies that are consistent with current paradigms. They can, however, definitively and directly demonstrate the influence of dark energy on the cosmic expansion without use of standard distance indicators or models.

Fig. 47

Forecast acceleration measurements versus redshift for the SKA using Hi 21-cm emission from galaxies (Kloeckner et al. 2015), CHIME using Hi 21-cm absorption (Yu et al. 2014), Hi 21-cm absorption using FAST and the SKA estimated from projected detections (see text), and an ELT program that monitors the Ly\(\alpha \) forest (Liske et al. 2008). The cosmological tracks follow Fig. 46. The shaded loci and error bars indicate 1\(\sigma \) uncertainties

3.11.4 Systematic effects

Systematic effects include the ability to obtain stable and repeatable wavelength or frequency calibration; the relative angular motions of absorbing gas with respect to illumination sources; illumination source variability in size, flux, and spectral properties; the motion of the observer; peculiar velocities and accelerations; and gravitational accelerations internal to and between monitored objects. Observations are made from a very non-inertial reference frame that reflects multiple accelerations and rotations, although these will be well determined in the near future, to better precision than is needed for the \({\dot{z}}\) measurement.

The requisite calibration stability relies on a local oscillator, and current radio facilities already support this level of precision (e.g., Cooke 2020). Optical spectroscopy will require stable references such as laser combs and actively-controlled high-precision spectrographs (e.g., Eikenberry et al. 2019b, a).

Gravitational accelerations within galaxies, between galaxies, and within galaxy clusters are of order 1 cm s\(^{-1}\) yr\(^{-1}\). For example, the acceleration of the solar system barycenter due to its orbit within the Galaxy is \(\sim \)0.7 cm s\(^{-1}\) yr\(^{-1}\) (e.g., Titov et al. 2011; Charlot et al. 2020; Gaia Collaboration et al. 2021), which is larger than the peak cosmological acceleration. The \({\dot{z}}\) signal, however, has a well-defined sign at low and high redshifts (away from the null), while gravitational accelerations will be randomly distributed and null-centered. The net effect of peculiar accelerations will therefore be added noise, which may drive up sample sizes, integration times, and program duration. Gravitational accelerations will be largest for 21-cm emission and absorption lines.

Hi 21-cm absorption lines can be intrinsic to the host of the illumination source or intervening between the illumination source and the observer, but they generally have column densities typical of damped Ly\(\alpha \) systems, and are therefore associated with galaxies rather than intergalactic clouds. Peculiar accelerations are of larger concern in these systems than in the Ly\(\alpha \) forest (Cooke 2020), particularly in light of the comparatively smaller number of clouds that will be used for the measurements, except for CHIME (if the expected absorption line population is realized).

Loeb (1998) and Liske et al. (2008) explored the impact of peculiar acceleration on the Ly\(\alpha \) forest and found that it is significantly smaller than the cosmological signal. Cooke (2020) used hydrodynamical simulations to calculate peculiar accelerations in the Ly\(\alpha \) forest and in gas in galaxies, and found that the Ly\(\alpha \) forest peculiar accelerations are much smaller than the cosmological signal except in the \({\dot{z}}\) zero-crossing region. Gas in galaxies, however, shows accelerations ranging from the same order of magnitude as the redshift drift up to 2 dex higher, which supports the concern about systematic effects in 21-cm measurements.

3.11.5 Main results and forecasts

Secular redshift drift measurements on their own will not compete with other “precision cosmology” probes in terms of percent-level constraints on cosmological parameters. However, the method does offer a model-independent way to directly detect the cosmic acceleration that does not rely on standard candles, standard rulers, or the cosmic distance ladder, and it therefore has completely different systematics from canonical cosmological probes. It is also a powerful probe of isotropy and the general FLRW model (Quartin and Amendola 2010).

Fig. 48

Forecast cosmological parameter constraints using the combined secular redshift drift measurements presented in Fig. 47 for a flat \(\Lambda \)CDM model (left) and a flat CPL model (right)

Using the combined data and uncertainties for all methods shown in Fig. 47, we run an MCMC analysis to forecast constraints on the parameters of three different cosmological models: (1) a flat \(\Lambda \)CDM model (with two free parameters, \(H_0\) and \(\varOmega _{\mathrm{m}}\)), (2) a geometrically unconstrained \(\Lambda \)CDM model (where \(\varOmega _{\varLambda }\) is also free to vary), and (3) a flat \(w_0 w_a\)CDM model (with four free parameters, namely \(H_0\), \(\varOmega _{\mathrm{m}}\), \(w_0\), and \(w_a\)). The results are shown in Fig. 48. The fiducial parameter values used for the forecasts are \(H_0\)=74 \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\), \(\varOmega _{\mathrm{m}}\)=0.27, \(\varOmega _{\varLambda }\)=0.73, \(w_0 = -1\), and \(w_a = 0\). The less constrained models show the largest uncertainties, in large part due to strong correlations between parameters. The best-constrained cosmology is the flat \(\Lambda \)CDM model, which provides uncertainties on \(H_0\) and \(\varOmega _{\mathrm{m}}\) of \(\pm 2\)%. The unconstrained \(\Lambda \)CDM model has uncertainties in \(H_0\), \(\varOmega _{\mathrm{m}}\), and \(\varOmega _{\varLambda }\) of \(\sim \)40%, which are highly degenerate. Finally, the flat \(w_0 w_a\)CDM model shows a mixed picture, with uncertainties of 17% in \(H_0\), 8% in \(\varOmega _{\mathrm{m}}\), \(\pm 0.1\) in \(w_0\), and \(\pm 0.3\) in \(w_a\), with strong correlations between all parameters. In analyses comparing various redshift drift measurement methods, Martins et al. (2021) and Esteves et al. (2021) show that there is no “best” method, and caution that the choice of measurement should be tailored to specific science goals (e.g., constraining \(\varOmega _{\mathrm{m}}\) versus the dark energy equation of state).
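
For readers who wish to reproduce the flavor of such forecasts, a Fisher-matrix stand-in for the MCMC is sketched below (the redshifts and uncertainties are hypothetical placeholders, not the actual inputs behind Figs. 47 and 48):

```python
import numpy as np

def zdot(z, H0, Om):
    """Drift of Eq. (133) for flat LCDM; units set by H0."""
    return H0 * (1 + z - np.sqrt(Om * (1 + z)**3 + 1 - Om))

# Hypothetical measurement setup (placeholder values)
z_obs = np.array([0.3, 0.8, 1.5, 2.5, 4.0])
sigma = np.array([0.5, 0.4, 0.6, 0.8, 0.9])   # same units as zdot

theta0 = np.array([74.0, 0.27])               # fiducial (H0, Om)
steps = np.array([1e-2, 1e-4])

# Numerical derivatives of the model around the fiducial point
D = np.zeros((z_obs.size, theta0.size))
for i, h in enumerate(steps):
    dth = np.zeros_like(theta0); dth[i] = h
    D[:, i] = (zdot(z_obs, *(theta0 + dth)) -
               zdot(z_obs, *(theta0 - dth))) / (2 * h)

F = D.T @ (D / sigma[:, None]**2)             # Fisher matrix
cov = np.linalg.inv(F)
print(np.sqrt(np.diag(cov)) / theta0)         # forecast fractional errors
```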

The correlation between parameters measured by the secular redshift drift suggests that this method would benefit from joint analyses with other cosmological probes (non-standard and otherwise, see Sect. 4). For example, Alves et al. (2019) combine the expected ELT, SKA 21-cm emission, and CHIME measurements to make individual and joint forecasts for flat \(\Lambda \)CDM, wCDM, and \(w_0 w_a\)CDM cosmologies, both with and without priors. When current or future expected priors are included, cosmological parameter constraints of \(\sim \)1% can be obtained. Moreover, Martins et al. (2021) show that the redshift drift can break parameter degeneracies in traditional cosmological probes.

The larger impact of the secular redshift drift measurement is its ability to unambiguously and directly identify the influence of dark energy on the Hubble expansion. This statement applies individually for any of the measurement methods described above, including the Ly\(\alpha \) forest technique that would only measure deceleration: the amplitude of \({\dot{z}}\) changes dramatically in the absence of dark energy. Any method that can measure a non-zero cosmic acceleration can differentiate between cosmologies with and without dark energy (as shown in Figs. 46 and 47).

3.12 Clustering of standard candles

For over a decade after the seminal 1998 papers (Perlmutter et al. 1998, 1999; Riess et al. 1998), SNe Ia have been one of the most important observables in cosmology. Their prominence as a probe of the background cosmology has more recently been overshadowed by the large increase in the available data on both the CMB and BAO in the large-scale structure. There are, however, two reasons why supernovae could return to the forefront of cosmology. First, the Vera Rubin Observatory Legacy Survey of Space and Time (LSST, LSST Science Collaboration et al. 2009) should increase the available number of events by at least two orders of magnitude. Second, supernovae are also able to probe cosmology beyond the background level.

There have been two approaches to extract information on linear perturbation parameters from supernovae. First, they can be used as probes of gravitational lensing. They can in fact be used both in the weak (Quartin et al. 2014; Castro and Quartin 2014; Scovacricchi et al. 2017; Macaulay et al. 2017, 2020) and strong lensing regimes (Zumalacarregui and Seljak 2018; Grillo et al. 2018b, 2020). The main observable is the induced change in their scatter at a given redshift. The second approach is to measure the correlations between supernova magnitudes induced by the peculiar velocity field. This field can be computed to good precision in linear perturbation theory and is correlated to the density contrast (Hui and Greene 2006). Measurements of these correlations have been more recently explored in detail in a number of papers (Castro et al. 2016; Howlett et al. 2017; Garcia et al. 2020; Amendola and Quartin 2021; Graziani et al. 2020). Interestingly, such correlations can also be probed with good precision by upcoming standard siren data (see also Sect. 3.4), as discussed in Palmese and Kim (2021), Alfradique et al. (2022).

Here, we review this latter approach and the forecasts performed for future surveys. The advantage of peculiar velocity measurements is that they are well described by linear perturbation theory, and that velocity and density tracers have different degeneracies with the linear bias, making them very complementary.

3.12.1 Basic idea and equations

The first measurement of peculiar velocity correlations in real supernova data was carried out by Gordon et al. (2007) using 271 SNe and the MLCS2k2 light-curve fitting method, reaching a \(3.6\sigma \) detection. Castro et al. (2016) proposed a more thorough methodology to extract peculiar velocity information from supernova data. Combining the SN velocity and SN lensing observables in the JLA supernova catalog (Betoule et al. 2014), a joint measurement of \(\sigma _8\) and the growth rate index \(\gamma \) was obtained from SN data alone. This included marginalization over 8 nuisance parameters for light-curve fitting (using SALT2), lensing and peculiar velocities, and over 4 additional cosmological parameters. Castro et al. (2016) also showed that the SN lensing and velocity constraints were very complementary, with degeneracy directions differing by \(60^\circ \) in the \(\sigma _8,\,\gamma \) plane, and that both were also very complementary to the CMB growth-of-structure constraints.

A measurement of \(f \sigma _8\) at low redshifts, where the dependence on cosmology is weak, was also obtained with SN velocities by Huterer et al. (2017), Boruah et al. (2020). The former used the Supercal SN catalog and 6dFGS data; the latter used the A2 (Second Amendment) SN catalog combined with 2MTF and SFI++ data, and also included velocity estimates based on the Tully–Fisher method. Qin et al. (2019) combined the density and velocity measurements (using the Fundamental Plane relation instead of supernovae) and discussed how to recover the momentum power spectrum (see below for a discussion on the momentum).

A summary of current constraints is given in Table 6. Note that none of these current measurements employ the full Clustering of Standard Candles method, as described below. In particular, only Qin et al. (2019) combined velocity and density power spectrum measurements, and in that case the cross-spectrum was not analyzed and velocities were not estimated using standard candles.

Table 6 Current measurements using techniques similar to the Clustering of Standard Candles but which do not employ the full methodology

Measurements of the velocity power spectrum (which can be more precisely measured with standard candles) can be combined to great gain with measurements of the density power spectrum and the density-velocity cross-spectrum. This was first proposed by Howlett et al. (2017) (henceforth H17), who performed Fisher Matrix forecasts for measuring \(f\sigma _8\) by combining density and velocity spectra, the former measured with galaxies, the latter with SNe. Similar forecasts were also performed by Palmese and Kim (2021), Alfradique et al. (2022), combining future standard siren and galaxy survey data.
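
In linear theory the three spectra entering such a 3\(\times \)2pt analysis take a compact form: with Kaiser redshift-space distortions the galaxy kernel is \(b + f\mu ^2\), while the linear continuity equation gives a line-of-sight velocity kernel \(aHf\mu /k\) (e.g., Hui and Greene 2006). A minimal sketch follows (the toy matter power spectrum is a placeholder for a Boltzmann-code output such as CAMB or CLASS):

```python
import numpy as np

def three_spectra(k, mu, Pm, b=1.3, f=0.5, aH=70.0):
    """Linear-theory auto- and cross-spectra of the galaxy overdensity
    and the line-of-sight peculiar velocity (velocity in km/s if aH is
    in km/s/Mpc and k in 1/Mpc). Kaiser RSD kernel for densities;
    v = (aH f mu / k) delta from the linear continuity equation."""
    kern_d = b + f * mu**2
    kern_v = aH * f * mu / k
    P_gg = kern_d**2 * Pm          # density auto-spectrum
    P_gv = kern_d * kern_v * Pm    # density-velocity cross-spectrum
    P_vv = kern_v**2 * Pm          # velocity auto-spectrum
    return P_gg, P_gv, P_vv

# Toy matter power spectrum (placeholder shape only)
k = np.logspace(-2, -0.5, 50)                       # 1/Mpc
Pm = 2e4 * (k / 0.05) / (1 + (k / 0.05)**2)**1.5    # arbitrary normalization
P_gg, P_gv, P_vv = three_spectra(k, mu=0.6, Pm=Pm)
```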

The above promising results prompted a study of the capabilities of Rubin to perform measurements of the velocity power spectrum with SNe. Garcia et al. (2020) investigated the constraints that could be achieved with Rubin using the official survey strategy under investigation by the collaboration at the time. As was known, that strategy was not optimal for SN science, and the inferred SN completeness, using the SNANA code (Kessler et al. 2009) and the proposed quality cuts, was very low both at low z and for \(z>0.5\). Figure 49 illustrates this result (dubbed LSST Status Quo, or LSST SQ in short), as well as the completeness assumed by a few other recent works. Nevertheless, even without further refinements, this was already enough to achieve very interesting velocity measurements with Rubin. It was also shown in that paper that the velocity constraints in the \(\sigma _8,\, \gamma \) plane exhibit moderate non-Gaussianity (they are banana-shaped, instead of ellipsoidal), and thus the Fisher Matrix forecasts of the errors were not very accurate. Since the combination of velocity and density spectra makes for much tighter constraints than velocity alone, the Fisher Matrix results for the combined cases are expected to be a good approximation of the full likelihood results.

Garcia et al. (2020) also investigated how to improve the observing strategy, and found that the same observing time provides similar cosmological information whether one observes a larger area, or a smaller area for more years. In fact, it was shown that even with optimistic Rubin SN numbers the SN velocity spectrum remains far from the cosmic-variance regime, and for a broad range of SN number densities \(n_{\mathrm{s}}\) the uncertainties still scale as \(n_{\mathrm{s}}^{-1/2}\), which is the same power with which uncertainties generally scale with survey area. This means that, in terms of SN clustering, the most important feature is a high cadence, in order to achieve higher SN completeness. Lochner et al. (2022) recently revisited the impact of different survey strategies on SN velocity measurements.

Figure 49 also illustrates that the Zwicky Transient Facility (ZTF) should in principle be capable of observing a catalog of SNe with high completeness for \(z<0.3\). In fact, a first measurement of the clustering of both core-collapse and Type Ia SNe was recently performed with ZTF (Tsaprazi et al. 2022).

Fig. 49

Comparison of the SN completeness assumed in different forecasts. Dashed lines represent the maximum theoretical ZTF completeness using the limiting magnitude in the deepest filter, both for the standard 30 s exposure time and for a possible 120 s exposure. The horizontal solid lines represent the assumptions made in different works. For Rubin we also show results from the survey strategy as of 2019, obtained after applying the proposed photometric quality cuts for a 5-year survey (LSST SQ, for Status Quo), which greatly reduce the completeness

The combined measurement of velocity and density spectra was also studied in Amendola and Quartin (2021) (henceforth A21), where a model-independent methodology was proposed to extract competitive constraints on E(z) with almost no assumptions about the cosmological model at any stage of the analysis. It was shown, using SNe as both density and velocity tracers, that it is possible to achieve 5–13% (9–40%) measurements in redshift bins of \(\varDelta z = 0.1\) up to at least \(z = 0.6\). These results included marginalization over a large number of bias parameters, which were allowed to vary freely in both z and k. It was also noted that with SNe one cannot, however, measure \(H_0\) with this method; moreover, the constraints on E(z) blow up in the limit \(z \rightarrow 0\).

Quartin et al. (2022) (henceforth Q21) recently proposed to push the analysis of galaxy and supernova data further by using SNe both as density and velocity tracers. This combines the complementarity of the velocity measurements with the benefits of a multi-tracer analysis (see, e.g., Seljak 2009; McDonald and Seljak 2009; Abramo 2012; Abramo and Leonard 2013). Here, instead of different galaxy populations, the multiple tracers are galaxies and supernovae. This leads to 6 different power spectra (3 auto- and 3 cross-spectra), and the approach was thus dubbed the \(6\times 2\)pt method. It was shown to increase the precision with respect to the \(3\times 2\)pt methods studied in both H17 and A21, at no cost in terms of extra data. The extra precision applies not only to the cosmological parameters but also to the bias parameters, making this approach more robust to uncertainties in the galaxy bias.

One should note that velocity tracers in general inhabit galaxies, so we can only observe the velocity field where there are galaxies. In other words, we observe a mass-weighted velocity field, also referred to as the momentum field (Howlett 2019): \({\varvec{p}}({\mathbf {r}}) \equiv {\varvec{v}}({\mathbf {r}}) (1+\delta _g({\mathbf {r}}))\). On large scales the momentum and velocity fields coincide, but already at scales of \(\sim \)0.1 h/Mpc the former picks up non-linearities arising from quadratic terms. Nevertheless, these can be modeled using perturbation theory in a straightforward manner, so we neglect them here for simplicity.

Let us denote by \(\delta _m\) the density contrast of matter, and by \(\delta _T = b_T \delta _m\) the density contrast of a tracer field of sources (subscript T) that are or can be standardized, e.g. SNe Ia, where \(b_T\) is the bias, in general dependent in an unknown way on space and time. In the linear regime and in Fourier space, the continuity equation relates the peculiar velocity field \({\varvec{v}}\) and the density contrast of the tracer field by:

$$\begin{aligned} {\varvec{v}}_T = i H\beta \frac{{\varvec{k}}}{k^{2}(1+z)}\,\delta _{T} , \end{aligned}$$
(136)

where \(\beta =f/b_T\), with \(f=d\log \delta _m/d\log a\) the growth rate. The only observable component of the velocity field is, however, the longitudinal velocity \(v_{\parallel }={\varvec{v}}_T\cdot {\varvec{r}}/r\) (although see Hotinli et al. 2019), so the relation becomes:

$$\begin{aligned} v_{\parallel }=i\frac{H}{k(1+z)}\beta \frac{{\varvec{k}}\cdot {\varvec{r}}}{kr}\delta _{T} =i\frac{H\mu }{k(1+z)}\beta \delta _{T} , \end{aligned}$$
(137)

where \(\mu =\cos \theta _{{\varvec{k}},{\varvec{r}}}\) is the cosine of the angle between \({\varvec{k}}\) and the line of sight \({\varvec{r}}\). From this expression we see that, if we can measure both \(\delta _T\) and \(v_{\parallel }\), we have access to the combination \(H\beta \), assuming we also know \(k,\mu \) (and of course the redshift z). So in order to measure H(z) we need to measure \(\beta \); this can be estimated through the redshift-space distortion of the galaxy power spectrum. However, we also need to be able to convert the raw data of redshifts and angles into k and \(\mu \). To solve this problem, we make use of the fact that \(k,\mu \) depend on the observables (redshift and angles) through the angular diameter distance \(D_{\mathrm{A}}\) and through H(z) itself. We assume the Etherington relation between the luminosity distance \(D_{\mathrm{L}}\) and the angular diameter distance is valid, so that \(D_{\mathrm{L}}=D_{\mathrm{A}}(1+z)^2\). We also assume that \(D_{\mathrm{L}}\) is measured directly from the standard candles, while \(H_0\) is given by local measurements, so that we know the combination \(H_0 D_{\mathrm{L}}\). Although we could include the error on the estimation of \(H_0 D_{\mathrm{L}}\) in our formalism, it turns out to be well below the other uncertainties, so we may neglect it.
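Schematically, taking the ratio of the two measured fields eliminates the particular realization of \(\delta _T\):

$$\begin{aligned} \frac{v_{\parallel }}{\delta _{T}} = i\,\frac{H\mu }{k(1+z)}\,\beta \;, \end{aligned}$$

so that, for known \(k\), \(\mu \) and z, this ratio directly estimates the combination \(H\beta \).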

A peculiar velocity v (in units of c) induces a change in the luminosity distance \(D_{\mathrm{L}}\) given by (Hui and Greene 2006):

$$\begin{aligned} \frac{\delta {D_{\mathrm{L}}}}{D_{\mathrm{L}}}=v\left[ 2-\frac{d\log D_\mathrm{L}}{d\log (1+z)}\right] . \end{aligned}$$
(138)

Since \(m=M+25+5\log D_{\mathrm{L}}\) (with \(D_{\mathrm{L}}\) in Mpc), a small change in \(D_{\mathrm{L}}\) induces a change \(\delta m\) in the apparent magnitude given by:

$$\begin{aligned} \frac{\delta D_{\mathrm{L}}}{D_{\mathrm{L}}}=\frac{\log 10}{5}\delta m \;, \end{aligned}$$
(139)

so that finally the radial peculiar velocity of a standard candle is obtained from the scatter \(\delta m\) of its apparent magnitude as:

$$\begin{aligned} v=\frac{\log 10}{5} \delta m \left[ 2-\frac{d\log D_\mathrm{L}}{d\log (1+z)}\right] ^{-1} \;. \end{aligned}$$
(140)
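To make the chain of Eqs. (138)–(140) concrete, the sketch below implements the magnitude-to-velocity conversion numerically. It is only a minimal illustration under stated assumptions: the fiducial cosmology, the finite-difference step and the helper names are our own choices, and the derivative \(d\log D_{\mathrm{L}}/d\log (1+z)\) is evaluated numerically with astropy rather than analytically.

```python
# Minimal sketch of Eq. (140): radial peculiar velocity (in units of c)
# from the magnitude scatter delta_m of a standard candle.
# The fiducial cosmology and helper names are illustrative assumptions.
import numpy as np
from astropy.cosmology import FlatLambdaCDM

cosmo = FlatLambdaCDM(H0=70, Om0=0.3)  # assumed fiducial background

def dlogDL_dlog1pz(z, dz=1e-4):
    """Numerical derivative d log D_L / d log(1+z) at redshift z."""
    dl = lambda zz: cosmo.luminosity_distance(zz).value  # Mpc
    return (np.log(dl(z + dz)) - np.log(dl(z - dz))) / \
           (np.log(1 + z + dz) - np.log(1 + z - dz))

def peculiar_velocity(delta_m, z):
    """Eq. (140); the sign convention follows Eq. (138)."""
    return (np.log(10) / 5) * delta_m / (2 - dlogDL_dlog1pz(z))

# Example: a 0.1 mag fluctuation at z = 0.05
v = peculiar_velocity(0.1, 0.05)
print(f"v = {v * 299792.458:.0f} km/s")
```

Note that at low redshift the bracket in Eq. (138) is large in absolute value, so a given peculiar velocity imprints a large magnitude fluctuation; this is why the velocity noise term of Eq. (149) below is smallest at low z.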

Let us now consider three Gaussian fields in Fourier space with zero mean: the density contrast \(\delta _{\mathrm{s}}\) of the standard candles, their peculiar velocity field \(v_{\mathrm{s}}\), and the galaxy density contrast \(\delta _{\mathrm{g}}\). A fraction of the supernovae may be hosted by galaxies in the sample, but we expect this fraction to be small. We consider the same growth rate f for every tracer field, which amounts to assuming universal gravity. We also introduce the linear bias for each species, \(b_\mathrm{g,s}=\delta _{\mathrm{g,s}}/\delta _{\mathrm{tot}}\), where \(\delta _{\mathrm{tot}}\) is the underlying total matter density contrast. The functions \(b_\mathrm{g,s}\) are in general arbitrary functions of space and time. Following Quartin et al. (2022) we can write the six observed power spectra as:

$$\begin{aligned} P_{\mathrm{gg}}(k,\mu ,z)&= \varUpsilon \big [1+ \beta _{\mathrm{g}} \mu ^{2}\big ]^2 \,b_{\mathrm{g}}^{2} \,S_{\mathrm{g}}^2\, D_+^2 P_{\text {mm}}(k) + \frac{1}{n_{\mathrm{g}}} , \end{aligned}$$
(141)
$$\begin{aligned} P_{\mathrm{ss}}(k,\mu ,z)&= \varUpsilon \big [1+ \beta _{\mathrm{s}} \mu ^{2}\big ]^2 \,b_{\mathrm{s}}^{2}\,S_{\mathrm{s}}^2 \, D_+^2 P_{\text {mm}}(k) + \frac{1}{n_{\mathrm{s}}} , \end{aligned}$$
(142)
$$\begin{aligned} P_{\mathrm{gs}}(k,\mu ,z)&= \varUpsilon \big [1+ \beta _{\mathrm{g}} \mu ^{2}\big ]\big [1+ \beta _{\mathrm{s}} \mu ^{2}\big ] \,b_{\mathrm{g}} \,b_{\mathrm{s}}\,S_{\mathrm{g}} \,S_{\mathrm{s}} \, D_+^2 P_{\text {mm}}(k) + \frac{n_{\mathrm{gs}}}{n_{\mathrm{g}}n_{\mathrm{s}}} , \end{aligned}$$
(143)
$$\begin{aligned} P_{\mathrm{gv}}(k,\mu ,z)&= \varUpsilon \frac{H\mu }{k(1+z)} \!\big [1 + \beta _{\mathrm{g}}\mu ^{2}\big ] b_{\mathrm{g}}\, S_{\mathrm{g}}\, S_{\mathrm{v}} \,f D_+^2 P_{\text {mm}}(k) , \end{aligned}$$
(144)
$$\begin{aligned} P_{\mathrm{sv}}(k,\mu ,z)&= \varUpsilon \frac{H\mu }{k(1+z)} \!\big [1 + \beta _{\mathrm{s}}\mu ^{2}\big ] b_{\mathrm{s}}\, S_{\mathrm{s}}\, S_{\mathrm{v}} \, f D_+^2 P_{\text {mm}}(k) , \end{aligned}$$
(145)
$$\begin{aligned} P_{\mathrm{vv}}(k,\mu , z)&= \varUpsilon \left[ \frac{H\mu }{k(1+z)}\right] ^2 S_{\mathrm{v}}^2 \,f^{2} \,D_+^2 P_{\text {mm}}(k)+ \frac{\sigma ^2_{v, \mathrm{eff}}}{n_{\mathrm{s}}} , \end{aligned}$$
(146)

where \(\beta _i \equiv f/b_i\), \(\mu \equiv {\hat{k}} \cdot {\hat{r}}\), \(S_{\mathrm{g,v,s}}\) are damping terms, \(D_+\) is the linear growth factor, and \(P_{\text {mm}}\) is the matter power spectrum at \(z=0\); \(n_{\mathrm{g}}\) and \(n_{\mathrm{s}}\) are the galaxy and SN number densities, and \(n_{\mathrm{gs}}\) is the number density of objects common to both catalogs.

All observed spectra are multiplied by a volume-correcting factor \(\varUpsilon \) (Ballinger et al. 1996; Seo and Eisenstein 2003):

$$\begin{aligned} \varUpsilon = \frac{H D_{L,r}^2}{H_{r}D_{\mathrm{L}}^2} \;, \end{aligned}$$
(147)

This factor arises because we first need to choose a reference cosmology, e.g. \(\Lambda \)CDM (subscript r), and then correct for any other cosmology. For the same reason, the AP effect (Alcock and Paczynski 1979b), which introduces corrections to k and \(\mu \) that depend on HD (see, e.g., Magira et al. 2000; Amendola et al. 2005), has also been taken into account by replacing all k and \(\mu \) on the right-hand side of Eqs. (141)–(146) with their AP-corrected values.

The non-linear smoothing factors \(S_{v,g,s}\) (important only at small scales) can be taken following Koda et al. (2014), Howlett et al. (2017) to be:

$$\begin{aligned} S_{\mathrm{v,g,s}}=\exp \left[ -\frac{1}{4}(k\mu \sigma _\mathrm{v,g,s})^{2}\right] . \end{aligned}$$
(148)

In this expression, the \(\sigma _{v,g,s}\) are assumed to be independent of redshift. Q21 set as fiducial values \(\sigma _{\mathrm{g}}=\sigma _\mathrm{s} = 4.24 \;\mathrm{Mpc}/h\) and \(\sigma _{\mathrm{v}}= 8.5\) Mpc/h; H17 used very similar values. These fiducial values nevertheless have little impact on the forecasts. Finally, the noise term in the velocity correlation is given by (Hui and Greene 2006; Davis et al. 2011):

$$\begin{aligned} \sigma _{v,\mathrm{eff}}^{2}\equiv \left[ \frac{\log 10}{5}\sigma _\mathrm{int}\right] ^2\!\left[ 2-\frac{d\log D_\mathrm{L}}{d\log (1+z)}\right] ^{-2}\!\!\!+\frac{\sigma _{v\mathrm{,nonlin}}^{2}}{c^{2}} , \end{aligned}$$
(149)

where \(\sigma _{\mathrm{int}}\) is the intrinsic scatter of the source's magnitude and \(\sigma _{v\mathrm{,nonlin}}\) accounts for non-linear contributions to the peculiar velocity dispersion.
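For orientation, Eq. (149) can be evaluated directly; the snippet below reuses the (hypothetical) dlogDL_dlog1pz helper from the earlier sketch, and the intrinsic scatter of 0.12 mag and non-linear dispersion of 300 km/s are illustrative values only, not the fiducials of any particular analysis.

```python
# Sketch of Eq. (149): effective velocity noise (in units of c).
# Requires the dlogDL_dlog1pz helper defined in the previous sketch.
import numpy as np

C_KMS = 299792.458

def sigma_v_eff(z, sigma_int=0.12, sigma_v_nonlin=300.0):
    """Eq. (149); smallest at low z, where Eq. (138)'s bracket is large."""
    mag_term = (np.log(10) / 5 * sigma_int) / (2 - dlogDL_dlog1pz(z))
    return np.sqrt(mag_term**2 + (sigma_v_nonlin / C_KMS)**2)

for z in (0.05, 0.2, 0.5):
    print(f"z = {z}: sigma_v_eff = {sigma_v_eff(z) * C_KMS:.0f} km/s")
```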

The \(6\times 2\)pt method results in a \(3\times 3\) correlation matrix:

$$\begin{aligned} {\mathbf {C}} = \begin{pmatrix} P_{\mathrm{gg}} & P_{\mathrm{gs}} & P_{\mathrm{gv}} \\ P_{\mathrm{gs}} & P_{\mathrm{ss}} & P_{\mathrm{sv}} \\ P_{\mathrm{gv}} & P_{\mathrm{sv}} & P_{\mathrm{vv}} \end{pmatrix} . \end{aligned}$$
(150)
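As a concrete illustration of how Eqs. (141)–(146) combine into Eq. (150), the sketch below assembles \({\mathbf {C}}(k,\mu ,z)\) for given fiducial ingredients. The matter power spectrum, growth quantities, biases and number densities must be supplied externally (e.g. from a Boltzmann code such as CAMB or CLASS); the default damping scales follow the Q21 fiducials quoted above, while the function names, the choice \(\varUpsilon =1\) (no AP correction) and the unit conventions are our own simplifying assumptions.

```python
# Sketch: the six observed spectra of Eqs. (141)-(146) assembled into the
# 3x3 matrix C of Eq. (150) for the tracer vector (delta_g, delta_s, v_s).
# Velocities are in units of c, so H enters as H/c in h/Mpc, matching the
# units of k; Upsilon is set to 1 (no volume/AP correction).
import numpy as np

def damping(k, mu, sigma):
    """Non-linear smoothing factor of Eq. (148); sigma in Mpc/h."""
    return np.exp(-0.25 * (k * mu * sigma) ** 2)

def corr_matrix(k, mu, z, P_mm, D_plus, H_over_c, f, b_g, b_s,
                n_g, n_s, n_gs, sigma_v_eff,
                sigma_g=4.24, sigma_s=4.24, sigma_v=8.5, Upsilon=1.0):
    Sg, Ss, Sv = (damping(k, mu, s) for s in (sigma_g, sigma_s, sigma_v))
    P0 = Upsilon * D_plus**2 * P_mm(k)            # common factor
    Kg = (1 + (f / b_g) * mu**2) * b_g * Sg       # galaxy density kernel
    Ks = (1 + (f / b_s) * mu**2) * b_s * Ss       # SN density kernel
    Kv = H_over_c * mu / (k * (1 + z)) * f * Sv   # velocity kernel

    P_gg = Kg**2 * P0 + 1.0 / n_g                 # Eq. (141)
    P_ss = Ks**2 * P0 + 1.0 / n_s                 # Eq. (142)
    P_gs = Kg * Ks * P0 + n_gs / (n_g * n_s)      # Eq. (143)
    P_gv = Kg * Kv * P0                           # Eq. (144)
    P_sv = Ks * Kv * P0                           # Eq. (145)
    P_vv = Kv**2 * P0 + sigma_v_eff**2 / n_s      # Eq. (146)

    return np.array([[P_gg, P_gs, P_gv],
                     [P_gs, P_ss, P_sv],
                     [P_gv, P_sv, P_vv]])
```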

The probability distribution of our random variables, i.e. \(x_{a}=\sqrt{V}\{\delta _g,\delta _s,v_s\}\), is assumed to be Gaussian with zero mean and covariance matrix \({\mathbf {C}}\). The Fisher matrix associated with the unknown parameters is thus (Abramo and Amendola 2019):

$$\begin{aligned} F_{\alpha \beta } \,=\, VV_{k}\bar{F}_{\alpha \beta } , \end{aligned}$$
(151)

where \(V_{k}=(2\pi )^{-3}2\pi k^{2}\varDelta _{k}\) is a volume element in Fourier space and \(\bar{F}_{\alpha \beta }\) is:

$$\begin{aligned} \bar{F}_{\alpha \beta }=\frac{1}{2}\int _{-1}^{+1}d\mu \frac{\partial C_{ab}}{\partial \theta _{\alpha }}C_{ad}^{-1}\frac{\partial C_{cd}}{\partial \theta _{\beta }}C_{bc}^{-1} , \end{aligned}$$
(152)

where the integrand is evaluated at the fiducial value and \(\theta _\alpha \) are the cosmological parameters we want to estimate. For a z-shell of volume V(z) and for \(\varDelta _{k}\approx 2\pi /V^{1/3}\), we have:

$$\begin{aligned} VV_{k} = \frac{k^{2}V^{2/3}}{2\pi } . \end{aligned}$$
(153)
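Given any routine returning \({\mathbf {C}}\) as a function of a parameter vector (for instance a wrapper around the corr_matrix sketch above), the Fisher matrix of Eqs. (151)–(153) for a single (k, z) cell reduces to a \(\mu \) quadrature of the standard trace expression, to which Eq. (152) is equivalent for a symmetric \({\mathbf {C}}\). The finite-difference step and the quadrature rule below are illustrative choices, not those of the cited analyses.

```python
# Sketch: Fisher matrix of Eqs. (151)-(153) for one (k, z) cell.
# C_of_theta(theta, k, mu, z) is a user-supplied callable returning the
# 3x3 matrix of Eq. (150); derivatives are two-sided finite differences.
import numpy as np

def fisher_bin(C_of_theta, theta_fid, k, z, V, n_mu=101, eps=1e-4):
    mus = np.linspace(-1.0, 1.0, n_mu)
    n_par = len(theta_fid)
    integrand = np.zeros((n_mu, n_par, n_par))

    for i, mu in enumerate(mus):
        Cinv = np.linalg.inv(C_of_theta(np.asarray(theta_fid, float), k, mu, z))
        dC = []
        for a in range(n_par):
            tp = np.array(theta_fid, dtype=float); tp[a] += eps
            tm = np.array(theta_fid, dtype=float); tm[a] -= eps
            dC.append((C_of_theta(tp, k, mu, z)
                       - C_of_theta(tm, k, mu, z)) / (2 * eps))
        for a in range(n_par):
            for b in range(n_par):
                # integrand of Eq. (152) in trace form:
                # (1/2) Tr[C^-1 dC/dtheta_a C^-1 dC/dtheta_b]
                integrand[i, a, b] = 0.5 * np.trace(
                    Cinv @ dC[a] @ Cinv @ dC[b])

    # trapezoidal quadrature over mu, then the V V_k prefactor of Eq. (153)
    w = np.full(n_mu, mus[1] - mus[0]); w[0] *= 0.5; w[-1] *= 0.5
    Fbar = (w[:, None, None] * integrand).sum(axis=0)
    return k**2 * V**(2.0 / 3.0) / (2 * np.pi) * Fbar
```

Summing fisher_bin over the k-cells and redshift shells described next then yields the total Fisher matrix.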

The k-cells were chosen in A21 and Q21 with equal \(\varDelta _{k}=2\pi /V(z)^{1/3}\) between \(k_{\min }(z)\) and \(k_{\max }\), with \(k_{\min }=2\pi /V(z)^{1/3}\) following Garcia et al. (2020). A21 and Q21 assumed \(k_{\max } = 0.1~h/\)Mpc, whereas H17 assumed \(k_{\max } = 0.2~h/\)Mpc (see Table 7). As discussed in A21, the larger value is responsible for substantial increases in precision.

Table 7 Survey specifications for the proposed forecast scenarios in H17 and Q21

3.12.2 Measurements and sample selection

The equations above and the results below assume spectroscopic measurements of both galaxies and supernovae. If one has to rely on photometric data only, the corresponding photo-z errors will degrade the clustering measurements along the line of sight, resulting in larger effective non-linear smoothing factors \(S_{v,g,s}\). For supernovae, the absence of spectroscopic follow-up will result in contamination from core-collapse supernovae, which could be a source of bias, as discussed below.

The need for galaxy spectra does not substantially decrease the final precision of the method: due to cosmic variance, the information saturates at relatively low number densities, which should be reached with surveys like DESI (DESI Collaboration et al. 2016) and 4MOST (de Jong et al. 2019). For instance, in the Q21 Conservative case only half a million galaxies with spectra would be required. This only poses a real challenge when one tries to push to higher redshifts (\(z \gtrsim 0.5\)), since the absolute number of galaxies needed to approach the cosmic-variance limit in each redshift bin increases roughly as \(z^2\).

3.12.3 Systematic effects

The sources of systematic effects in the \(6 \times 2\)pt method are the same as for any supernova and large-scale structure survey. Here we limit ourselves to listing the most important ones, referring to the literature for details.

On the supernova side, one has to expect various sources of systematic uncertainty. For instance, core-collapse supernovae or other transient phenomena can be incorrectly classified as SNe Ia. This is especially problematic if SNe lack spectra, although there is an ongoing effort to improve photometric classification techniques (see, e.g., Lochner et al. 2016; Ishida et al. 2019; Villar et al. 2020). Without further improvements in photometric classification, extra dispersion would need to be included in the SN distances to avoid biases, which Vargas dos Santos et al. (2019) showed could lead to an effective reduction in the number of SNe by up to two-thirds.

Secondly, the standardization of SNe Ia might be more complicated than usually assumed, with dependencies on environment, host mass, metallicity, etc., that are still not perfectly accounted and corrected for. Gravitational lensing of the sources is another possible source of bias, although the overall effect is expected to be negligible. The smoothing factors \(S_{g,s,v}\) introduced in the previous section might also deviate from the simple parameterization we adopted, perhaps with a redshift dependence. If the SN Ia redshifts are evaluated through photometric methods, there are of course additional sources of uncertainty, which could however be modeled by larger, redshift-dependent smoothing factors.

On the large-scale structure side, other effects should of course be considered carefully. First, finite surveys induce window-function distortions of the power spectrum shape that have to be taken into account, although for the forthcoming large surveys this problem is probably under control. Secondly, the redshift bins cannot really be taken as independent, and some correction is also expected (see, e.g., Bailoni et al. 2017). Moreover, lensing magnification also affects the clustering (see, e.g., Cardona et al. 2016).

Perhaps the most problematic systematic effect is, however, the impact of non-linearities. The assumption of linearity in fact enters our calculation in several ways: in the P(k) shape, in the Kaiser redshift-space correction, in the velocity–density contrast relation, and in the overall Gaussian assumption. Non-linearities are in principle accounted for by the smoothing factors \(S_{\mathrm{v,g,s}}\), but these functions are calibrated only on \(\Lambda \)CDM simulations and might differ appreciably in alternative cosmologies. Already at \(k=0.1\,h\)/Mpc one-loop corrections become relevant, especially when one allows conservative priors for all the nuisance parameters involved, as discussed in Amendola et al. (2022). Nevertheless, the inclusion of all these parameters may allow the extension to higher values of \(k_{\max }\), especially at higher redshifts. For instance, for the analysis of BOSS data Chudaykin et al. (2021) and Ivanov et al. (2020) employed \(k_{\max } = 0.20\,h\)/Mpc and \(k_{\max } = 0.25\,h\)/Mpc, respectively. It remains to be investigated how to best generalize the clustering of standard candles to include one-loop corrections, and up to which scales it can be relied upon.

Fig. 50

1 and \(2\sigma \) marginalized forecasts in the \(\{\sigma _8,\, \gamma \}\) plane from the \(6\times 2\)pt method for the Q21 Conservative (left panel) and Q21 Aggressive (right panel) cases. Also shown are the CMB-only and joint constraints. As can be seen, the \(6\times 2\)pt and CMB constraints are very complementary. Image adapted with permission from Quartin et al. (2022)

3.12.4 Main results and forecasts

The results obtained in H17 for the \(3\times 2\)pt case using galaxies and supernovae are summarized in Table 8 for the case dubbed All Rubin SNe, which is described in detail in Table 7. Here we just recast the H17 results in wider redshift bins. As can be seen, constraints on \(f\sigma _8\) between 3 and \(10\%\) can be achieved in that case.

Forecasts for the \(6\times 2\)pt method were performed in Q21 allowing a cosmological model with 5 parameters, \(\{\sigma _8\), \(\gamma \), \(\varOmega _{\mathrm{m}}\), \(\varOmega _{k0}\), \(h\}\), making use of 3 global nuisance parameters describing the non-linear smoothing factors in Eq. (148), and allowing each bias parameter to be free in each redshift bin. The final marginalized constraints on each parameter are given in Table 9, and the 2-D contours in \(\{\sigma _8, \,\gamma \}\) are depicted in Fig. 50. As discussed in Q21, neglecting the AP corrections or assuming flatness has little impact on the \(\sigma _8\) and \(\gamma \) constraints (the other parameters are affected to a higher degree). The figure also shows the CMB contours, which were extracted from Mantz et al. (2015). We point the reader to Q21 for more details.

The \(6\times 2\)pt method can also be combined with the traditional Hubble diagram distance measurements with standard candles. This synergy was investigated by Alfradique et al. (2022), where it was shown that although the improvements to \(\{\sigma _8, \,\gamma \}\) are negligible, this combination yields large gains for \(\varOmega _{k0}\), which should be constrained to better than 2% using either Rubin SNe or third-generation standard siren measurements. In the latter case, h could also be measured with over an order of magnitude better precision.

Table 8 Relative 1\(\sigma \) errors on \(f\sigma _8\) using the \(3\times 2\)pt gs method. Adapted from Howlett et al. (2017)
Table 9 Fully marginalized absolute forecast uncertainties on each cosmological parameter using the \(6\times 2\)pt method. The (relative) bias uncertainties are the average over all redshift bins, but their redshift dependence is small, only around \(\sim \)10%. Adapted from Quartin et al. (2022)

Finally, using the methodology discussed in Amendola and Quartin (2021), one can also employ the \(6\times 2\)pt method in another way, namely to produce forecasts without assuming a parameterization of H(z), P(k, z) and \(\beta _{\mathrm{g,s}}(k,z)\). This is achieved by employing the data directly in every (k, z) bin, thereby avoiding the need to assume a specific cosmological model. Q21 showed that one can obtain uncertainties on E(z) of around 3–4% in the farthest bin of the Aggressive survey, as shown in Fig. 51.

Fig. 51

Errors in H with the model-independent approach compared to using only galaxy clustering (red and green error bars are slightly displaced for clarity). The two grey continuous lines represent H(z) for \(w=-0.9\) (top) and \(w=-1.1\) (bottom), as a convenient graphical reference

4 Synergies and complementarities between cosmological probes

In Sect. 3, we extensively discussed the characteristics and peculiarities of each of the new emerging cosmological probes individually. At the end of this review, it is useful to also explore the improvement that can be achieved from the complementarity of the various probes when, potentially, they are combined; this will allow us to assess if, and by how much, they complement each other, and what we could learn from studying them jointly.

The first important point to consider is the redshift range specifically mapped by each probe, as presented in Fig. 52. The horizontal bands show the redshift range of the various methods as discussed in the corresponding sections, either currently covered or expected to be covered with future surveys. The dotted points represent current measurements, while the crosses indicate future forecasts. In some cases, a cosmological probe includes integrated information from higher redshift, as for TDC and CCSL, where the sources are at much larger distances than the lenses, or for SA, which provides information on the entire expansion history up to the formation redshift of the stars considered; in the plot, we display this information with arrows. The various methods are ordered from top to bottom as a function of the spanned range. In the bottom part of the plot the main cosmological probes, namely CMB, BAO and SNe, are shown for comparison. The first interesting point to notice is how the new emerging cosmological probes richly complement the main probes by covering different ranges of cosmic time, from the very local Universe (\(z<0.1\) for SA, SBF, and SS) out to very high redshifts (up to \(z\sim 10-12\) for QSO, GRB, NHIM, and RD). They allow us to span almost 13.4 Gyr of cosmic time, a significantly larger range than the one reachable by current probes. It is also relevant that a significant fraction of these methods overlap with the range of BAO and SNe (\(0.1<z<2\) for CC, CSC, CV, CCSL, and TDC), providing a crucially wider compilation of late-Universe probes that can prove decisive in breaking the dichotomy between late- and early-Universe results, and in validating the results obtained from standard probes.

Fig. 52

Redshift distribution of the various emerging cosmological probes considered in this review. From top to bottom are shown stellar ages (SA), surface brightness fluctuations (SBF), standard sirens (SS), cosmic chronometers (CC), quasars (QSO), gamma-ray bursts (GRB), clustering of standard candles (CSC), cosmic voids (CV), neutral hydrogen intensity mapping (NHIM), secular redshift drift (RD), cosmography with cluster strong lensing (CCSL), and time-delay cosmography (TDC). The horizontal bands show, for each probe, the expected redshift range considering both current and future measurements; the circle dots represent current measurements described in the review and the crosses the forecasts. The arrows indicate when a probe carries integrated information from a larger redshift, as in the case of stellar ages, mapping the entire expansion history since their formation, or of TDC and CCSL, carrying information not only on the lenses (dotted points) but also on the sources. In the lower part of the figure the redshift distribution of the main cosmological probes is shown for comparison, namely baryon acoustic oscillations (BAO), supernovae (SNe), and the cosmic microwave background (CMB)

Table 10 Summary table of the emerging cosmological probes, highlighting for each one which observable it constrains, and comparing their strengths and weaknesses
Table 11 Current maturity and constraining power of the emerging cosmological probes, with expected timescales for development

Beyond their different redshift distributions, each probe has its own strengths and weaknesses: in Table 10 we summarize them, indicating also which quantity each probe primarily constrains. From the table, their wide diversity is evident. In the first place, SBF and stellar ages, as also highlighted in Fig. 52, mostly analyze very local samples, and as a consequence will be particularly relevant for constraining local cosmological parameters. The advantage is that they require no cosmology-dependent calibrations, being based either on the direct estimate of the stellar age, or on calibration against other observables with a very small scatter. They represent ideal methods to obtain complementary local estimates of the Hubble constant \(H_0\) and the age of the Universe \(t_U\). Similarly, standard sirens provide a direct and cosmology-independent estimate of the luminosity distance of sources detected through GWs that can, when a counterpart is identified directly (bright sirens) or statistically (dark sirens), lead to another direct and independent measurement of \(H_0\); moreover, as described in Sect. 3.4, future observing runs and GW observatories will also allow a measurement of the Hubble parameter H(z) with an accuracy comparable to, and competitive with, other methods.

Moving to higher redshifts, one should also take into account the redshift dependence of the cosmological components to better appreciate the strengths of the various methods. The dark energy component, in particular, dominates at smaller redshifts, \(z\lesssim 0.5\), while at larger redshifts the contribution of matter, or of an evolving dark energy component, becomes more significant. From this point of view, GRB and QSO represent optimal extensions of the Hubble diagram with respect to SNe, being able to measure the luminosity distance up to \(z\sim 8-12\) and providing ideal samples to test possible deviations from a standard \(\Lambda \)CDM model.

On the other hand, the strength of CC, RD and CSC is to provide multiple cosmology-independent estimates of H(z) (or E(z)) that can constrain the expansion history of the Universe up to \(z\sim 2\) with minimal assumptions, without the need to assume a specific background model (as done, e.g., for SNe or BAO).

Fig. 53

Forecasts of the Hubble parameter H(z) with future measurements from cosmic chronometers (CC), standard sirens (SS), clustering of standard candles (CSC), and redshift drift (RD), as discussed in the corresponding sections. The colored dotted points show the forecasts, with bands to help the visualization

Figure 53 shows a comparison of the forecasts of future H(z) measurements that can be obtained with these cosmological probes, as presented in Sects. 3.1, 3.4, 3.11, and 3.12. It is interesting to notice that, in the future, several methods will provide percent or sub-percent accuracy measurements of the Hubble parameter in the range \(0<z<1\) in a cosmology-independent way. This will be crucial, because having multiple independent probes will allow us to check for consistency and keep systematic errors under control. At the same time, such an accuracy over a wide redshift range will provide an ideal dataset to test a vast range of cosmological models and constrain the components of our Universe. For example, the combination of the different strengths of the various methods will also play a fundamental role in testing deviations from a standard FLRW metric (see, e.g., Räsänen et al. 2015; Cao et al. 2019), in measuring cosmological parameters in non-standard models (D'Agostino and Nunes 2019, 2020), and in exploring trends in redshift and angular direction that could provide hints to address the Hubble tension (e.g., Krishnan et al. 2020; Dainotti et al. 2022b). To complement the information of Table 10, in Table 11 we also summarize, for each probe, the foreseen timescale for the development of the method, highlighting the current or future surveys expected to provide a significant improvement in terms of statistics or methodology and the expected time frame, the constraining power of each probe on the cosmological parameter it mainly measures, and its current maturity.

Fig. 54

Current constraints on cosmological parameters from the various cosmological probes covered in this review, namely cosmic chronometers (CC, this paper), quasars (QSO, Lusso et al. 2020), standard sirens (SS, Abbott et al. 2017d, a), time-delay cosmography (TDC, Birrer et al. 2020), surface brightness fluctuations (SBF, Blakeslee et al. 2021), cosmic voids (CV, Hamaus et al. 2020), cosmography with cluster strong lensing (CCSL, SN Refsdal case, Grillo et al. 2020), gamma-ray bursts (GRB, “Amati” relation, Amati et al. 2019, an updated sample of 212 objects), and stellar ages (SA, Jimenez et al. 2019). The figure shows the contour plots in the \(H_0\)-\(\varOmega _{\mathrm{m}}\) plane for a flat \(\Lambda \)CDM cosmology, with their marginalized projections; the darker and lighter contours show the 68% and 95% confidence levels, respectively. In the case of QSO, as discussed in Sect. 3.2, information from SNe Ia has also been added to normalize the Hubble diagram; for SA, a Gaussian prior \(\varOmega _{\mathrm{m}}\)=\(0.3\pm 0.02\) is assumed (Jimenez et al. 2019). The dashed lines indicate, for illustrative purposes, the values \(H_0\)=70 \(\mathrm {km\, s^{-1}\ Mpc^{-1}}\) and \(\varOmega _{\mathrm{m}}\)=0.3

To conclude, Fig. 54 shows, on a common plane, the current constraints achieved by the various cosmological probes discussed in this review. Since, as discussed above, different probes are sensitive to different parameters, we decided to explore a parameter space that maximizes the number of probes available, namely the constraints that can be obtained in a flat \(\Lambda \)CDM cosmology where the free parameters are the Hubble constant \(H_0\) and \(\varOmega _{\mathrm{m}}\). In this plane, it is possible to fully explore the complementarity and synergy between the various emerging cosmological probes. First of all, we notice that, as also discussed previously, there are methods that constrain only one of the two parameters: this is the case of CV and QSO, which cannot measure \(H_0\), and of SBF, SS, and partially SA, which cannot measure \(\varOmega _{\mathrm{m}}\), as discussed in the previous sections. It is useful to underline here that, in the case of QSO, we have also included data from SNe in the constraints, as discussed in Sect. 3.2, and that in the case of SA we have assumed a Gaussian prior \(\varOmega _{\mathrm{m}}\)=\(0.30\pm 0.02\) as in Jimenez et al. (2019). On the other hand, there are probes sensitive to both \(H_0\) and \(\varOmega _{\mathrm{m}}\), namely CC, TDC, CCSL, shown considering the SN Refsdal case (see Sect. 3.6 and Grillo et al. 2020), and GRB, here exploited with the “Amati relation” approach (see Sect. 3.3).

Two points are worth underlining here. The first one is that all probes, despite different accuracies, are converging on a common part of the \(H_0\)-\(\varOmega _{\mathrm{m}}\) plane. Given the extreme diversity of the methods considered, this is very relevant, because it opens up the possibility of combining different probes to improve the accuracy on the estimated parameters. Such combinations are, at the moment, beyond the scope of this review, because they require carefully addressing all possible systematics and covariances between the various probes, but Fig. 54 appears extremely promising. The second one is that the various probes also present a significant degree of orthogonality, due to the different sensitivities discussed above. This proved extremely important in the past, when the extreme accuracy reached by the main probes was largely based on the orthogonality between the constraints from SNe, BAO and CMB (see, e.g., Scolnic et al. 2018). Finding a similar level of complementarity between the new emerging probes represents a good omen for the use of these new methods in modern cosmology, to better constrain cosmological parameters, provide additional evidence to help solve current tensions, keep under control the systematic effects of both the main and the new probes, and, potentially, discover new physics.

5 Summary and conclusions

In this article, we have reviewed the new emerging cosmological probes that are contributing (and are expected to contribute in the near future) to modern cosmology. In particular, we have discussed cosmic chronometers, quasars, gamma-ray bursts, gravitational waves used as standard sirens, time-delay cosmography, cosmography with cluster strong lensing, cosmic voids, neutral hydrogen intensity mapping, surface brightness fluctuations, the ages of the oldest stellar objects, secular redshift drift, and clustering of standard candles. For each cosmological probe, we presented the main equations involved in the method and how a sample can be selected and the method applied, reviewed the main results and expected forecasts, and discussed the systematics involved, also showing possible paths to mitigate or minimize them.

These emerging cosmological probes represent a valuable resource for the coming years, since they could allow us to go beyond the main cosmological probes currently exploited (SNe, BAO, CMB, weak lensing). In particular, they will provide crucial additional information to check for possible systematics in current analyses, increase the number of independent measurements of cosmological parameters, and give new hints to address the current tensions in cosmology, possibly strengthening the need for new physics (see, e.g., Fig. 54 and Tables 10, 11). As also shown in Fig. 53, these probes will also represent an important future dataset to constrain the expansion history of the Universe at percent precision independently of assumptions about a particular cosmological model, being ideal complementary probes to the excellent results we are obtaining from the other main probes. The exploitation of new and complementary cosmological probes will also be fundamental in view of the new surveys and missions that are currently ongoing or planned, such as SDSS BOSS Data Release 16 (Ahumada et al. 2020), DESI (DESI Collaboration et al. 2016), Gaia (Gaia Collaboration et al. 2016), JWST (Gardner et al. 2006), Euclid (Laureijs et al. 2011), PFS (Takada et al. 2014), the Nancy Grace Roman Space Telescope (Spergel et al. 2015), the LSST at the Vera Rubin Observatory (LSST Science Collaboration et al. 2009), the next LIGO-Virgo-KAGRA observing runs (LIGO Scientific Collaboration et al. 2015; Acernese et al. 2015; Akutsu et al. 2020) and future GW experiments like Cosmic Explorer (Reitze et al. 2019) and the Einstein Telescope (Punturo et al. 2010), the MIGHTEE survey (Paul et al. 2021; Chen et al. 2021b), ASKAP (Wolz et al. 2017a), SPHEREx (Doré et al. 2014), and the ATLAS mission (Wang et al. 2019).

As a final note, we acknowledge that other alternative probes are also available and could provide valuable information in the future, including fast radio bursts (FRB, Jaroszynski 2019; Wucknitz et al. 2021), HII galaxies (Terlevich et al. 2015), black hole shadows (Tsupko et al. 2020; Vagnozzi et al. 2020; Perlick and Tsupko 2022; Renzi and Martinelli 2022), Type II SNe (de Jaeger et al. 2020) and SNe Ia lensing (Quartin et al. 2014; Castro and Quartin 2014; Scovacricchi et al. 2017; Zumalacarregui and Seljak 2018). While we have not included these in this review because they are not at the same level of maturity as the probes discussed, we look forward to their possible applications in the future.