1 Introduction

The quest for an in-depth understanding of hadrons and their interactions dates back to the mid-20\({\mathrm{th}}\) century. Progress was made rapidly on both the experimental and theoretical frontiers, particularly after the formulation of QCD, which led to the systematic and quantitative study of hadrons and their structure.

The QCD factorization formalism developed for collider processes has been the foundation for understanding hadrons in terms of their partonic content [1, 2]. Such a mapping of the hadron structure is achieved via a set of quantities, that is, the parton distribution functions (PDFs) [2], the generalized parton distributions (GPDs) [3,4,5], and the transverse-momentum-dependent distributions (TMD PDFs) [6, 7]. These distributions are number densities and have a probabilistic interpretation within the parton model. However, they are not directly measurable in experiments. The mechanism that enables access to distribution functions, and therefore to the structure of hadrons, is the asymptotic freedom of QCD. By virtue of asymptotic freedom, the cross-section of high-energy processes can be factorized into a hard part and a soft part. For example, the cross-sections for inclusive unpolarized Deep Inelastic Scattering (DIS) can be written as

$$\begin{aligned} \sigma _{\mathrm{DIS}}(x,Q^2) = \sum _i \big [ H_{\mathrm{DIS}}^i \otimes f_i \big ](x,Q^2), \end{aligned}$$
(1)
$$\begin{aligned} \left[ a \otimes b\right] (x) \equiv \int _x^1 \frac{d\xi }{\xi }\, a\left( \frac{x}{\xi }\right) \, b(\xi ) , \end{aligned}$$
(2)

where the index i runs over all types of partons, that is, quarks, anti-quarks, and gluons; x is the Bjorken scaling variable, and \(Q^2\) is the scale of the hard interaction. \(H_{\mathrm{DIS}}\) is the hard part, which is process-dependent and calculable in perturbative QCD. \(f_i\) is the non-perturbative part of the cross-section, characterizing the structure of hadrons. In the case of unpolarized DIS, the relevant distribution function is the spin-averaged (or unpolarized) PDF for the \(i{\mathrm{th}}\) type of parton. Typically, both \(H_{\mathrm{DIS}}\) and \(f_i\) are given in the \(\overline{\mathrm{MS}}\) scheme. Unlike H, the distribution functions are universal. Also, the dependence of the distribution functions on the factorization scale can be effectively eliminated using the evolution equations, resulting in process-independent functions. The dependence on the renormalization scale does not pose difficulties either, as the data sets can be evolved to the same scale using perturbation theory. Therefore, data sets from different processes can be analyzed together within the framework of a global analysis. Factorization expressions similar to Eq. (1) exist for other high-energy processes, such as polarized DIS and Drell-Yan. The corresponding H functions are known up to next-to-leading order (NLO) or next-to-next-to-leading order (NNLO), depending on the process.
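To make the convolution in Eq. (2) concrete, the short Python sketch below evaluates it numerically for a toy hard kernel and a toy PDF shape; both functions and all parameter values are placeholders chosen purely for illustration, not results of any perturbative calculation or global fit.

```python
# Numerical sketch of the collinear convolution of Eq. (2):
#   [a (x) b](x) = \int_x^1 dxi/xi a(x/xi) b(xi)
# The hard kernel and PDF below are toy placeholders, not the true H_DIS or a fitted f_i.
from scipy.integrate import quad

def hard_kernel(z):
    """Toy hard coefficient (purely illustrative)."""
    return 1.0

def toy_pdf(xi):
    """Toy valence-like shape x^a (1-x)^b (purely illustrative)."""
    return xi**0.5 * (1.0 - xi)**3

def convolve(x, a, b):
    """Evaluate [a (x) b](x) by adaptive quadrature."""
    integrand = lambda xi: a(x / xi) * b(xi) / xi
    value, _err = quad(integrand, x, 1.0)
    return value

print(convolve(0.3, hard_kernel, toy_pdf))
```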

The PDFs, GPDs, and TMD PDFs are light-cone correlation functions and cannot be accessed directly from the Euclidean formulation of lattice QCD. Partial information is obtained through the Mellin moments of the distribution functions, and, in principle, one can reconstruct the parton distributions using an operator product expansion (OPE). In practice, a proper and exact reconstruction is an impossible task due to the computational challenges of calculating high moments reliably: the signal-to-noise ratio decreases rapidly, and an unavoidable power-law mixing occurs beyond the third non-trivial moment (see, e.g., Refs. [8,9,10,11,12]).

Alternative approaches to Mellin moments have been pursued, starting with a method based on the hadronic tensor [13,14,15] already in the 1990s. Other proposals include auxiliary quark field approaches [16, 17], as well as methods to obtain higher Mellin moments using smeared operators [18]. The development of the quasi-PDFs approach using Large-Momentum Effective Theory (LaMET) has been a pivotal point for calculating x-dependent quantities from lattice QCD [19,20,21]. This approach generated intense enthusiasm in both the phenomenological and lattice communities, which led to other proposals, such as the current–current correlators approach [22,23,24], the pseudo-PDFs [25], and a method based on the OPE [26].

This article reviews lattice calculations of the x-dependence of distribution functions using some of the approaches mentioned above, for which there has been recent progress. Some of the calculations prior to 2019 can be found in the proceedings of the Lattice Symposium 2018 [27]. An extensive review of the first calculations up to 2018 can be found in Ref. [28]. The main focus of this review is the presentation of physical results that are also of interest to the broader theoretical community. Therefore, technical details and heavy equations are avoided to increase readability. However, selected technical aspects that affect the reliability of lattice results are critically discussed. We present calculations that appeared in the literature up to October 1, 2020.

The main part of this article is organized in three sections. Section 2 presents calculations on PDFs, Sect. 3 on GPDs, and last but not least, Sect. 4 focuses on TMD PDFs. Finally, we provide closing remarks in Sect. 5.

2 PDFs

Most of the information on hadron structure comes from the intense effort to determine PDFs with quantifiable and reliable uncertainties. Among the distribution functions, PDFs are the easiest to obtain, as they are one-dimensional objects that depend only on the momentum fraction carried by the partons.

The PDFs are classified as unpolarized (spin-averaged), helicity (spin-dependent), and transversity (transverse-spin-dependent), and each can be obtained through the factorization of the cross-section from different high-energy processes. However, there is still missing information, either because it is challenging to extract certain PDFs or because some kinematic regions are not easily accessible. Therefore, lattice QCD input is crucial, and lattice results can either serve as predictions or as additional constraints on the phenomenological analyses. This section will discuss the current landscape of lattice results, highlighting their x-dependence in the presentation, rather than technical details.

2.1 Leading-twist PDFs

The non-perturbative part of the cross-section may be expanded in terms of the large energy scale of the process under study, Q. Taking as an example the unpolarized PDFs in Eq. (1), one can write the expansion in twist (mass dimension minus spin)

$$\begin{aligned} f_i = f_i^{(0)} + \frac{f_i^{(1)}}{Q} + \frac{f_i^{(2)}}{Q^2} + \cdots , \end{aligned}$$
(3)

where \(f_i^{(0)}\) is the leading-twist (twist-2) contribution, while \(f_i^{(1)}\) and \(f_i^{(2)}\) are the twist-3 and twist-4 contributions, respectively. The leading-twist contributions have a probabilistic interpretation, while the higher-twist contributions are sensitive to soft dynamics. In most studies, the main focus is on the leading-twist contributions, which are the easiest to isolate. While the higher-twist corrections are less studied, there is growing interest, including from lattice calculations (see also Sect. 2.4).

Progress has been made in several directions, and some will not be covered in this article due to space limitations. This includes, but is not limited to, the kaon distribution amplitude (DA) [29], the feasibility of extracting the x-dependence of PDFs for the \(\Delta ^+\) baryon [30], and the exploration of machine-learning methods for the pion and kaon DAs [31]. We also refer the Reader to a series of papers on moments of distribution amplitudes for the kaon [32], the lowest-lying baryon octet [33], and the double parton distributions of the pion [34]. Preliminary results on the second moment of the pion DA using a heavy-quark OPE can be found in the proceedings of Ref. [35].

The main effort of lattice calculations up to 2018 was to obtain x-dependent PDFs directly at the physical point. Once that was achieved, a shift towards understanding systematic uncertainties and using the methodology to extract more complicated quantities has been observed. This change in the field's priorities is reflected in this review.

2.1.1 quasi-PDFs framework

The rapid progress in the field of x-dependent distributions from lattice QCD became possible with the pioneering work of Ji [19] on a new approach to extract PDFs from a Euclidean lattice. While this was not the first approach to be considered, it clearly renewed the interest in this direction and marked the beginning of a new era in lattice calculations. This approach, the so-called quasi-distributions approach, demonstrated that matrix elements of fast-moving hadrons coupled to bilinear non-local operators can be related to light-cone distributions via a multi-step process. As an example, we write the matrix element in the forward limit, which leads to PDFs, that is

$$\begin{aligned} \mathcal{M}(P_3,z,\mu ) = Z_\Gamma (z,\mu ) \langle H(P_3) | {\bar{\psi }}(z)\Gamma W(z) \psi (0) |H(P_3) \rangle , \end{aligned}$$
(4)

where the hadron is boosted with momentum \(P_3\) chosen to be in the z direction. The operator contains a Wilson line of length z parallel to \(P_3\), and has Dirac structure \(\Gamma =\gamma ^0,\,\gamma ^3\gamma ^5,\,\sigma ^{3j}\) (\(j=1,2\)) for the unpolarized,Footnote 1 helicity and transversity PDFs, respectively. \(Z_\Gamma (z,\mu )\) is the renormalization function defined at renormalization scale \(\mu \). \(Z_\Gamma (z,\mu )\) is calculated at each z separately, and is scheme- and scale-dependent (except for the vector and axial currents at \(z=0\)). The quasi-PDFs are defined as the Fourier transform (FT) of \(\mathcal{M}(P_3,z,\mu )\)

$$\begin{aligned} {\tilde{q}} (x, P_3,\mu ) = 2 P_3 \int _{-\infty }^\infty \frac{dz}{4\pi } e^{-i z x P_3} \mathcal{M}(P_3,z,\mu ). \end{aligned}$$
(5)
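The step from the lattice matrix elements to the quasi-PDF of Eq. (5) is, in practice, a discretized Fourier transform over the available Wilson-line lengths. The following sketch shows that step for a synthetic matrix element; the lattice spacing, momentum, and the functional form of \(\mathcal{M}\) are assumptions made only to have something to transform.

```python
# Sketch of the discretized Fourier transform of Eq. (5), applied to a synthetic
# renormalized matrix element M(z, P3). None of the numbers correspond to real lattice data.
import numpy as np

a_fm = 0.09        # lattice spacing in fm (assumed)
P3 = 1.38          # hadron momentum in GeV (example value)
GEV_FM = 0.1973    # hbar*c in GeV*fm, used to form the dimensionless product z*P3

z_sites = np.arange(-20, 21)                               # Wilson-line lengths in lattice units
zP3 = z_sites * a_fm * P3 / GEV_FM                         # dimensionless z*P3
M = np.exp(-0.1 * np.abs(z_sites)) * np.exp(0.4j * zP3)    # toy M(z, P3)

def quasi_pdf(x):
    """Discretized Eq. (5): qtilde(x) = 2 P3 sum_z (a/4pi) e^{-i x z P3} M(z, P3)."""
    phases = np.exp(-1j * x * zP3)
    return 2.0 * P3 * (a_fm / GEV_FM) / (4.0 * np.pi) * np.sum(phases * M)

print([round(quasi_pdf(x).real, 4) for x in (-0.5, 0.1, 0.3, 0.7)])
```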

The approach relies on the fact that the difference between quasi-PDFs and light-cone PDFs is nonzero only in the ultra-violet (UV) region, and can be calculated in continuum perturbation theory. For large, but finite \(P_3\), the quasi-PDFs can be matched to their light-cone counterparts using a kernel calculated perturbatively in Large Momentum Effective Theory (LaMET) [20], that is

$$\begin{aligned} {{\widetilde{q}}}_\Gamma (x,P_3,\mu ) = \int _{-1}^{1} \frac{d\xi }{|\xi |}\, C_\Gamma \Big ( \frac{x}{\xi }, \frac{\mu }{\xi P_3} \Big )\, q(\xi ,\mu ) + {\mathcal {O}}\bigg (\frac{m^2}{P_3^2},\frac{\Lambda _{\mathrm{QCD}}^2}{x^2P_3^2}\bigg ). \end{aligned}$$
(6)

Following the first calculations exploring the feasibility of the quasi-PDFs framework [37,38,39,40,41], x-dependent PDFs calculated directly at the physical point have been obtained [42,43,44]. Several factors contributed to this success, such as the use of momentum smearing [45], which offers better overlaps with the ground state. Another important development was the proposalFootnote 2 for a complete renormalization scheme [36, 47], which was complemented by the proof of renormalizability of non-local operators to all orders in perturbation theory [48,49,50,51,52,53]. Last but not least, a well-defined matching prescription appropriate for the various renormalization schemes has been developed [54,55,56,57,58,59]. All this progress and the ability to calculate PDFs at the physical point led to the need to refine the lattice estimates, with in-depth studies of various sources of systematic uncertainties.

A thorough investigation of systematic uncertainties in the unpolarized, helicity, and transversity PDFs is presented in Ref. [59]. The focus is on selected sources of systematic effects, as only one ensemble is used. The analysis is done using an \(N_f=2\) twisted-mass fermion ensemble with physical pion mass and a spatial extent of 4.5 fm (\(48^3\times 96\)). Using four values of the source-sink separation, \(T_{\mathrm{sink}}=\) 0.75–1.13 fm, excited-states effects are addressed with single-state fits, two-state fits, and the summation method. This analysis concludes that lattice data at \(T_{\mathrm{sink}}\) below 1 fm are insufficient to suppress excited states, particularly as \(P_3\) increases. Another effect that requires attention for matrix elements of non-local operators is the finite volume [60]. A direct study of volume effects is not possible with one ensemble, but one can examine whether the renormalization functions, Z, exhibit volume effects if obtained using multiple ensembles. In the work of Ref. [59], the estimates of Z are obtained using ensembles with different spatial extents, that is, 2.25 fm, 4.5 fm, and 6 fm. It is found that volume effects are small and do not affect the final estimates of the PDFs. However, a proper study of volume effects on the matrix elements should be done, ideally at the physical point. Two other sources of uncertainties are presented: the reconstruction of the x-dependence via a FT and the matching procedure. For the former, the standard discretized FT and the “derivative method” [61] are tested. While both methods suffer from the ill-defined inverse problem, the derivative method leads to uncontrolled uncertainties in the small-x region (see the discussion in Sect. 2.1.3 and the sketch below). Regarding the truncation of the FT with respect to z, an optimal value of \(z_{\mathrm{max}}\) requires that both the real and imaginary parts of the matrix elements vanish. In practice, this is not always observed, and the behavior of the matrix element at large z is operator-dependent. The final choices for \(z_{\mathrm{max}}\) in Ref. [59] are 0.94 fm for the unpolarized PDF and 1.13 fm for the helicity and transversity PDFs. Different prescriptions for the matching are employed, with the quasi-PDFs defined in the \(\overline{\mathrm{MS}}\), RI, \(\mathrm{M}\overline{\mathrm{MS}}\), and ratio schemes. Tension is observed between some of the methods in the small- and large-x regions, as expected. The final PDFs using \(P_3=1.38\) GeV are shown in Fig. 1.
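The sketch below illustrates, on a toy matrix element, why the derivative method is problematic at small x: dropping the surface term in the integration by parts leaves an explicit 1/x factor in front of the transform of \(\partial_z \mathcal{M}\), which amplifies truncation and noise effects as \(x\rightarrow 0\). The matrix element, lattice spacing, and momentum are invented for illustration only.

```python
# Toy comparison of the standard discretized FT of Eq. (5) with the "derivative method"
# (integration by parts, surface term dropped). Inputs are synthetic; note the 1/x factor.
import numpy as np

P3, a_fm, GEV_FM = 1.38, 0.09, 0.1973
z = np.arange(-16, 17)
zP3 = z * a_fm * P3 / GEV_FM
M = np.exp(-0.15 * np.abs(z)) * np.exp(0.4j * zP3)   # toy M(z, P3)
dM = np.gradient(M, a_fm / GEV_FM)                   # dM/dz by central differences (GeV units)

def ft_standard(x):
    return P3 / (2 * np.pi) * (a_fm / GEV_FM) * np.sum(np.exp(-1j * x * zP3) * M)

def ft_derivative(x):
    # surface term neglected: qtilde(x) ~ 1/(2 pi i x) sum_z a e^{-i x z P3} dM/dz
    return 1.0 / (2 * np.pi * 1j * x) * (a_fm / GEV_FM) * np.sum(np.exp(-1j * x * zP3) * dM)

for x in (0.05, 0.3, 0.7):
    print(x, round(ft_standard(x).real, 4), round(ft_derivative(x).real, 4))
```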

Fig. 1
figure 1

The unpolarized (top), helicity (center) and transversity (bottom) PDFs for the nucleon at the physical point and \(P_3 = 1.38\) GeV (blue curve). The global fits of Refs. [62,63,64] (unpolarized), Refs. [65,66,67] (helicity), Ref. [68] (transversity) are shown for qualitative comparison. The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 2 GeV (\(\sqrt{2}\) GeV) for the unpolarized and helicity (transversity) PDF. Source: Ref. [59]. Article published under the terms of the Creative Commons Attribution 4.0 International license

Phenomenological estimates are also shown for qualitative comparison. Focusing on the positive-x region, the lattice unpolarized PDF lies above the global fits, while there is agreement in the helicity PDF for \(x\lesssim \) 0.4–0.5. The mild rise in the large-x region could be due to the limitations of the FT, and advanced reconstruction methods must be investigated. Finally, the transversity PDF for \(x\lesssim \) 0.4–0.5 is in agreement with the SIDIS results with or without the tensor-charge constraint. The rise in the large-x region is observed in all PDFs. The analysis of Ref. [59] provides insight into selected sources of systematic uncertainties using simulations at the physical point. The bands in Fig. 1 include only statistical uncertainties, as the analysis confirms that the systematic effects are either negligible, e.g., in Z, or require further investigation, such as advanced reconstruction techniques.

Finite-volume effects have been studied for the unpolarized and helicity PDFs in Ref. [69]. The calculation uses a mixed-action setup on three \(N_f=2+1+1\) HISQ ensembles, with clover valence fermions. The ensembles' pion mass is 220 MeV, and the lattices have spatial extents \(L=2.88\), 3.84, and 4.8 fm. The momentum boost used is 1.3 GeV for all three ensembles and 2.6 GeV for the smallest- and largest-volume ensembles. Excited-states effects are quantified using four \(T_{\mathrm{sink}}\) values between 0.72 and 1.08 fm, and the final results are presented using a two-state fit. The matrix elements at the same \(P_3\) from different ensembles are found to be compatible, indicating that volume effects are not significant for \(m_{\mathrm{val}} L \in [3.3, 5.5]\). Figure 2 shows the unpolarized and helicity PDFs using the \(L=2.88\) fm and \(L=4.8\) fm ensembles. For both cases, the curves are compatible for all regions of x. This is in agreement with the findings of Ref. [59], where the volume effects were tested on the renormalization functions and found to be small.

Fig. 2
figure 2

Unpolarized (top) and helicity (bottom) PDFs for the \(L=2.88\) fm (red) and \(L=4.8\) fm ensemble. Source: Ref. [69]. Reprinted with permission by the American Physical Society and SciPri

Reference [70] presents an analysis of systematic uncertainties, also using a mixed-action setup of clover valence fermions on \(N_f=2+1+1\) HISQ fermions with pion mass 310 MeV. The lattice spacing is 0.042 fm and \(L=2.7\) fm. While the physical volume of the ensemble is small, it is expected to have reduced discretization effects. As discussed above, there are indications that volume effects are not a major source of systematic uncertainties, at least for simulations at heavier pion masses. The 2pt-functions are calculated at seven values of \(P_3\) between 0 and 2.77 GeV to understand the effect of excited-states contamination, using up to three-state fits. As \(P_3\) increases, the fits become less reliable, and priors are needed to increase stability. For the 3pt-functions, only \(P_3=1.84,\,2.31\) GeV are used, with three values of \(T_{\mathrm{sink}}\) between 0.67 and 0.84 fm. Two-state fits with 3, 4, and 5 parameters are applied, with the energy difference between the ground state and the first excited state determined independently from the 2pt-function fits. Agreement between the three fits is found; however, the matrix elements from the 5-parameter fit are a factor of two noisier.

Fig. 3
figure 3

The renormalized matrix elements for the unpolarized (top) and helicity (bottom) case for \(P_3=2.31\) GeV. The real and imaginary parts are shown with red and blue points, respectively. The global fits of NNPDF31 [63] and CT18 [71] are shown for the unpolarized, and NNNPDF1.1pol [66] and JAM17 [67] for the helicity. The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 4 GeV. Source: Ref. [70]. Article published under the terms of the Creative Commons Attribution 4.0 International license

The renormalized matrix elements are compared in coordinate space with the estimates from global analyses, as shown in Fig. 3. The real part of the unpolarized case shows agreement for all z values shown. However, the errors grow above 50\(\%\) at around \(z\sim \,0.6\) fm. This makes the comparison unreliable if points beyond 0.6 fm are included. The imaginary part shows agreement up to \(z\sim \,0.2\) fm, with large uncertainties for large z values. The real part of the helicity matrix element agrees with global fits up to \(z\sim \,0.3\) fm, while the imaginary part up to \(z\sim \, 0.2\) fm. The large uncertainties and the long tails in the lattice data for \(z>0.6\) fm reveal the challenges of the inverse problem, as will be discussed below. Implementing a ratio renormalization prescription (pseudo-ITDs), the first moments of the PDFs are determined. More details can be found in Ref. [70].

The pion valence PDF of Ref. [72] follows the quasi-PDFs approach using a mixed action of clover fermions in the valence sector and \(N_f=2+1\) HISQ fermions with pion mass 300 MeV. The volume is \(48^3\times 64\), corresponding to a spatial extent of 2.9 fm (\(a=0.06\) fm). Three values of \(P_3\) are used, namely, 0.86, 1.29, and 1.72 GeV, and there is agreement between \(P_3=1.72\) GeV and the phenomenological fits of the JAM Collaboration [73]. Results from Ref. [72] are included in Fig. 9.

Fig. 4
figure 4

Pion valence PDF extracted using ensembles with \(a=0.06\) fm (red) and \(a=0.04\) fm (blue). The statistical (systematic) error is shown with a dark (light) color band. The FNAL E615 [74] (green points), ASV [75] (green dashed line), JAM [73] (black band) and xFitter [76] (purple line) analyses are also shown. Insets: \(x f_v^\pi (x)\). The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 3.2 GeV. Source: Ref. [77]. Article published under the terms of the Creative Commons Attribution 4.0 International license

The calculation is extended in Ref. [77], in which the statistics are increased, and another ensemble with \(a=0.04\) fm and spatial extent 2.6 fm is added. The pseudo-ITD approach (see Sect. 2.1.2) is also explored. To this end, several values of \(P_3\) are used, up to 2.15 GeV for the \(a=0.06\) fm ensemble, and 2.42 GeV for the 0.04 fm ensemble. Systematic uncertainties related to excited-states contamination, using \(T_{\mathrm{sink}}\) up to 0.72 fm, have been studied. Other systematic uncertainties are investigated related to the renormalization, higher-twist effects at low momenta, and the functional form in the large-x region, using data with z up to 1 fm. The large-x region is of particular interest, as models predict different behavior, of the form \((1-x)\) or \((1-x)^2\). The valence pion PDF is shown in Fig. 4 for the two ensembles. The comparison of the lattice data with other predictions and experimental data shows agreement in the intermediate-x region. For \(x>0.7\), the \(a=0.06\) fm data are in agreement with the FNAL estimates [74], when systematic uncertainties are included in the lattice data. Also, fits of the form \((1-x)^\beta \) applied to the lattice data find \(\beta \) anywhere between 1 and 2, and therefore there is no clear preference for one of the two proposed functional forms. Comparison with other lattice calculations reveals that the large-x behavior obtained in Ref. [77] is in agreement with Ref. [72], but disagrees with the results of Refs. [78, 79] (see Fig. 9).

Other calculations on quasi-PDFs and comparison with alternative approaches are presented in Sect. 2.1.2 for the nucleon and Sect. 2.1.3 for the pion.

2.1.2 pseudo-ITD framework

An alternative framework to quasi-PDFs is the pseudo-ITDsFootnote 3 developed by Radyushkin [25, 80,81,82,83,84,85,86]. The common feature with quasi-PDFs is the use of the same matrix elements \({\mathcal {M}}\) of boosted hadrons coupled to non-local operators (Eq. 4). In the pseudo-ITDs approach, the analysis uses the Lorentz invariants \(z^2\) and \(\nu \equiv -p\cdot z\), instead of z and \(P_3\). The variable \(\nu \) is historically called Ioffe time [87, 88].Footnote 4 In Euclidean space and based on the conventions we use here, \(\nu =z P_3\). \({\mathcal {M}}(\nu ,z^2)\) are called Ioffe-time pseudo-distributions (pseudo-ITDs). One of the main differences between the quasi- and pseudo-distribution formulations is that the former is the Fourier transform of \({\mathcal {M}}(P_3,z)\) in z, while the latter is the Fourier transform of \({\mathcal {M}}(\nu ,z^2)\) in \(\nu \). Similar to quasi-distributions, the pseudo-distributions approach relies on the factorization of the pseudo-ITDs obtained on the lattice to extract the light-cone ITDs, using a matching kernel calculable in perturbative QCD. The necessary matching is performed at the level of ITDs [56, 83, 84, 89] in coordinate space, while in the quasi-PDFs approach it is done in momentum space.

The analysis within the pseudo-ITDs framework is based on a ratio of matrix elements, namely

$$\begin{aligned} {\mathfrak {M}}(\nu ,z^2) = \frac{{\mathcal {M}}(\nu ,z^2)/{\mathcal {M}}(\nu ,0)}{{\mathcal {M}}(0,z^2)/{\mathcal {M}}(0,0)}, \end{aligned}$$
(7)

which is called the reduced Ioffe-time pseudo-distribution (reduced pseudo-ITD). Such a ratio is a convenient choice and defines a gauge-invariant renormalization scheme. The renormalization scale in the pseudo-ITD is set by 1/z. This constitutes a major practical difference from the renormalization prescription of quasi-PDFs, which is based on RI\('\)-type schemes [36, 47]. \({\mathfrak {M}}(\nu ,z^2)\) also reduces the higher-twist contamination, as discussed in Ref. [90]. The success of the approach is based on calculating \({\mathfrak {M}}(\nu ,z^2)\) for a wide range of \(\nu \), including small-\(\nu \) values, by varying either \(P_3\), or z, or both. Therefore, physical information is included in matrix elements from low values of \(P_3\). This is in contrast with the quasi-PDFs approach, where matrix elements at different \(P_3\) are not analyzed together.
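In practice, building the reduced pseudo-ITD of Eq. (7) amounts to a double ratio over a grid of matrix elements in momentum and Wilson-line length, in which the z-dependent UV factors cancel. A minimal sketch, with a synthetic grid standing in for the fitted lattice matrix elements, is:

```python
# Sketch of the double ratio of Eq. (7) on a grid M[momentum index, z]. The grid is a toy
# model; in a real analysis each entry would be a fitted ground-state matrix element.
import numpy as np

n3 = np.arange(0, 5)          # momenta in units of 2*pi/L
z = np.arange(0, 9)           # Wilson-line lengths in lattice units
M = np.exp(-0.05 * z[None, :]**2) * np.exp(0.06j * np.outer(n3, z))   # toy M(P3, z)

def reduced_itd(M):
    """Eq. (7): [M(nu,z^2)/M(nu,0)] / [M(0,z^2)/M(0,0)]."""
    num = M / M[:, 0][:, None]     # divide by the z = 0 element at the same momentum
    den = M[0, :] / M[0, 0]        # zero-momentum element at the same z
    return num / den[None, :]

print(reduced_itd(M).shape)        # one reduced ITD per (momentum, z) pair, i.e. per (nu, z^2)
```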

The \({\mathfrak {M}}(\nu ,z^2)\) are matched to the light-cone ITDs, \(Q(\nu ,\mu ^2)\), using a matching procedure, and at the same time are converted to the \(\overline{\mathrm{MS}}\) scheme and evolved to a common scale, \(\mu =1/z'\) [56, 83, 84, 89]. Finally, the light-cone PDFs are related to \(Q(\nu ,\mu ^2)\) via a Fourier transform in Ioffe time,

$$\begin{aligned} Q(\nu ,\mu ^2) =\int _{-1}^1 dx \, e^{i\nu x} q(x,\mu ^2), \end{aligned}$$
(8)

Note that this step is also subject to the ill-defined inverse problem. One can utilize the crossing symmetries of the distributions, which for the unpolarized case, q(x), read \({\bar{q}}(x)=-q(-x)\) for \(x>0\). The real part of Eq. (8),

$$\begin{aligned} \mathrm{Re}[Q(\nu ,\mu ^2)] =\int _0^1 dx \cos (\nu x) q_v(x,\mu ^2), \end{aligned}$$
(9)

is related to the valence distribution, \(q_v=q-{\bar{q}}\), while the imaginary part,

$$\begin{aligned} \mathrm{Im}[Q(\nu ,\mu ^2)]= \int _0^1 dx \sin (\nu x) q_{v2s}(x,\mu ^2), \end{aligned}$$
(10)

to \(q_{v2s}\equiv q_v+2{\bar{q}}=q+{\bar{q}}\), which is the non-singlet distribution involving two flavors. Therefore, the full and sea-quarkFootnote 5 distributions can be extracted by combining the two equations. Whether this is possible in practice depends on the statistical precision of the lattice data. As we will see below, most of the calculations focus on the valence distributions, namely, the real part of \(Q(\nu ,\mu ^2)\).
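The following sketch makes the bookkeeping of Eqs. (9)–(10) explicit: given a valence shape (from the real part) and a valence-plus-two-sea shape (from the imaginary part), the sea and full distributions follow by simple linear combinations. The two input shapes are toy ansätze, not lattice or phenomenological results.

```python
# Combining the reconstructions of Eqs. (9)-(10) into full and sea-quark distributions.
# The input shapes are illustrative only.
import numpy as np
from scipy.integrate import trapezoid

x = np.linspace(1e-3, 1.0, 500)
q_v = 2.0 * x**0.5 * (1 - x)**3          # toy valence distribution, q_v = q - qbar
q_v2s = q_v + 0.6 * (1 - x)**7           # toy q_v + 2 qbar (what Im Q(nu) reconstructs)

qbar = 0.5 * (q_v2s - q_v)               # sea-quark distribution
q = q_v + qbar                           # full distribution q = q_v + qbar

nu = 3.0                                 # example Ioffe time
re_Q = trapezoid(np.cos(nu * x) * q_v, x)     # Eq. (9)
im_Q = trapezoid(np.sin(nu * x) * q_v2s, x)   # Eq. (10)
print(round(re_Q, 4), round(im_Q, 4))
```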

The pseudo-ITDs approach has been studied for the unpolarized PDF of the pion and the nucleon in a series of publications, with the more recent ones presented in Refs. [78, 91, 92]. Also, Ref. [93] discusses the challenges of reconstructing the light-cone distributions from lattice data due to the ill-defined inverse problem and the need for advanced reconstruction methods. This is relevant for all methods discussed in this article. Here, we focus on the results from Ref. [92] on the nucleon valence unpolarized PDF, which uses three ensembles with pion mass 172, 278, and 358 MeV to perform an extrapolation to the physical point. Using the 172 MeV ensemble, they calculate the first two moments of the PDFs and pseudo-ITDs. The results for \(\langle x \rangle \) from pseudo-ITDs and PDFs are in agreement with phenomenological estimates [63, 64, 94], while there is some tension for \(\langle x^2 \rangle \) using the PDF lattice estimates for \(z^2>4 a^2\). A better agreement is observed between the pseudo-ITD estimates and NNPDF31 [63]. It was checked that higher-twist effects are negligible, in accordance with Ref. [90], as there is no dependence of the moments on \(z^2\) within the reported uncertainties. The pion mass dependence between the 172 and 278 MeV ensembles is found to be within uncertainties. The same holds for the 358 MeV ensemble, which is, however, much noisier. Therefore, a non-negligible pion mass dependence between data at the physical point and data at pion masses above 300–350 MeV is not excluded. Such a dependence was found in Ref. [42]. The final results use a linear extrapolation applied to the lightest two ensembles and are compared to the phenomenological fits in Fig. 5. The lattice data are in agreement with the determinations of Refs. [63, 64, 94] up to \(x \sim \, 0.25\). Beyond that, the lattice results overestimate the global fits, which is consistent with the tension observed in \(\langle x^2 \rangle \). This is because the numerical values of high moments receive important contributions from the large-x region. An in-depth investigation of systematic uncertainties is necessary to address the tension, such as discretization effects and volume effects, which is a multi-year effort and requires multiple ensembles.

Fig. 5
figure 5

The nucleon valence distribution compared to phenomenological determinations from the NLO global fit CJ15 [64] (green), and the NNLO global fits MSTW2008 [94] (red) and NNPDF31 [63] (blue). The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 2 GeV. Source: Ref. [92]. Article published under the terms of the Creative Commons Attribution 4.0 International license

Another recent calculation of the isovector unpolarized PDF using the pseudo-ITDs approach is presented in Ref. [95]. For the first time, the full PDF, \(q(x)=q_v(x)+{\bar{q}}(x)\), and sea-quark PDF, \(q_s(x)={\bar{q}}(x)\), are accessed through the calculation of both the valence, \(q_v\), and the combination \(q_v+2{\bar{q}}\). The latter is obtained from Eq. (10), which is more difficult to extract, as one requires control over statistical uncertainties. One ensemble at the physical point is used, with \(N_f=2\) twisted-mass fermions and lattice extent 4.5 fm (\(48^3\times 96\)). Six values of \(P_3\) are used, \(P_3=0 - 1.38\) GeV, combined with all possible values of z, so that \(\nu _{\mathrm{max}}\sim 8\). Based on the scale dependence of the reduced pseudo-ITD, it is concluded that z values up to \(8 a=0.75\) fm are safe to be used, as the effect is within the reported uncertainties. An extensive investigation is performed on identifying the optimal maximum value of \(\nu \) used to extract the final PDFs. It is found that \(\nu _\mathrm{max} = 5.2\) (\(z_{\mathrm{max}}=0.75\) fm) is optimal. One factor to consider is the number of data entering the reconstruction of PDFs given in Eq. (8). One seeks reliable and statistically meaningful results. Three approaches are explored and compared, namely, a discretized FT, the Backus-Gilbert (BG) [96], and the fitting reconstruction (FR) of Ref. [91]. The latter is implemented using an ansatz with the proper small- and large-x behavior, inspired by the global fits of experimental data sets. One of the main differences between the discretized FT and the other methods is that the former underestimates the uncertainties. The uncertainties do not reflect the lack of data in certain regions of \(\nu \). On the contrary, the BG and FR have larger uncertainties, reflecting the challenges of the reconstruction and the amount of information in the lattice data. It is found that the fitting reconstruction performs better than the other approaches and is, thus, used in the final analysis, as shown in Fig. 6.
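As an illustration of the fitting-reconstruction idea, the sketch below fits a normalized \(x^a(1-x)^b\) ansatz to synthetic real-part ITD data through the cosine transform of Eq. (9). The ansatz, the noise level, and the data points are all assumptions; the actual analysis of Ref. [91] uses the lattice data and a more flexible parameterization.

```python
# Sketch of a fitting reconstruction: fit a PDF ansatz to (synthetic) Re Q(nu) data via Eq. (9).
import numpy as np
from scipy.optimize import curve_fit
from scipy.integrate import quad
from scipy.special import beta

def ansatz(x, a, b):
    """q_v(x) = x^a (1-x)^b / B(a+1, b+1), normalized to integrate to 1 on [0,1]."""
    return x**a * (1 - x)**b / beta(a + 1, b + 1)

def re_itd_model(nu, a, b):
    """Re Q(nu) = int_0^1 dx cos(nu x) q_v(x) for the ansatz (Eq. (9))."""
    return np.array([quad(lambda x: np.cos(n * x) * ansatz(x, a, b), 0, 1)[0]
                     for n in np.atleast_1d(nu)])

nu_data = np.linspace(0.3, 5.2, 12)                  # synthetic Ioffe-time points
rng = np.random.default_rng(0)
data = re_itd_model(nu_data, 0.4, 3.0) + rng.normal(0.0, 0.01, nu_data.size)

popt, _ = curve_fit(re_itd_model, nu_data, data, p0=(0.5, 2.5),
                    sigma=np.full(nu_data.size, 0.01), absolute_sigma=True,
                    bounds=([-0.9, 0.5], [2.0, 8.0]))
print("fitted (a, b):", popt)
```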

Fig. 6
figure 6

Lattice estimates for the unpolarized PDFs for the valence (\(q_v\); upper left), valence + 2 sea (\(q_{v2s}=q_v+2{\bar{q}}\); upper right), full (\(q=q_v+{\bar{q}}\); lower left) and sea (\({\bar{q}}\); lower right). The global fits of NNPDF [63] are shown with a grey band. The bands in the lattice data represent: the statistical uncertainty (purple), the combination of statistical and systematic due to the choice of \(\nu _{\mathrm{max}}\) and \(\alpha _s\) (blue), and the total error including also an estimate for the uncertainties related to cutoff effects, finite-volume effects, excited states contamination, truncation and higher-twist effects (cyan). The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 2 GeV. Source: Ref. [95]. Reprinted based on the arXiv distribution license

These data correspond to \(\nu _{\mathrm{max}}=5.2\) (\(z_{\mathrm{max}}=0.75\) fm). The lattice data include both statistical and systematic uncertainties. The various bands shown are due to varying \(\nu _\mathrm{max}\) and the strong coupling. A third band is due to a combined estimate of cutoff effects, finite-volume effects, excited states contamination, truncation, and higher-twist effects. The lattice results for \(q_v(x)\), \(q_{v2s}(x)\), q(x), and \({\bar{q}}(x)\) are shown and compared to the global fits of NNPDF [63]. A remarkable agreement is found for all distributions in all regions of x.

Fig. 7
figure 7

Lattice results for the unpolarized PDF using quasi-PDFs [59] (red band) and pseudo-ITDs from Ref. [92] (gray band) and Ref. [95] (blue band). The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 2 GeV. Source: Ref. [92]. Reprinted with permission of the Authors

The unpolarized PDF is studied with the same ensemble at the physical point using the quasi-PDFs [59] and the pseudo-ITDs [95] formulations. A near-physical-point ensemble is also used in Ref. [92], which is included in the extrapolation to the physical point. These three results are compared in Fig. 7. It should be noted that the results of Ref. [59] use the standard discretized FT, while the results of Refs. [92, 95] are obtained with the fitting reconstruction. Also, the pseudo-ITDs analysis utilizes matrix elements at all values of \(P_3\), as they enter the reduced pseudo-ITD, and the analysis is in terms of \(\nu \). This fact, together with the fitting reconstruction, reduces uncertainties related to the ill-defined inversion problem. Despite the above differences, the results exhibit very good agreement. The quasi-PDFs data have a mild oscillation between \(x=0.7\) and \(x=1\) and are slightly above the results of the other two calculations. This level of agreement suggests that systematic uncertainties related to the inverse problem, to \(P_3\), and to the maximum value of \(z P_3\) are within the uncertainties. One can also address the concern about the renormalization prescription. As mentioned above, the scheme in pseudo-ITDs is gauge-invariant and avoids the need for an additional renormalization function in an intermediate RI\('\)-type scheme prior to the matching to the light-cone PDFs. Furthermore, there are indications that the RI-type scheme is meaningful in the small-z region, \(z/a=3 - 4\) [97]. For the values of the lattice spacing used in current calculations, this corresponds to the region \(z =\) 0.2–0.4 fm. Therefore, it is expected that there are systematic uncertainties related to the renormalization of quasi-PDFs. While this is valid, the agreement observed between the results using different lattice formulations and other approaches suggests that such effects are probably within the reported uncertainties.

2.1.3 Current–current correlators

Besides the matrix elements of bilinear non-local operators used in the quasi-PDFs and pseudo-ITDs methods, there is a more general set of hadronic matrix elements, the good “lattice cross-sections” (LCSs). The latter can be used to extract PDFs and eventually be analyzed together, in a way analogous to the experimental data [22,23,24]. In such global fits of lattice data, systematic effects from different results may have less influence on the combined PDFs. However, this is an ideal scenario, and it is not clear how systematic uncertainties from each lattice data set will affect the combined fits in practice. Good candidates for constructing the set of LCSs share similar features: (a) they are calculable in lattice QCD; (b) they are renormalizable; (c) they have the same, factorizable, logarithmic collinear divergences as the PDFs. As discussed in Ref. [24], both the quasi-PDFs and the pseudo-ITDs fall under the LCS category.

A good choice for the operator is a two-current local operator, \(N J_1(z)J_2(0)\), with the currents separated by a space-like distance z. This operator is therefore gauge-invariant, without the need for a Wilson line. N is a dimensionful normalization factor. The matrix elements of such operators (current–current correlators) are renormalized by multiplying with the renormalization functions of the two local currents, \(J_1\) and \(J_2\). Therefore, the renormalization procedure is straightforward and uses well-established techniques.

A recent work on the pion valence quark distribution using the current–current correlators method is presented in Ref. [98]. The calculation is performed on \(N_f=2+1\) clover-fermion ensembles with three values of the pion mass, 278, 358, and 413 MeV, with two volumes for the heaviest pion mass (3 fm and 4 fm). The vector and axial currents are used as \(J_1\) and \(J_2\), respectively. The momentum boost employed is between 0.41 and 1.65 GeV. Several values of the current separation length are used, up to \(z=6a=0.76\) fm. As discussed in Ref. [98], the signal decays significantly with increasing \(P_3\), particularly for \(m_\pi =268\) MeV. For example, for a current separation of 3a, the errors are 3 times larger for \(P_3=1.65\) GeV as compared to \(P_3=0.41\) GeV. Consequently, the data for source-sink separations greater than \(12 a=1.5\) fm and \(P_3\ge 1.25\) GeV are not reliable due to the uncontrolled statistical uncertainties. This conclusion is compatible with the findings of other calculations (see, e.g., Sect. 5). Only the lattice data with a signal-to-noise ratio greater than 1 enter the excited-states analysis. The data from all ensembles are analyzed simultaneously to obtain the functional form of the ITDs. A chiral, continuum, volume, and higher-twist extrapolation is employed using the z-expansion, which is frequently used to fit the momentum-transfer-squared (\(Q^2\)) dependence of form factors; a sketch of such a parameterization is given below. We note that three pion masses, two lattice spacings, and two volumes enter the fit, making the extraction of some of the fit parameters susceptible to systematic uncertainties. Only \(z<0.56\,\mathrm{fm} < 1/\Lambda _{\mathrm{QCD}}\) is used in the final analysis, for which the short-distance factorization seems to hold, and the higher-twist contributions are suppressed. However, a negligible dependence on z is found within the reported uncertainties, which is in agreement with the findings of Refs. [92, 95]. This corroborates the suggestion that the short-distance factorization (SDF) can be applied beyond 0.3–0.4 fm for the current data, as the corresponding systematic effects are within the uncertainties. Next-to-leading-order (NLO) factorization is applied to the extrapolated current–current correlators, which are then parameterized in terms of \(q_{\mathrm{v}}^\pi (x)\). Two sets of parameters are identified as good estimates. The analysis shows that larger values of the Ioffe time are needed to constrain the fit parameters better. Given that the values of z must be in the perturbative region, one should increase \(P_3\) to achieve higher values of the Ioffe time. As already discussed, increasing \(P_3\) while keeping the statistical uncertainties under control is a major challenge for lattice calculations.
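For orientation, a generic z-expansion parameterization of the kind referred to above maps \(Q^2\) to a variable of magnitude smaller than one and expands the observable in a short power series. The mapping parameters and coefficients in this sketch are illustrative and not those of Ref. [98]; the expansion variable is called w here to avoid confusion with the current separation z.

```python
# Generic z-expansion sketch: conformal mapping of Q^2 followed by a truncated power series.
# t_cut and the coefficients are illustrative values, not those used in Ref. [98].
import numpy as np

def w_map(Q2, t_cut=4 * 0.14**2, t0=0.0):
    """Conformal variable w(Q^2) with |w| < 1 for spacelike Q^2 (t_cut ~ 4 m_pi^2 assumed)."""
    num = np.sqrt(t_cut + Q2) - np.sqrt(t_cut - t0)
    den = np.sqrt(t_cut + Q2) + np.sqrt(t_cut - t0)
    return num / den

def z_expansion(Q2, coeffs):
    """Truncated series F(Q^2) = sum_k a_k w(Q^2)^k."""
    w = w_map(Q2)
    return sum(c * w**k for k, c in enumerate(coeffs))

Q2 = np.linspace(0.0, 2.0, 5)                     # GeV^2
print(z_expansion(Q2, coeffs=[1.0, -2.3, 0.8]))   # toy coefficients
```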

Fig. 8
figure 8

Lattice results of pion \(xq^\pi _\mathrm{v}(x)\)-distribution using two parameterizations (cyan and red bands), shown together with the E615 data from Ref. [74] (blue band) and Ref. [99] using E615 re-scaled data [75]. The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of \(\sqrt{27}\) GeV. Source: Ref. [98]. Article published under the terms of the Creative Commons Attribution 4.0 International license

The lattice-extracted \(q^\pi _{\mathrm{v}}(x)\) from the two fits is in agreement within errors, as can be seen in Fig. 8. Given the current uncertainties related to the fit, one could consider as a final result the combination of the two bands. The lattice data are compared with the E615 data [74, 75]. One of the motivations of this work is to identify the large-x behavior of \(q_\mathrm{v}^\pi (x)\), and which functional form it favors. The older analysis of Ref. [74] exhibits a \((1-x)\) fall-off, while the data of Ref. [75] have a \((1-x)^2\) dependence. Based on the current uncertainties, the lattice data do not favor one over the other. However, the results are very promising, and, in the future, improved data can potentially be used to identify the large-x limit.

There is a large number of calculations of the pion distribution using different approaches, lattice formulations, and analyses. It is, therefore, desirable to compare these results. Here we show Fig. 9, taken from Ref. [78], in which four calculations are compared, using quasi-PDFs [72, 100], pseudo-ITDs [78], and current–current correlators [79].

Fig. 9
figure 9

Lattice results for the pion PDF from Ref. [100] on quasi-PDF (pink band), Ref. [72] on quasi-PDF (green band), Ref. [78] on pseudo-ITDs (red band), and Ref. [79] on current–current correlators (cyan band). The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale \(\mu =2\) GeV for Ref. [78], \(\mu =3.2\) GeV for Ref. [72], and \(\mu =4\) GeV for Refs. [79, 100]. Source: Ref. [78]. Article published under the terms of the Creative Commons Attribution 4.0 International license

An \(N_f=2+1+1\) mixed-action ensemble of HISQ/clover fermions with pion mass 310 MeV and volume \(24^3\times 64\) is used for the calculation of Ref. [100]. The pion PDF is extracted using the quasi-PDF formulation, and the x-dependence reconstruction is performed using the derivative method [61], according to which one neglects the surface term in the integration by parts of the FT. This method does not solve the inverse problem, as discussed in Ref. [101]. It also introduces further uncertainties (see Sect. 3). The results presented in Ref. [72] also employ the quasi-PDFs method and are discussed in Sect. 2.1.1. Instead of a standard FT, two approaches are tested that rely on fits applied to the lattice data in coordinate space. This is closer to the approach employed in Refs. [78, 79, 98]. The valence pion distribution is shown in Fig. 9. Note that the various calculations are evolved to different scales: \(\mu =2\) GeV [78], \(\mu =3.2\) GeV [72], and \(\mu =4\) GeV [79, 100]. However, the scale dependence is expected to be small within this range. An agreement is observed between the results of Refs. [72, 78, 79] up to \(x\sim 0.5\), which find the distribution to reach its maximum at around \(x=0.3\). Note that Ref. [78] combines two \(N_f=2+1\) ensembles with clover fermions reproducing a pion mass of 415 MeV and different volumes (3 fm and 4 fm). Reference [79] uses the same ensemble with the 4 fm extent. Even though the two calculations are completely independent and do not share any matrix elements, the extracted distributions are in full agreement. The results of Ref. [72] have a smaller slope after the peak and approach \(x=1\) differently. As mentioned above, a fit to the lattice data is applied to handle the inverse problem, and the \(x\rightarrow 1\) limit is one of the constraints. The results of Ref. [100] disagree with the other calculations, as they have a maximum at around \(x=0.5\) and lie higher than the other calculations beyond \(x=0.5\). The use of the derivative method may contribute to this disagreement.

2.1.4 Hadronic tensor

The hadronic tensor approach was first proposed in the 1990s [13,14,15] and was recently revived. The methods discussed above rely on factorization (in momentum or coordinate space) and require large \(P_3\) or large \(z P_3\), while ensuring that the values used are within the perturbative region. The hadronic tensor method does not require large momentum, as it is frame-independent. It is also free of renormalization. The feasibility of the approach in terms of technical methods is discussed in Ref. [102]. The method relies on analyzing ratios of suitable four-point and two-point functions, which is computationally very challenging, as one has to suppress the gauge noise and isolate the ground state at the same time. Another difficulty is the reconstruction of the Minkowski hadronic tensor, a process that also suffers from the inverse problem. In-depth investigations are needed to identify which methods are promising.

Reference [102] undertakes such a study using two ensembles. A mixed action of clover fermions on a DWF sea with \(m_\pi =\) 370 MeV is used for the elastic case, \(W_{44}\). This serves as the benchmark of the approach, as it should reproduce the vector charge. An anisotropic clover lattice with \(m_\pi =\) 380 MeV is used to obtain the Euclidean hadronic tensor with nonzero momentum transfer (3.56 GeV), \(W_{11}\). The inverse Laplace transform of the Euclidean hadronic tensor is studied with three methods: the Backus–Gilbert method [96], the maximum entropy method [103], and the Bayesian Reconstruction method [104]. It is found that the vector charge is reconstructed reliably from \(W^E_{44}\), with the maximum entropy method having a better resolution.

Fig. 10
figure 10

The Minkowski hadronic tensor \({W}^M_{11}\) for u-quark (left) and d-quark (right) as a function of energy transfer \(\nu \). Source: Ref. [102]. Article published under the terms of the Creative Commons Attribution 4.0 International license

The reconstructed Minkowski tensor \(W_{11}\) is shown in Fig. 10 for the up- and down-quark contributions, as a function of the energy transfer. The reconstruction method used is the maximum entropy method, and the bands represent the uncertainties due to various models. The statistical errors are not considered, possibly because they would dominate the analysis of the reconstruction methods. Note that the momentum transfer of 3.56 GeV is high, and the electromagnetic form factors are suppressed. Therefore, there is no peak around \(\nu =0\), which is the elastic point. There is a peak around \(\nu =1\) GeV, which is below the range leading to the parton structure functions (\(\nu \in [2.96, 3.68] \, \mathrm{GeV}\)). The same conclusions are reached with the Backus–Gilbert method. This is an ongoing effort, and improvements are needed in several directions. The ultimate goal is to obtain lattice data for the Minkowski hadronic tensor for a wide set of kinematic setups, which can be used to extract x-dependent PDFs via a factorization theorem.

2.1.5 OPE without OPE

Another method to access hadronic distributions was proposed recently in Ref. [26]. It is known as “OPE without OPE”, as it is related to the work of G. Martinelli [105]. Based on this approach, one computes matrix elements of the time-ordered product of two local currents,

$$\begin{aligned} \mathcal{M}_{\mu \nu }(p,s,s^\prime ,z) = \langle p,s^\prime | \,\mathcal{T} \left\{ {\mathcal {J}}_\mu (z), {\mathcal {J}}_\nu (0)\right\} \,|p,s \rangle . \end{aligned}$$
(11)

The currents are separated by a distance z, which has to be small for perturbation theory to be applicable. There is also a lower limit for z for discretization effects to be under control. The Compton tensor

$$\begin{aligned} T_{\mu \nu }(p,q) = \frac{i}{2} \int d^4z\, e^{i q \cdot z} \delta _{s s^\prime } \mathcal{M}_{\mu \nu } , \end{aligned}$$
(12)

can be decomposed into two scalar functions \({\mathcal {F}}_{1,2}(\omega ,Q^2)\), which are related to the DIS structure functions \(F_{1,2}\) from the hadronic tensor decomposition, via the optical theorem:

$$\begin{aligned} {\mathrm{Im}}{\mathcal {F}}_{1,2}(\omega ,Q^2) = 2\pi F_{1,2}(x,Q^2),\quad \omega = \frac{2 p\cdot q}{Q^2}. \end{aligned}$$
(13)

The indices \(\mu ,\,\nu \) control the kinematic factors of the decomposition, with the choice \(\mu =\nu =3\) being convenient. Combining Eq. (13) with analyticity and crossing symmetry, one can write

$$\begin{aligned} \overline{{\mathcal {F}}}_1(\omega ,Q^2)&= 4\omega ^2\int _0^1 dx \frac{x\,F_1(x,Q^2)}{1-x^2\omega ^2-i\epsilon }, \end{aligned}$$
(14)
$$\begin{aligned} {\mathcal {F}}_2(\omega ,Q^2)&= 4\omega \int _{0}^1 dx\,\frac{F_2(x,Q^2)}{1-x^2\omega ^2-i\epsilon }, \end{aligned}$$
(15)

where \(\overline{{\mathcal {F}}}_1(\omega ,Q^2) \equiv {\mathcal {F}}_1(\omega ,Q^2) - {\mathcal {F}}_1(\omega ,0)\).
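For real \(\omega\) below one, the denominators in Eqs. (14)–(15) never vanish and the \(i\epsilon\) prescription can be dropped, so the relations can be evaluated directly for a model structure function. The sketch below does this for Eq. (14) with an invented \(F_1\); it is meant only to show the structure of the dispersive integral, not to reproduce any result of Ref. [106].

```python
# Direct evaluation of Eq. (14) for real omega < 1, using a toy structure function F_1(x, Q^2).
import numpy as np
from scipy.integrate import quad

def F1_model(x, Q2=4.66):
    """Toy structure function vanishing at x = 1 (illustrative only)."""
    return x**(-0.3) * (1 - x)**3

def F1bar(omega, Q2=4.66):
    """F1bar(omega,Q^2) = 4 omega^2 int_0^1 dx x F_1(x,Q^2) / (1 - x^2 omega^2), for |omega| < 1."""
    integrand = lambda x: x * F1_model(x, Q2) / (1.0 - (x * omega)**2)
    value, _err = quad(integrand, 0.0, 1.0)
    return 4.0 * omega**2 * value

for w in (0.2, 0.5, 0.9):
    print(w, round(F1bar(w), 4))
```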

The method has been recently studied in Ref. [106] using one \(N_f=2+1\) ensemble of clover fermions with pion mass 467 MeV and \(L=2.4\) fm (\(a=0.074\) fm). The matrix element of the current–current operator is computed using the Feynman–Hellmann relation [107], which avoids the computation of four-point functions. This method requires an additional term in the QCD Lagrangian

$$\begin{aligned} {\mathcal {L}}(x) \rightarrow {\mathcal {L}}(x) + \lambda \left[ Z_V\cos (\mathbf {q}\cdot \mathbf {x})\; e_f \,{\bar{\psi }}_f(x)\gamma _3 \psi _f(x) \right] , \end{aligned}$$
(16)

where \(e_f\) is the electric charge of the \(f{\mathrm{th}}\) flavor and \(\lambda \) is a parameter with dimension of mass. \(\mathbf {q}\) is the external current momentum. To extract \(T_{33}\), one needs separate simulations for a few values of \(\lambda \), and to calculate the second derivative of the nucleon energy with respect to \(\lambda \), that is

$$\begin{aligned} \left. \frac{\partial ^2 E_{N_\lambda }({p})}{\partial \lambda ^2} \right| _{\lambda =0} = - \frac{T_{33}(p,q) + T_{33}(p,-q)}{2 E_{N}({p})}. \end{aligned}$$
(17)

The flavor combinations considered correspond to the two currents coupled to the same flavor, indicated by uu and dd, which are leading-twist. Several values of the current momentum are considered within \(3\, \mathrm{GeV}^2 \lesssim Q^2\lesssim 7\, \mathrm{GeV}^2\). Several values of p and q are used, leading to \(\omega \) up to 1. The \(\omega \)-dependence of \(\overline{{\mathcal {F}}}_1(\omega ,Q^2)\) for the uu and dd combinations is shown in Fig. 11. As can be seen, the quality of the signal is good for both uu and dd. Knowledge of \(T_{33}(p,q)\) for several values of \(\omega \) can lead to the extraction of the x-dependence of \(F_1\) at fixed values of \(q^2\).
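In its simplest form, the Feynman–Hellmann extraction of Eq. (17) reduces to a finite-difference estimate of the curvature of the nucleon energy in \(\lambda\). A minimal sketch, with invented energies standing in for the fitted lattice values, is:

```python
# Finite-difference estimate of the second derivative in Eq. (17).
# The lambda values and energies are synthetic placeholders for fitted lattice energies.
import numpy as np

lam = np.array([-0.025, 0.0, 0.025])        # lambda values of the separate simulations
E_N = np.array([0.9641, 0.9612, 0.9643])    # toy nucleon energies E_N(lambda) in GeV

delta = lam[2] - lam[1]
d2E = (E_N[2] - 2.0 * E_N[1] + E_N[0]) / delta**2   # d^2E/dlambda^2 at lambda=0, symmetric difference

# rearranging Eq. (17): [T33(p,q) + T33(p,-q)] / 2 = -E_N(p) * d^2E/dlambda^2
T33_avg = -E_N[1] * d2E
print(round(d2E, 3), round(T33_avg, 3))
```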

Fig. 11
figure 11

\(\omega \) dependence of \(\overline{{\mathcal {F}}}_1^{qq}(\omega , Q^2)\) at \(Q^2 = 4.66\) \(\hbox {GeV}^2\) for the uu (blue) and dd (red) combinations. The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 2 GeV. Source: Ref. [106]. Article published under the terms of the Creative Commons Attribution 4.0 International license

2.2 Flavor decomposition of PDFs

One of the recent highlights is the calculation of disconnected contributions to matrix elements of non-local operators, which have been analyzed within the quasi-PDFs framework. This is an important direction, as some of the PDFs are not well-constrained from global analyses. For example, most of the high-energy processes are not sensitive to strangeness. Therefore, it is difficult to disentangle the strange-quark PDF from the down-quark PDF. This results in large ambiguities in its extraction; for example, global analyses of DIS + SIDIS data sets give a different sign for the strange-quark helicity PDF, \(\Delta _s\), for different fragmentation functions [108, 109]. This poses limitations on the reliable extraction of the W-boson mass and the determination of the CKM matrix element \(V_{cs}\) [110, 111]. Thus, the lattice calculations are quite timely and, while preliminary, demonstrate their potential. To date, there are only two calculations, both at higher-than-physical pion mass, which we briefly summarize here [112, 113]. Both calculations apply the non-singlet renormalization functions calculated non-perturbatively, and the mixing with the gluon PDFs has been neglected. It should be noted that this mixing arises at the matching level and not at renormalization, because there is no additional non-local ultraviolet divergence in the quasi-PDF [52, 114, 115]. The effects of neglecting both the singlet renormalization and the mixing are expected to be within the calculations' uncertainties.

Reference [112] presents the strange and charm unpolarized PDFs using an \(N_f=2+1+1\) ensemble with a mixed fermion action (clover on HISQ) with volume \(24^3\times 64\) (\(\sim \)3 fm). The nucleon is boosted with momentum up to \(P_3=2.18\) GeV. Two values of the valence pion mass are used, 310 and 690 MeV, on which a 2-parameter chiral extrapolation is performed, making it less reliable. Also, unlike the 2-parameter fit applied in Ref. [92], here the lightest ensemble is above 300 MeV, possibly not capturing the pion mass dependence near the physical point properly. Furthermore, similar statistical accuracy is required for all data for an unbiased fit. However, the data obtained for the 690 MeV ensemble are a factor of 2–3 more accurate than the 310 MeV data at \(P_3=1.74\) GeV. The lattice data are presented in coordinate space as a function of \(z P_3\), and an inverse matching and inverse FT are applied to the global fits from NNPDF31 [63] and CT18 [71].
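The coordinate-space comparison described above can be sketched in the other direction as well: a (toy) strange and anti-strange PDF can be mapped to Ioffe time with Eq. (8), using the crossing relations of Eqs. (9)–(10), and then compared to renormalized matrix elements. The inverse matching step is omitted here, and the PDF shapes are invented for illustration.

```python
# Toy coordinate-space (Ioffe-time) representation of a strange-quark PDF, for a comparison
# in the spirit of Fig. 12. Shapes are illustrative; the inverse matching step is omitted.
import numpy as np
from scipy.integrate import trapezoid

x = np.linspace(1e-4, 1.0, 2000)
s = 0.3 * x**(-0.2) * (1 - x)**7        # toy s(x)
sbar = 0.3 * x**(-0.2) * (1 - x)**7     # toy sbar(x); equal shapes give a vanishing real part

def itd(nu):
    re = trapezoid(np.cos(nu * x) * (s - sbar), x)   # sensitive to s - sbar (cf. Eq. (9))
    im = trapezoid(np.sin(nu * x) * (s + sbar), x)   # sensitive to s + sbar (cf. Eq. (10))
    return re, im

for nu in (0.5, 1.5, 3.0):
    print(nu, itd(nu))
```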

Fig. 12
figure 12

The renormalized unpolarized matrix element (data points) at \(P_3\) up to 2.18 GeV. The global fits of NNPDF3.1 [63] and CT18 [71] are shown upon inverse matching and FT. The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 2.3 GeV. Source: Ref. [112]. Reprinted based on the arXiv distribution license

The data are shown in Fig. 12, favoring symmetry between s and \({\bar{s}}\), as the real part of the lattice data is compatible with zero, which is also assumed by CT18. Whether the nonzero signal for NNPDF31 is due to the truncation of the matching is not clear at this stage. The imaginary part is nonzero, and in agreement with CT18 up to \(z P_3=12\pi /24 \sim 1.6\), and with the less accurate NNPDF31 up to \(z P_3=18\pi /24 \sim 2.4\). The data shown correspond to \(\{n_3,z_{\mathrm{max}}\}= \{1, 6a\},\,\{2, 6a\},\,\{3, 6a\},\,\{4, 4a\},\,\{5,3a\}\), where \(a P_3=n_3 \times 2\pi /24\), and \(z_{\mathrm{max}}\) is the maximum value of z used (\(a=0.12\) fm). As can be seen, the lattice data for \(P_3=1.75\) and 2.18 GeV are very noisy, and thus not very useful. Such a rapid decrease of the signal with increasing momentum boost has been discussed by various groups (see, e.g., Refs. [59, 77, 98]). In general, the lattice data have a flat behavior, which is not supported by the global fits, demonstrating a tension of a factor of 2 for the large values of \(z P_3\). It is assumed that this is due to not taking into account the other flavors in the matching [112]. Results on the charm unpolarized PDF can be found in Ref. [112].

The flavor decomposition of the up- and down-quark PDFs requires calculating both the connected and the quark-disconnected diagram. The disconnected \(u+d\) contribution was not calculated until recently, preventing lattice calculations from providing the individual quark PDFs. Such a decomposition is presented for the helicity PDFs, \(\Delta u,\,\Delta d\), in Ref. [113]. \(\Delta s\) is also presented, but is not combined with the light-quark PDFs, as the mixing with the gluon PDFs is not considered. The calculation is performed using one ensemble of \(N_f=2+1+1\) twisted-mass fermions, at pion mass 260 MeV, with a spatial extent of 3 fm (\(32^3\times 64\)). Three values of \(P_3\) are used, with the highest being 1.24 GeV. Convergence between \(P_3=0.83\) GeV and \(P_3=1.24\) GeV is observed within the reported uncertainties. Hierarchical probing [116] is used together with momentum smearing, which successfully improves the signal.

Fig. 13
figure 13

Comparison of lattice data on the up (upper), down (middle), and strange (bottom) quark helicity PDFs (blue) with the JAM17 phenomenological data [67] (gray). The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 2 GeV. Source: Ref. [113]. Reprinted based on the arXiv distribution license

The final PDFs, \(x \Delta ^+_q \equiv x\Delta _q + x \Delta {\bar{q}}\), are shown in Fig. 13 for the three flavors, and are compared to the latest global fits of the JAM Collaboration (JAM17) [67]. Despite neglecting the mixing with the gluon PDFs, the agreement between the lattice results and the global fits is very encouraging. \(x \Delta ^+_u\) exhibits agreement up to \(x\sim 0.65\), while \(x \Delta ^+_d\) is in full agreement for all x regions. The statistical uncertainties of the lattice data for the light-quark PDFs are comparable to the uncertainties of the global fits. The most striking case is that of \(x \Delta ^+_s\), for which the lattice data exhibit smaller statistical uncertainties than JAM17 for \(x<0.6\), and the two are compatible within uncertainties over the whole x-region. We remind the Reader that the comparison is qualitative, as a thorough investigation of systematic uncertainties is needed. This includes a proper treatment of the mixing between the singlet-quark and gluon PDFs, which currently has an unknown effect.

2.3 Gluon PDFs

The formalism to extract the x-dependence of quark distributions is also applicable to gluon PDFs. This has already been discussed in several papers for the quasi-PDFs [51,52,53,54, 115], pseudo-ITDs [117], and current–current correlators [24] methods. Ab initio calculations of gluon PDFs are a new and largely unexplored direction. However, it is an essential one, because gluonic contributions are sizeable, for example in the proton spin [118,119,120,121,122]. Global analyses find that gluon PDFs are dominant in the production of the Higgs boson and heavy quarkonium [62,63,64, 71, 123,124,125,126]. In fact, the recent analyses constrained the small-x region better, using recent \(t{\bar{t}}\) data at the LHC [127,128,129] and Tevatron [130], as well as the HERA run I+II data [131]. The gluon PDFs play an essential role in the structure of hadrons. For example, the gluon PDFs dominate over the quark PDFs in the small-x region. Besides their essential role in hadron structure, calculations of gluon PDFs serve another important purpose: they are necessary to eliminate the mixing with the quark-singlet PDFs [132]. Disentanglement of the gluon and quark-singlet PDFs will eliminate one of the sources of systematic uncertainties in lattice calculations. For the case of Mellin moments of PDFs, the mixing appears in the renormalization. However, as discussed above, the non-local operators are multiplicatively renormalizable.Footnote 6 The mixing between the flavor-singlet quark and gluon PDFs should be eliminated at the factorization level.

A preliminary calculation of matrix elements of non-local gluonic operators is presented in Ref. [133]. A mixed action is used, with a pion mass of 330 MeV in the sea sector and a volume of 2.7 fm (\(24^3\times 64\)); the valence sector corresponds to pion masses of 340 and 678 MeV. The nucleon is boosted with \(P_3=0.46,\,0.92\) GeV. One of the operators employed exhibits mixing with other gluonic operators under renormalization, in addition to the mixing with the quark singlet PDFs. Also, the matrix elements are not renormalized; instead, they are normalized. Such a normalization changes the quantity under study, and the matching kernel of quasi-PDFs is not applicable. Therefore, only matrix elements are extracted, demonstrating the challenges of the calculation: for \(m_\pi =678\) MeV at \(P_3=0.92\) GeV the noise exceeds \(40\%\) of the signal beyond \(z=3a\), while for \(m_\pi =340\) MeV the noise is already \(40\%\) at \(z=0\) and the signal is lost at \(z=4a\).

A calculation using the pseudo-ITDs formulation is presented in Ref. [134], using the same ensemble as Ref. [112]. The analysis follows Ref. [117], for an operator that is multiplicatively renormalizable and a matching kernel known to one-loop accuracy. From a technical standpoint, the advantage of pseudo-ITDs is the cancellation of the renormalization functions in the reduced pseudo-ITD. As discussed in Sect. 2.1, the RI-type renormalization procedure does not pose a difficulty for quark PDFs, as can be seen from the agreement of unpolarized PDFs obtained from the same ensemble using the quasi- and pseudo-ITDs methods [59, 95] (Fig. 7). However, the renormalization function for non-local gluon operators cannot be reliably extracted with the methods used for \(\langle x \rangle _g\) [121, 122]. The mixing between the quark-singlet and gluon PDFs appears in the matching procedure for both quasi- and pseudo-ITDs. In Ref. [112], the mixing is not considered, as the \(u+d\) PDFs for the disconnected diagram have not been calculated. The calculation employs momentum boosts from zero up to 2.16 GeV and considers separations up to \(z=5a = 0.6\) fm for all momenta. It is found that the matrix elements from different \(n_3\) and z at the same \(z P_3\) are compatible within the reported uncertainties, even though there are indications that a cut should be applied at around \(z=0.33\) fm [97].

Fig. 14

Lattice data on the unpolarized gluon PDF for \(m_\pi =\) 310 and 690 MeV (blue and green bands), and the extrapolated values at \(m_\pi =135\) MeV (purple bands). The global fits of CT18 (red band) and NNPDF31 (orange band) are also shown. The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 2 GeV. Source: Ref. [134]. Reprinted based on the arXiv distribution license

It is found that the uncertainties for the 310 MeV ensemble are significantly larger than the corresponding ones at 690 MeV. This happens already at \(P_3=1.3\) GeV, where uncertainties of \(\sim 30 \%, 45 \%, 70 \%\) are found for \(z=3a,\,4a,\,5a\), respectively, and they further increase for the highest momenta. Therefore, only the first few values of \(z P_3\) have controlled uncertainties at large \(P_3\) in the 310 MeV ensemble, which demonstrates the severe challenges in reliably extracting the gluon PDFs. The final results are shown in Fig. 14 for the two ensembles. The same 2-parameter extrapolation of Ref. [112] is also applied here. As discussed in Sect. 2.2, the uncertainties of such a procedure are uncontrolled, as only two ensembles are available, with very different statistical uncertainties. The global fits from NNPDF31 [63] and CT18 [71] are also shown for comparison, and an agreement is reported for \(x>0.25\). These results are encouraging, but careful analysis is needed, as the current comparison is inconclusive.

Preliminary results on the gluon pseudo-ITDs using a different lattice formulation have also been presented recently [135]. For results on the gluon helicity \(\Delta G\) using LaMET, see Ref. [118].

2.4 Twist-3 PDFs

Twist-3 PDFs appear in QCD factorization theorems for a variety of hard scattering processes, together with their leading-twist (twist-2) counterparts. While twist-3 PDFs lack a probabilistic interpretation, they are interesting in their own right, as they contain information on the soft dynamics, such as the transverse polarization. Also, they are not necessarily smaller in magnitude than the twist-2 PDFs and thus must be taken into account. Most of the work on twist-3 PDFs is theoretical (see, e.g., Refs. [136,137,138,139,140]), while the experimental studies are limited. It is very challenging to probe them experimentally and to isolate them from the leading-twist contributions. Also, their theoretical analysis is more involved and goes beyond the simple parton model. A recent re-analysis of the JLab Hall A data on the absolute DVCS beam-helicity cross-section differences [141, 142] addressed twist-3 contributions, but remains inconclusive.

Twist-3 PDFs that enter the description of transverse-spin observables in inclusive high-energy processes include contributions from several matrix elements. The latter are classified as intrinsic, kinematical, and dynamical (see, e.g., Ref. [138]). Intrinsic twist-3 observables are extracted from matrix elements of two-parton non-local operators, the kinematical ones are the first moments of transverse-momentum-dependent (TMD) two-parton PDFs, and the dynamical ones are extracted from matrix elements of quark-gluon-quark non-local operators. The latter are also known as pure twist-3 correlation functions.

The most well-studied structure function in this context is the transverse spin-dependent \(g_2^{\mathrm{s.f.}}\) [143]. One can separate its non-perturbative component, the intrinsic \(g_T\) PDF, defined through \(g_2=g_T-g_1\), where \(g_1\) is the twist-2 counterpart. The properties of \(g_2^\mathrm{s.f.}\) in DIS were first studied by Wandzura and Wilczek [144], who proposed the Wandzura-Wilczek (WW) approximation. Their analysis suggests that the Mellin moments of \(g_T(x)\) receive contributions from both twist-2 and twist-3 operators, allowing one to write

$$\begin{aligned} g_T(x) = g_T^{\mathrm{WW}}(x) + g^{\mathrm{twist-3}}_T(x) , \end{aligned}$$
(18)

where \(g^{\mathrm{twist-3}}_T\) is the contribution from twist-3 operators [145]. In the WW approximation,Footnote 7 the moments of the twist-3 operators vanish. Thus, \(g_T(x)\) is determined only by the twist-2 operator of the helicity PDF \(g_1(x)\). The validity of the WW approximation has been under investigation since its proposal. In Ref. [145], it was tested using experimental data, leading to a possible violation at the level of 15–40%.

The extraction of the x-dependence of PDFs using the methodologies outlined in Sect. 2 has reached sufficient maturity to enable the study of twist-3 PDFs and their properties. To date, there is only one lattice calculationFootnote 8 of twist-3 PDFs, namely of \(g_T(x)\), presented in Ref. [148]. The calculation is performed using \(N_f=2+1+1\) twisted mass fermions at a pion mass of 260 MeV and employs the quasi-PDFs method. The relevant matrix element is \(\langle P\vert \, {\overline{\psi }}(0,z)\,\gamma ^j\,\gamma ^5\, W(z)\,\psi (0,0)\,\vert P\rangle \), where j is one of the transverse directions. The maximum proton momentum boost is \(P_3=1.67\) GeV. The necessary matching was calculated in Ref. [149], demonstrating that the twist-3 light-cone PDFs and their quasi-PDF counterparts have the same infrared physics, as is the case for all twist-2 PDFs. It was also shown that there are no delta-function singularities at \(x=0\) (\(\delta (x)\)), the so-called zero-mode contributions. An extension of this work to the twist-3 e(x) and \(h_L(x)\) PDFs confirmed that the infrared poles match exactly in those cases as well [150]. This is a very important and positive finding, as it confirms that the factorization formalism of quasi-PDFs is valid for the twist-3 case; the presence of zero-mode contributions would lead to different infrared physics between the quasi- and light-cone PDFs, and the matching formalism would break down.

Fig. 15

Comparison of the lattice estimate of \(g_T(x)\) (blue band) with its WW approximation (red band). The WW approximation using global fits is also shown (NNPDF1.1pol [66] orange band, JAM17 [67] purple band). The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 2 GeV. Source: Ref. [148]. Article published under the terms of the Creative Commons Attribution 4.0 International license

The lattice data on \(g_1(x)\) have been used to test the WW approximation, according to which \(g_1(x)\) fully determines \(g_T(x)\), via

$$\begin{aligned} g_T^{\mathrm{WW}}(x)=\int _x^1 \frac{dy}{y} g_1(y). \end{aligned}$$
(19)

The lattice results on \(g_T^{\mathrm{WW}}(x)\), as well as the actual lattice results on \(g_T(x)\), are compared in Fig. 15 for \(P_3=1.67\) GeV. The estimates from the global fits by the NNPDF [66] and JAM [67] collaborations are also shown. As can be seen, the lattice \(g_T(x)\) is consistent with the lattice \(g_T^{\mathrm{WW}}(x)\) for \(x \le 0.5\). However, not all sources of systematic uncertainties have been accounted for. Therefore, a violation of the WW approximation is still possible at the level of up to 40% for \(x\lesssim 0.4\), similar to what is found using experimental data [145]. The comparison is qualitative, as simulations at the physical pion mass and an investigation of systematic uncertainties are essential. It should also be noted that mixing is present in the renormalization of the operator for \(g_T\), involving three other operators (see, e.g., Ref. [151]). One of these operators vanishes by the equations of motion and one is gauge variant, leaving only two operators with non-vanishing physical matrix elements: a quark-quark operator of the form \({\overline{\psi }}\,\gamma ^j\,\gamma ^5\,\psi \) and a quark-gluon-quark operator \({\overline{\psi }}\,F_{\alpha \beta }\gamma ^j\,\psi \). In the quasi-distributions approach, such mixing should be dealt with at the matching level, as discussed above for the gluon and singlet quark PDFs; however, this remains unexplored for twist-3 within the quasi-PDFs approach. While a lot of work is still needed in the direction of twist-3 PDFs, the work of Refs. [148,149,150] demonstrates the feasibility of lattice calculations and of the matching formulation.
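
To make the content of Eq. (19) concrete, the following minimal Python sketch evaluates the WW integral numerically for a toy helicity PDF; the parametrization \(g_1(x)\propto x^a(1-x)^b\) and its parameter values are illustrative assumptions, not the lattice or global-fit results.

import numpy as np
from scipy.integrate import quad

def g1(x, a=0.7, b=3.0, N=1.0):
    # Toy helicity-PDF parametrization (illustrative assumption only)
    return N * x**a * (1.0 - x)**b

def gT_WW(x):
    # Wandzura-Wilczek approximation of Eq. (19): g_T^WW(x) = int_x^1 dy/y g_1(y)
    val, _ = quad(lambda y: g1(y) / y, x, 1.0)
    return val

for x in np.linspace(0.05, 0.95, 10):
    print(f"x = {x:.2f}   g_1 = {g1(x):.4f}   g_T^WW = {gT_WW(x):.4f}")

In an actual analysis, \(g_1(x)\) would be the lattice or phenomenological helicity PDF, and the resulting \(g_T^{\mathrm{WW}}(x)\) would be compared to the directly extracted \(g_T(x)\), as in Fig. 15.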

2.5 Other developments

In parallel to improving the numerical calculations, there have been exciting developments in a more theoretical direction, and here we discuss a few of them.

Some of the highlights regarding the matching formalism are the calculations presented in Refs. [152,153,154,155], which extend the matching kernels to next-to-next-to-leading order (NNLO), that is, to two loops in the strong coupling. It has been discussed extensively that the truncation of the matching formula at NLO is one of the major sources of systematic uncertainties; therefore, this is an important development. In particular, the flavor non-diagonal quark-quark channel in the modified minimal subtraction scheme for the quasi-PDFs is discussed in Refs. [152, 153]. The NNLO matching for the valence PDFs in coordinate space is presented in Ref. [154], and the NNLO matching for quasi-PDFs in Ref. [155]. The calculations reach a consensus on the importance of the two-loop contributions, emphasizing that including them is the natural next step for lattice calculations.

As discussed in previous subsections, the matching (factorization) between the light-cone distributions and the lattice-calculable quantities (quasi-PDFs, pseudo-PDFs, etc.) is calculable in perturbative QCD. Thus, there are limitations on the z-range for this procedure to be well-defined. This also concerns the renormalization. For example, in an RI-type scheme used in the quasi-PDFs method, there could be non-perturbative effects in the large-z region entering through logarithms, such as \(\log (z^2 \mu ^2)\). Also, in the pseudo-ITDs ratio, there could be higher-twist effects at large z. However, both the RI-type and ratio schemes are suitable for removing all divergences at small and intermediate values of z. This is the main motivation of Ref. [97], which proposes a hybrid renormalization that treats the short-distance and long-distance regions separately. It consists of using an RI-type or ratio scheme up to some \(z_{\mathrm{max}}\), and a mass-subtraction scheme for z beyond \(z_{\mathrm{max}}\). At \(z_{\mathrm{max}}\), the renormalization functions from the two schemes should coincide, based on continuity arguments. It is estimated that an upper limit for \(z_{\mathrm{max}}\) is around 0.4 fm. The effect of this constraint may be below the current precision level of state-of-the-art calculations; for example, the agreement observed in Figs. 7 and 9 between lattice data from different lattice actions and renormalization schemes does not indicate a large effect from treating the renormalization differently. The validity of extrapolating the lattice data at large z based on the asymptotic Regge-type behavior [156] is also examined; this type of fit is also used in the pseudo-ITDs and current–current correlators methods.
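
The logic of the hybrid scheme can be illustrated with a short numerical sketch: a bare position-space matrix element is renormalized with an RI-type (or ratio) factor up to \(z_{\mathrm{max}}\) and, beyond that, with a mass-subtraction factor fixed by continuity at \(z_{\mathrm{max}}\). All functional forms and parameter values below are placeholders, not the actual prescription of Ref. [97].

import numpy as np

a = 0.09                                      # lattice spacing in fm (placeholder)
z = a * np.arange(0, 13)                      # separations z = 0, a, ..., 12a
h_bare = np.exp(-1.5 * z) * (1.0 + 0.2 * z)   # toy bare matrix element
Z_short = np.exp(-0.8 * z)                    # toy RI-type / ratio factor (short distances)
dm = 0.5                                      # toy mass-subtraction parameter, in 1/fm
z_max = 0.4                                   # upper limit of the short-distance scheme, in fm

# Short distances: divide by the RI-type/ratio factor.
# Long distances: exponential mass subtraction, matched to the short-distance
# factor at z_max so that the renormalization function is continuous there.
Z_at_zmax = np.interp(z_max, z, Z_short)
Z_hybrid = np.where(z <= z_max, Z_short, Z_at_zmax * np.exp(-dm * (z - z_max)))
h_renorm = h_bare / Z_hybrid

for zi, hi in zip(z, h_renorm):
    print(f"z = {zi:.2f} fm   renormalized matrix element = {hi:.4f}")

The essential feature is only the continuity matching at \(z_{\mathrm{max}}\); the specific form of the long-distance factor is scheme-dependent.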

Reference [157] addresses the renormalization and the validity of the methods that factorize equal-time correlators into light-cone PDFs. The work is based on one-loop calculations of quasi-PDFs and pseudo-PDFs in a renormalizable scalar theory. The calculation addresses the erroneous claims [158, 159] that quasi-PDFs cannot be related to light-cone PDFs because the moments of the former exhibit power-divergent mixing. It should be noted that these claims have been refuted in more than one publication [84, 101, 160]. For instance, one of the counter-arguments is that, while all moments except the zeroth diverge, the light-cone PDFs are extracted from the non-local quasi-distributions themselves, avoiding the power-divergence problem [160]. The non-local operators under study contain logarithmic divergences and a power-law divergence due to the Wilson line; both are renormalizable by adopting a non-perturbative RI-type prescription. Also, the required matching kernel is responsible for extracting light-cone PDFs with finite moments. A numerical example was given in Ref. [101], which calculated the two lowest moments of pseudo-PDFs and found agreement with a direct calculation of the moments using local operators with one and two covariant derivatives. Finally, the Taylor expansion in z used in Refs. [158, 159] is only justifiable when all derivatives with respect to \(z^2\) exist at \(z = 0\) [84], which is not the case for the non-local operators discussed here. The work of Ref. [157] generalizes Collins’ work [161] to off-light-cone matrix elements at leading twist and finds that quasi-PDFs and pseudo-PDFs are equivalent. These, as well as their light-cone counterparts, are renormalizable. It is also demonstrated that factorization holds at small z, large \(P_3\), and small flow times. Yet again, the claims of Refs. [158, 159] have been shown to be invalid: light-cone PDFs can indeed be accessed properly via the methods discussed in this review, as well as other methodologies.

2.6 Synergy of lattice QCD and global fits

Recently, there has been interest from the phenomenological community in incorporating lattice data in global analyses, in the same fashion as the experimental data sets. The primary motivation is to explore whether the PDFs can be constrained more successfully in regions where the experimental data are sparse, imprecise, or non-existent.

Fig. 16

\(V_3\) (top) and \(T_3\) (bottom) non-singlet PDFs using lattice data with systematic uncertainties based on scenarios S2 (green band) and S5 (salmon band). The NNPDF31 estimates are shown with a blue band. The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 1.6 GeV. Source: Ref. [162]. Published under open access license

A general framework to extract PDFs from the lattice data treating them on the same footing as experimental data is presented in Ref. [162]. The methodology is implemented for the unpolarized PDFs within the NNPDF fitting framework using the results of Refs. [42, 59]. Two non-singlet flavor combinations are considered, that is

$$\begin{aligned}&V_3 ( x ) = u\left( x\right) - {\bar{u}}\left( x\right) -\left[ d\left( x\right) -{\bar{d}}\left( x\right) \right] , \end{aligned}$$
(20)
$$\begin{aligned}&T_3 ( x ) = u\left( x\right) + {\bar{u}}\left( x\right) -\left[ d\left( x\right) +{\bar{d}}\left( x\right) \right] , \end{aligned}$$
(21)

which are related to the real and imaginary parts of the vector matrix element, via

$$\begin{aligned} \text {Re}\left[ M(\nu ,z^2,\mu )\right]&= \int _{0}^{1} dx \,{\mathcal {C}}_3^{\text {Re}}\left( \nu x, z, \frac{\mu }{P_z} \right) V_3\left( x,\mu \right) , \end{aligned}$$
(22)
$$\begin{aligned} \text {Im}\left[ M(\nu ,z^2,\mu )\right]&= \int _{0}^{1} dx \,{\mathcal {C}}_3^{\text {Im}}\left( \nu x, z, \frac{\mu }{P_z} \right) T_3\left( x,\mu \right) . \end{aligned}$$
(23)

Therefore, the lattice matrix elements can be analyzed within the framework applied to experimental data sets. The procedure is based on defining a \(\chi ^2\)-function, which is minimized in order to extract \(V_3\) and \(T_3\). Quantifying statistical and systematic uncertainties is important in this process. Six plausible scenarios are considered to assess systematic uncertainties that are not known; here we focus on the two most realistic ones, labeled S2 and S5. Scenario S2 estimates the uncertainties as a percentage of the value of the matrix element at each z, while scenario S5 assumes that the uncertainties are z-independent. In particular, scenario S2 attributes \(20\%\), \(5\%\), \(10\%\), and \(20\%\) to cut-off effects, finite-volume effects, excited-states contamination, and the truncation of the perturbative expansion, respectively. Scenario S5 adds (0.2, 0.05, 0.1, 0.2) for all z values, for the same systematic effects. The analysis of the lattice data including these systematic effects is shown in Fig. 16 and is compared to the phenomenological fits of NNPDF31. It is observed that the more conservative scenario S2 leads to larger uncertainties and is marginally compatible with the global-analysis estimate in most regions of x. This analysis emphasizes the importance of properly quantifying systematic uncertainties in lattice estimates. After the submission of this review, the work of NNPDF was extended to include pseudo-PDF data [163].
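
To illustrate how lattice matrix elements can be fitted on the same footing as experimental data, the sketch below builds a simplified forward model in which the kernel of Eq. (22) is replaced by its tree-level form \(\cos (\nu x)\), attaches S2-style (relative) and S5-style (absolute) systematic uncertainties, and evaluates a \(\chi ^2\). The toy parametrization, the tree-level kernel, and the quadrature combination of the four systematic components are simplifying assumptions, not the NNPDF implementation of Ref. [162].

import numpy as np

xs = np.linspace(1e-4, 1.0, 2000)
dx = xs[1] - xs[0]
nus = np.linspace(0.5, 5.0, 10)            # Ioffe times nu = z*P_3 (toy values)

def V3(x, a=0.6, b=3.5, N=5.0):
    # Toy non-singlet PDF parametrization (illustrative assumption only)
    return N * x**a * (1.0 - x)**b

def reM(nu, a=0.6, b=3.5, N=5.0):
    # Tree-level version of Eq. (22): Re M(nu) ~ int_0^1 dx cos(nu x) V_3(x)
    return np.sum(np.cos(nu * xs) * V3(xs, a, b, N)) * dx

data = np.array([reM(nu) for nu in nus])   # stand-in for renormalized lattice data
stat = 0.02 * np.abs(data)                 # toy statistical errors

# Scenario S2: relative systematics (cut-off, volume, excited states, matching),
# combined here in quadrature (an assumption).
sys_S2 = np.sqrt(0.20**2 + 0.05**2 + 0.10**2 + 0.20**2) * np.abs(data)
# Scenario S5: the same four numbers treated as z-independent absolute uncertainties.
sys_S5 = np.full_like(data, np.sqrt(0.20**2 + 0.05**2 + 0.10**2 + 0.20**2))

def chi2(params, syst):
    a, b, N = params
    theory = np.array([reM(nu, a, b, N) for nu in nus])
    return np.sum((data - theory)**2 / (stat**2 + syst**2))

# A fit would minimize chi2 over the parametrization; here we only evaluate it
# at slightly shifted parameters to show the effect of the two error scenarios.
print("chi2, scenario S2:", chi2((0.55, 3.3, 5.0), sys_S2))
print("chi2, scenario S5:", chi2((0.55, 3.3, 5.0), sys_S5))

In Ref. [162], the full kernels of Eqs. (22)-(23) and the NNPDF fitting machinery are used instead of this toy setup.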

Reference [164] presents a combined Monte Carlo based QCD global analysis within the JAM framework for both the unpolarized and helicity PDFs. The lattice data of Refs. [42, 59] are treated under the same conditions as the experimental data sets from unpolarized and polarized DIS and Drell-Yan (note that this is not the same as the JAM17 analysis [67]). The analysis uses the Monte Carlo Bayesian inference sampling methodology, parametrizing the PDFs at the input scale \(\mu \) and solving the evolution equations in Mellin space to bring all observables to the same scale. In a nutshell, the analysis shows a relatively weak constraint from the lattice data on the phenomenological unpolarized PDFs, in agreement with the findings of Ref. [162]. In contrast, the lattice data on the helicity PDFs provide significant constraints on the combined estimates, as can be seen in Fig. 17. It is found that the uncertainties of the combined estimates of the quark PDFs are reduced by a factor of 3 in the small- and large-x regions and by a factor of 5 in the intermediate-x region. The anti-quark isovector helicity PDF exhibits an equally large improvement. It is also interesting that the combined fit (“exp+lat”) is compatible with the discretized FT (“DFT”) of Refs. [42, 59]. It should be noted that the lattice data enter at the level of the renormalized matrix elements in coordinate space. Therefore, the challenges of the inverse problem do not appear directly in this analysis, as the discretized lattice data are fitted within the JAM framework. However, the effectiveness of such fits is driven by the number and quality of the lattice data. As mentioned previously, properly quantifying both statistical and systematic uncertainties is crucial.

Fig. 17

The isovector helicity PDF. The JAM17 [67] results are shown with a red band, while the blue band shows the combined fit using both lattice and experimental data. The yellow band shows the data of Refs. [42, 59] obtained using the standard discretized FT. The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 2 GeV. Source: Ref. [164]. Reprinted based on the arXiv distribution license

Based on the analyses of Refs. [162, 164], one can conclude that the current precision of the lattice data in the unpolarized case must improve significantly to reach the quality of the experimental data. The case of the helicity PDFs is more promising, as the independent analysis of Refs. [42, 59] shows very good agreement with the global fits over a wide range of x. The lattice data compare favorably with the experimental data sets and demonstrate great constraining power, as shown in Fig. 17. The work of Refs. [162, 164] opens up new possibilities for synergy between lattice QCD and global QCD fits. It is expected that combined analyses of multiple factorizable lattice matrix elements from various approaches will soon be available, an idea already discussed in Refs. [22, 24]. Reference [157] proposes a framework for such a global analysis of lattice data.

3 GPDs

GPDs were first introduced in the 1990s, when it was demonstrated that they can be accessed in Deeply Virtual Compton Scattering (DVCS) and Deeply Virtual Meson Production (DVMP) [3,4,5]. It was shown by Ji [3] that GPDs offer important information on the proton spin; for example, the proton angular momentum can be decomposed in a gauge-invariant way into quark and gluon parts. GPDs enter non-forward scattering processes and capture the momentum transfer between the initial and final hadron states. They depend on two kinematic variables in addition to x, namely the light-cone component of the longitudinal momentum transfer (skewness), \(\xi \), and the momentum transfer squared, \(t=\Delta ^2\); the latter also contains information on the transverse component of the momentum transfer (\(\Delta _\perp \)). Consequently, GPDs are multi-dimensional quantities, and mapping their dependence on \((\xi ,t)\), in addition to x, provides information on the structure of hadrons in terms of these kinematic variables.

There are a number of challenges in extracting GPDs from experimental data, as the latter are limited and cover a small kinematic region. Not only are these data sets related to GPDs only indirectly, through the Compton form factors, but it is also challenging to disentangle the GPDs entering the same high-energy process [165,166,167,168].

The x-dependence of GPDs for fixed values of t and \(\xi \) can be accessed using the methods developed for PDFs. This new direction has recently been explored using the quasi-PDFs method in Refs. [169, 170].Footnote 9 The possibility of accessing GPDs through the pseudo-GPDs method has also been proposed [85], but it has yet to be implemented in lattice calculations.

The connected contribution to the pion GPD, \(H_v^\pi \), is presented in Ref. [169]. The motivation is to study the large-x behavior and to test which parameterization of the kinematic dependence of the GPD is preferable [172, 173]. The calculation is performed using an \(N_f=2+1+1\) mixed-action ensemble of HISQ/clover fermions with a pion mass of 310 MeV and volume \(24^3\times 64\). Only zero skewness is considered, and two values of the momentum transfer, \(-t\sim 0.40,\,0.92\) \(\hbox {GeV}^2\). Since there is only one GPD for the pion, one avoids the complications of disentangling GPDs and the need for more than one independent matrix element to achieve the isolation. As a consequence, the calculation of the pion GPDs is computationally less expensive than that of the nucleon. The derivative method is employed, in which one neglects the surface term arising in the integration by parts of the FT. This procedure is controlled only if the renormalized matrix elements decay to zero. Another concern about this method is that the resulting expression contains a factor of 1/x, which makes the extraction of the PDF at small x less reliable. In Ref. [59] it was demonstrated numerically that the discretized FT and the derivative method differ for x up to \(\sim 0.2\) in the case of the nucleon unpolarized PDFs.
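
The difference between the two reconstructions can be made explicit with a short sketch: integrating the FT by parts trades the matrix element for its derivative plus a surface term, and introduces an explicit 1/x factor. The toy matrix element and the cut-off in z below are illustrative assumptions only.

import numpy as np

P3    = 1.74                       # hadron boost in GeV (illustrative value)
hbarc = 0.1973                     # GeV fm
zmax  = 1.2                        # maximum separation kept in the FT, in fm (toy)
z  = np.linspace(-zmax, zmax, 2001)
dz = z[1] - z[0]
h  = np.exp(-np.abs(z) / 0.6)      # toy renormalized matrix element h(z)
dh = np.gradient(h, dz)            # numerical derivative dh/dz

def ft_standard(x):
    # Standard discretized FT: (P3/2pi) int dz exp(i x P3 z) h(z)
    phase = np.exp(1j * x * P3 * z / hbarc)
    return (P3 / (2 * np.pi * hbarc)) * np.sum(phase * h) * dz

def ft_derivative(x):
    # Derivative method: integrate by parts and drop the surface term.
    # Note the explicit 1/x factor, which degrades the small-x region.
    phase = np.exp(1j * x * P3 * z / hbarc)
    return (1j / (2 * np.pi * x)) * np.sum(phase * dh) * dz

for x in (0.05, 0.2, 0.5, 0.8):
    print(f"x = {x:.2f}   standard = {ft_standard(x).real:+.4f}"
          f"   derivative = {ft_derivative(x).real:+.4f}")

The two reconstructions coincide only when the neglected surface term is negligible, i.e., when h(z) has decayed sufficiently by the maximum z kept in the transform.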

Fig. 18

The x-dependence of the pion GPD \(H^{\pi ^{+}}_{v}\) for zero skewness and \(t=0.40,\,0.92\) \(\hbox {GeV}^2\) (red and green bands), as well as the corresponding PDF (blue and salmon bands). The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 4 GeV. Source: Ref. [169]. Article published under the terms of the Creative Commons CC-BY license

The main result of the work is shown in Fig. 18 for the pion PDF and for the GPD at \(-t\sim 0.40,\,0.92\) \(\hbox {GeV}^2\), at \(P_3=1.74\) GeV. Meson-mass corrections have also been applied. One observes that, as the momentum transfer increases, the GPD decreases in magnitude. It is concluded that the accuracy of these results is not sufficient to favor one of the two parameterizations, and further studies are necessary.

The isovector unpolarized and helicity GPDs for the proton, \(H(x,\xi ,t),\,E(x,\xi ,t)\) and \(\widetilde{H}(x,\xi ,t),\,\widetilde{E}(x,\xi ,t)\), have been studied using an \(N_f=2+1+1\) ensemble of the twisted mass formulation [170]. The ensemble has volume \(32^3\times 64\) and pion mass 260 MeV. Both zero and nonzero skewness (\(\xi =\pm 1/3\)) have been studied, for \(P_3\) up to 1.67 GeV and \(-t=0.69,\,1.39\) \(\hbox {GeV}^2\). Since momenta are discretized on the lattice, only specific combinations of the above values of \(P_3,\,\xi ,\,t\) are accessible. Note that the skewness entering the lattice calculation (quasi-skewness) is defined as \(\xi =-\frac{Q_3}{2P_3}\) and differs from the light-cone \(\xi \) by power corrections. As mentioned above, extracting nucleon GPDs is computationally very costly. In addition, a careful analysis is required to properly disentangle the GPDs, as each matrix element contributes, in general, to more than one GPD.Footnote 10 For example, the off-forward proton matrix element of the vector current decomposes into

$$\begin{aligned}&\langle N(P_f)|\mathcal{O}_{\gamma _\mu }(z)|N(P_i)\rangle = \nonumber \\&\quad \langle \langle \gamma _\mu \rangle \rangle F_H(z,P_3,t,\xi ) -i\frac{\langle \langle \sigma _{\rho \,\mu }\rangle \rangle \, Q_\rho }{2m} F_E(z,P_3,t,\xi ),\nonumber \\ \end{aligned}$$
(24)

where \(Q\equiv P_f-P_i\), and \(\langle \langle \Gamma \rangle \rangle \equiv {\bar{u}}_N(P_f,s')\, \Gamma \,u_N(P_i,s)\), with \(u_N\) the proton spinors. Therefore, a minimum of two parity projectors is needed, leading to two independent matrix elements, in order to disentangle \(F_H\) and \(F_E\). Another critical aspect of the calculation of GPDs is the need to implement the Breit (symmetric) frame. In this setup, the momenta of the initial and final states, \(P_i\) and \(P_f\), each carry half of the momentum transfer, that is, \(\mathbf {P}_i=P_3 {\hat{z}} -\mathbf {Q}/2\) and \(\mathbf {P}_f = P_3 {\hat{z}} +\mathbf {Q}/2\). This has major implications for the computational cost, as separate calculations are needed for every value of t, because both the source and sink momenta change. We emphasize that there is a fundamental difference between the calculation of Mellin moments of GPDs (form factors and generalized form factors) and the light-cone GPDs. The former are frame independent, and one can extract them in a more convenient frame, that is, fix the sink momentum and attribute t to the source (\(\mathbf {P}_f=P_3{\hat{z}}\), \(\mathbf {P}_i=P_3 {\hat{z}} -\mathbf {Q}\)). However, the light-cone GPDs are frame dependent, and their standard definition is in the symmetric frame. The connection between different frames is not a simple kinematical one, and therefore the lattice calculations must be performed in the frame where the GPDs are defined. Introducing momentum transfer in the matrix element, combined with the momentum boost of the proton states, significantly increases the statistical noise. Another complication of \(t\ne 0\) is related to the momentum smearing method, which is necessary to suppress the statistical uncertainties. As pointed out in Ref. [45], the exponent of the momentum smearing phase must be parallel to the proton momentum to ensure overlap with the ground state. Therefore, each ingredient of the optimized ratio of 3pt- and 2pt-correlation functions (see, e.g., Eq. (S4) of the supplementary document of Ref. [170]) requires a separate momentum smearing.
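
Schematically, the disentanglement amounts to solving, at each value of z (and each \(P_3,\,t,\,\xi \) combination), a small linear system obtained from Eq. (24) with the two parity projectors. In the minimal sketch below, the kinematic coefficients and matrix elements are placeholder numbers; in an actual analysis they follow from the Dirac traces of the decomposition, and the system is solved sample-by-sample to propagate uncertainties.

import numpy as np

# Toy kinematic coefficients multiplying (F_H, F_E) for two parity projectors.
# In practice they follow from the Dirac traces of Eq. (24) at given (P3, t, xi);
# the numbers below are placeholders for illustration only.
K = np.array([[0.92 + 0.10j, 0.31 - 0.05j],    # projector 1
              [0.12 + 0.44j, 0.78 + 0.02j]])   # projector 2

def disentangle(me1, me2):
    # Solve K @ (F_H, F_E) = (projected matrix element 1, projected matrix element 2)
    return np.linalg.solve(K, np.array([me1, me2]))

# Toy projected matrix elements at one value of z
F_H, F_E = disentangle(0.55 + 0.08j, 0.21 + 0.33j)
print("F_H =", F_H, "  F_E =", F_E)

# With more than two projectors the system becomes overdetermined and can be
# solved in the least-squares sense, e.g. with np.linalg.lstsq.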

Fig. 19

\(H(x,\xi ,t)\) for \(\xi =0\) (blue band) and \(\xi =|1/3|\) (green band), and \(f_1(x)\) (violet band) for \(P_3=1.25\) GeV. The area between the vertical dashed lines is the ERBL region. The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 2 GeV. Source: Ref. [170]. Article published under the terms of the Creative Commons Attribution 4.0 International license

Fig. 20

Comparison of \(\widetilde{H}(x,\xi ,t)\) and \(g_1(x)\). Notation as in Fig. 19. The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 2 GeV. Source: Ref. [170]. Article published under the terms of the Creative Commons Attribution 4.0 International license

Figures 19 and 20 show \(H(x,\xi ,t)\) and \(\widetilde{H}(x,\xi ,t)\) at \(P_3=1.25\) GeV, for which both zero and nonzero skewness are available. Each is compared to the corresponding PDF, \(f_1(x)\) and \(g_1(x)\), respectively. As can be seen, there is a qualitative difference between the GPDs at zero and nonzero skewness. Let us remind the Reader that this means zero and nonzero momentum transfer in the direction of the boost. In the latter case, the x-range separates into two kinematic regions, the ERBL region (\(|x|< \xi \)) and the DGLAP region (\(|x|> \xi \)), which have different physical interpretations [174]. From the practical standpoint, the matching for \(\xi \ne 0\) becomes more complicated than that for \(\xi =0\), the latter being the same as for the PDFs [175]. The PDFs dominate in magnitude, as the probability of finding quarks decreases as the momentum transfer increases. For the unpolarized case, we can compare the lattice data in the DGLAP region with the power-counting analysis of Ref. [176] in the \(x\rightarrow 1\) limit. One of the predictions of this analysis is that \(H(x,\xi =0,t)\) approaches \(f_1(x)\), which is confirmed by the lattice data. It also predicts that at \(x\rightarrow 1\) the t-dependence vanishes, which is also observed in the lattice data. For results on the E-GPD and \(\widetilde{E}\)-GPD, as well as results at the highest momentum, see Ref. [170].

4 TMD PDFs

To map the three-dimensional structure of hadrons, knowledge of TMD PDFs is necessary, as they provide information not only on the longitudinal momentum but also on the transverse momentum of partons. TMD PDFs arise in processes that involve multiple kinematic scales, such as Drell-Yan, \(e^{+} e^{-}\) annihilation, and semi-inclusive deep inelastic scattering. Consequently, a more complicated factorization formalism is required. For instance, the differential cross-section is sensitive to small transverse momenta, for which the non-perturbative dynamics cannot be neglected. Thus, the TMD PDFs contain information on the small transverse momenta, making them more detailed than collinear PDFs.

The TMD factorization of the singular (small-\(q_T\)) part of the cross-section is not unique, and the various schemes are related to each other. In the original proposal for TMD PDFs by Collins, Soper, and Sterman [177,178,179], the hard factor is absorbed in the definition of the TMDs. In the revised definition of TMD factorization [180], the TMD PDFs have two components: the beam function (one for each incoming hadron) and the soft function. The former describes collinear radiation near the hadron and is defined via matrix elements of non-local operators containing a staple-shaped Wilson line. The latter encodes the soft-gluon effects between the partons. The soft function contains little physical information on the incoming hadrons, but has practical importance: it cancels the rapidity-renormalization-scale dependence of the beam function. A combination of the two components is necessary to extract the TMDs. It should be noted that both the beam function and the soft function are calculable in lattice QCD.

There have been extensive discussions within the phenomenological community on the factorization related to TMD PDFs and their definitions (see, e.g., Ref. [181]). Recently, such discussions have become relevant within the lattice QCD community, due to the increased interest in generalizing methods initially developed for PDFs to access TMD PDFs. Lattice calculations of matrix elements with non-local operators related to TMDs, their renormalization, and the soft function have recently emerged. However, the field is still in its infancy, and a lot of progress is required. In this article, we briefly summarize the latest progress, mostly on the theoretical side. More details on the first explorations of TMDs from lattice QCD [182,183,184,185,186,187,188] can be found in Ref. [189]. The most recent studies discussed here can also be found in Refs. [21, 189].

4.1 Quasi-TMDs approach

The light-cone TMD PDFs involve matrix elements of non-local operators containing a staple-shaped Wilson line. The latter is infinite, with its edges on or near the light-cone; therefore, these matrix elements cannot be accessed directly in lattice QCD. A way around this issue is to calculate correlation functions with space-like separated partons, as done for PDFs and GPDs, and then properly match them to their light-cone counterparts. The quasi-distributions approach was the first to be explored for TMDs, and recent work can be found in Refs. [21, 190,191,192,193,194,195,196,197,198].

A prescription to obtain the Collins-Soper evolution kernel non-perturbatively from ratios of quasi-TMDs formed at different momenta is given in Ref. [192]. Reference [193] presents a study of quasi-TMD PDFs and of ratios of impact-parameter quasi-TMDs, which can be matched to their light-cone counterparts; the appropriate matching formula is also discussed. Reference [194] presents a renormalization prescription and the matching of quasi-TMDs as defined in Ref. [193]. A matching kernel for ratios of TMDs with different spin structures is proposed in Ref. [198].

The methodology proposed in Refs. [192,193,194] was implemented numerically in Ref. [199] for the renormalization using the gradient-flow method, and in Ref. [200] for the Collins-Soper kernel. The calculation is performed in quenched lattice QCD with a pion mass of 1.207 GeV. The hadron is boosted with \(P_3=1.29,\,1.94,\,2.58\) GeV, for transverse parton momentum \(q_T\) in the range of 250 MeV to 2 GeV. The staple extent, \(\eta \), is chosen to be 0.6, 0.7, and 0.8 fm, while the asymmetry in the staple, \(b_T\), is varied between \(-(\eta - a)\) and \((\eta - a)\), with \(a=0.06\) fm. The extracted Collins-Soper kernel is shown in Fig. 21, together with results from perturbation theory. Agreement is observed only around \(b_T\sim 0.2 - 0.3\) fm, which indicates non-negligible systematic effects in the lattice data, such as power corrections in the small-\(b_T\) region. The challenges of the inverse problem also arise in these calculations, as a Fourier transform of the position-space lattice data is required. A way to perform the Fourier transform with limited data is to apply fits to the lattice data, as done in Ref. [200]. However, an extensive study of the systematic uncertainties is necessary to control unwanted effects.
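
The essence of the prescription of Refs. [192, 200] can be caricatured as follows: up to a perturbative matching factor and the Fourier transform mentioned above, the Collins-Soper kernel follows from the logarithm of the ratio of quasi-TMD matrix elements at two boosts, normalized by the logarithm of the momentum ratio. The sketch below uses toy numbers and omits the matching factor, so it should be read as a schematic illustration rather than the actual analysis of Ref. [200].

import numpy as np

P1, P2 = 2.58, 1.29                        # two of the boosts used in Ref. [200], in GeV
bT  = np.array([0.12, 0.24, 0.36, 0.48])   # transverse separations in fm (toy grid)

# Toy quasi-TMD values at fixed x for the two boosts (placeholders).
f_P1 = np.array([0.95, 0.72, 0.48, 0.29])
f_P2 = np.array([0.90, 0.61, 0.35, 0.18])

# Schematic Collins-Soper kernel: log-ratio of the quasi-TMDs at the two boosts,
# normalized by log(P1/P2). The perturbative matching factors entering the
# actual prescription are omitted here.
gamma_zeta = np.log(f_P1 / f_P2) / np.log(P1 / P2)

for b, g in zip(bT, gamma_zeta):
    print(f"b_T = {b:.2f} fm   gamma_zeta ~ {g:+.3f}")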

Fig. 21

Lattice results on the Collins–Soper evolution kernel as a function of \(b_T\). The interpolation of the unsubtracted quasi-TMD PDF using Hermite and Bernstein polynomial bases are shown with red diamonds and purple triangles, respectively. Results from perturbation theory [201, 202] are shown with dashed and solid lines. Source: Ref. [200]. The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 2 GeV. Article published under the terms of the Creative Commons Attribution 4.0 International license

The study of the soft function as defined in the Collins–Soper TMD factorization is discussed in Ref. [195]. One of the findings is that the soft function can be defined as the form factor of a pair of color sources traveling at almost the speed of light, and can be calculated using lattice Heavy-Quark Effective Theory (HQET); however, this is of little practical use for lattice calculations. An alternative proposal is to extract the soft function through the factorization of a fast-moving light-meson form factor, which can be combined with the quasi-TMD wavefunction. In the follow-up calculation of Ref. [196], the matching kernel is extracted, based on the definition of Ref. [195]. Reference [203] presents a lattice calculation of the rapidity-independent part of the soft function in transverse coordinate space, denoted \(b_\perp \). The calculation is performed using one \(N_f=2+1\) CLS ensemble of clover fermions with a pion mass of 333 MeV in the sea sector and 547 MeV in the valence sector. The momentum boost is up to 2.1 GeV. The final results are shown in Fig. 22 and are compared with perturbative results, as well as with the results of Ref. [200]. We note that the comparison is qualitative, as the latter calculation is quenched and at a very heavy pion mass, which has the benefit of small statistical uncertainties. An agreement is observed between the different lattice results within the errors. The results of Ref. [203] are compatible with the 3-loop perturbative results up to \(b_T\sim 0.4\) fm.

Fig. 22

The Collins–Soper kernel from the lattice calculation is shown with blue and red points, and is compared to the quenched calculation of Ref. [200]. Results from perturbation theory [201, 202] are shown with solid and dashed lines, for the 1-loop, 2-loop, and 3-loop estimates. The results are given in the \(\overline{\mathrm{MS}}\) scheme at a scale of 2 GeV. Source: Ref. [203]. Article published under the terms of the Creative Commons Attribution 4.0 International license

Table 1 Parameters of various lattice calculations and comparison of the noise-to-signal ratio

5 Closing remarks

The work presented in this review demonstrates the advancement of lattice QCD in novel methods of obtaining PDFs, GPDs, and TMD PDFs that, only a decade ago, were considered impossible to calculate “directly”. In particular, we have demonstrated that PDF calculations have reached an advanced level, with dedicated studies of various sources of systematic uncertainties. Given the progress, precision calculations with controlled uncertainties may be available in the next few years. Gluon PDFs and flavor-singlet quark PDFs are actively being pursued with encouraging results. This direction has progressed more slowly than the non-singlet PDFs, as the calculation of disconnected contributions with non-local operators and boosted hadrons is computationally very challenging. The natural extension of the methods to access x-dependent distribution functions is the study of GPDs. So far, only exploratory studies are available, providing the x-dependence of GPDs for selected values of the momentum transfer at zero skewness or at a limited set of nonzero-skewness values. This is because the introduction of momentum transfer comes with several complications, most of them computational. In addition, the frame dependence of GPDs makes their computation much more challenging than that of form factors. A serious effort is invested in TMD PDF studies, with most of the work being on theoretical developments. The lattice calculations are still very preliminary, and some of them are at very heavy pion masses. Many more numerical explorations are expected to follow.

As concluded from the main text, the various methods to access x-dependent distribution functions are based on different theoretical formulations and are susceptible to different systematic uncertainties. Nevertheless, all methods pursue the same physical quantities, even though the factorization procedure is realized differently. Some methods rely directly on a large hadron momentum, \(P_3\), and others on a large Ioffe time, \(z P_3\). Regardless of the classification of the factorization (LaMET, or short-distance), the calculations of matrix elements with boosted hadrons and non-local operators are computationally intensive. It is apparent that there are limitations on the value of \(P_3\) due to the computational cost and the value of a. For coordinate-space factorization, there are also limitations on z, which has to be small and within the perturbative region. Since there is a need for large \(P_3\), or large \(z P_3\), it is desirable to examine the current capabilities of lattice calculations. Often, there is the misconception that large momentum can be achieved with controlled uncertainties. Instead of making assumptions based on phenomenological models or otherwise, we take a close look at the current calculations to address whether it is feasible to reach high momentum without compromising the reliability of the results. Such a question becomes more pressing for simulations near or at the physical point.

Table 1 presents some of the parameters of recent calculations discussed in this article, either for the pion or the nucleon. The main focus is on the noise-to-signal ratio, \(\displaystyle \frac{n}{s}\), at the maximum value of the matrix elements. For the real part of the matrix elements, the peak is at \(z=0\), while for the imaginary part it is at some intermediate value of z, depending on the quantity under study. As most calculations include multiple values of \(P_3\), we present \(\displaystyle \frac{n}{s}\) at the highest \(P_3\), which provides some indication of the challenge of increasing \(P_3\) for non-local operators. It should be emphasized that the noise-to-signal percentages should not be directly compared between methods, as the focus of each approach is different. For example, the pseudo-PDF and current–current correlators methods utilize all values of \(P_3\), while in the quasi-PDFs analysis one uses the data at the highest possible momentum. Therefore, the control of statistical uncertainties in the former methods can be, and is, achieved at low momenta. For example, if one were to analyze the data of Ref. [92] within the quasi-PDFs method, the maximum momentum leading to reliable renormalized results over a wide range of z values would be \(P_3\sim 1.3\) GeV for the 172 MeV ensemble. However, the results at \(P_3=2.1\) GeV are accurate enough at small values of z within the pseudo-ITDs renormalization scheme. As can be seen, the picture is consistent across all calculations: \(P_3\) close to or beyond 2 GeV cannot be reached reliably. This is particularly true for ensembles with a pion mass near or at the physical point.

The general picture of Table 1 should not be viewed as a major disadvantage of the field, but as a motivation to develop and explore novel approaches, innovative techniques, and algorithms.Footnote 11 One such example is the momentum smearing method [45], first explored in Ref. [41] for non-local operators; since then, it has been integrated into all calculations and has been instrumental in allowing access to higher values of \(P_3\). A recent exploration of momentum smearing within the distillation framework can be found in Ref. [206]. We will soon enter the exascale computing era, which promises more computing power. The combination of software and hardware developments will certainly benefit calculations of non-local operators, leading to precision calculations with controlled uncertainties.

The purpose of this review is to highlight the many successes of the field and to point out the aspects that need further exploration. Regardless of the necessary improvements, there is no doubt that accessing x-dependent parton distributions from lattice QCD is a significant step towards understanding the complex structure of hadrons, and the field is gaining visibility within the experimental and theoretical nuclear physics communities.