1 Introduction

As in [4, 5], let \({\mathbb {G}}\) under \(\odot \) be a group of order r generated by g, and let

$$\begin{aligned} x = [d] \, g = \underbrace{g \odot g \odot \cdots \odot g}_{d \text { times}}. \end{aligned}$$

Given g and x, the discrete logarithm problem (DLP) is to compute \(d = \log _g x\). In the general DLP \(0 \le d < r\), whereas in the short DLP d is smaller than r by some order of magnitude. The order r may be known or unknown; both cases are cryptographically relevant.
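
To make the setting concrete, the toy Python sketch below instantiates the short DLP in the multiplicative group of integers modulo a small safe prime; the prime, generator and logarithm are illustrative values only:

```python
# Toy instance of the short DLP in a safe-prime group: p = 2q + 1 with p and
# q prime, and g generating the subgroup of order r = q.
p = 2879                # a safe prime: q = 1439 is prime
q = (p - 1) // 2        # the group order r = q

g = 4                   # a quadratic residue, hence of order q
d = 123                 # a short logarithm: d is tiny compared to r
x = pow(g, d, p)        # x = [d] g in multiplicative notation

# Brute-force recovery of d = log_g x; feasible only because the instance is
# tiny. For cryptographic sizes, d ~ 2^224 even though d is short relative
# to r ~ 2^2047.
assert d == next(e for e in range(q) if pow(g, e, p) == x)
```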

1.1 Earlier works

In 2016 Ekerå [4] introduced a modified version of Shor’s algorithm [16, 17] for computing discrete logarithms that is more efficient than Shor’s original algorithm when the logarithm is short. This work was originally motivated by the use of short discrete logarithms in instantiations of cryptographic schemes based on the computational intractability of the DLP in finite fields. A concrete example is the use of short exponents in the Diffie–Hellman key exchange protocol when instantiated with safe-prime groups.

This work was subsequently generalized by Ekerå and Håstad [5] so as to enable tradeoffs between the number of times that the algorithm needs to be executed and the requirements it imposes on the quantum computer. These ideas parallel earlier ideas by Seifert [15] for making tradeoffs in Shor’s order-finding algorithm, the quantum part of Shor’s general factoring algorithm.

Ekerå and Håstad furthermore explained how the RSA integer factoring problem may be expressed as a short DLP. This gives rise to a new algorithm for factoring RSA integers that does not rely directly on order-finding, and that imposes fewer requirements on the quantum computer in each run than Shor’s or Seifert’s general factoring algorithms.

1.2 Background on quantum cost metrics

The hard part when implementing Shor’s algorithms and their various derivatives in practice is to exponentiate group elements in superposition. The algorithms of Seifert, Ekerå and Ekerå–Håstad reduce the requirements on the quantum computer compared to Shor’s algorithms by reducing the exponent length in each run.

In virtually all practical implementations, an exponent length reduction translates into a reduction in the number of controlled group operations that need to be evaluated quantumly. This in turn translates into a reduction in the number of logical qubit operations, the circuit depth, the runtime and the required coherence time. The number of logical qubits required is typically not reduced, however. This is because control qubits can be recycled [10, 11] in implementations.

When accounting for the need for quantum error correction, the number of physical qubits required may nevertheless be reduced as a result of reducing the exponent length, and it is physical rather than logical qubit counts that matter in practice. To understand why physical qubits may be saved, note that quantum error correcting codes, such as the surface code (see [6] for a good overview), use an array of physical qubits to construct each logical qubit. In simple terms: the fewer the number of operations that need to be performed with respect to a logical qubit, for some fixed maximum tolerable probability of uncorrectable errors arising, the fewer the number of physical qubits needed to construct said logical qubit.

Quantum spacetime volume, the product of the runtime and the number of physical qubits required, is sometimes used as a complexity metric in the literature. For the reasons explained above, a reduction in the exponent length typically gives rise to a reduction in the spacetime volume in each run, and in some cases overall.

1.3 Our contributions

To compute an m-bit short discrete logarithm d, the quantum algorithm of Ekerå and Håstad exponentiates group elements in superposition to \((m+2m/s)\)-bit exponents, for \(s \ge 1\) an integer. Given a set of s good outputs, the classical post-processing algorithm in [5] recovers d by enumerating vectors in an \((s+1)\)-dimensional lattice. The probability of observing a good output in a run is lower-bounded by 1/8 in [5], so we expect to find s good outputs after at most about 8s runs. As good outputs cannot be efficiently recognized, d is recovered in [5] by exhaustively post-processing all subsets of s outputs from a set of about 8s outputs.

We refer to s as the tradeoff factor, as it controls the tradeoff between the number of runs required and the exponent length in each run. To minimize the exponent length we would like to select s large. However, the fact that the post-processing complexity grows exponentially in s limits the achievable tradeoff. Furthermore, the fact that we expect to have to perform about 8s runs implies that an overall reduction in the number of group operations performed quantumly may not be achieved, even if the number of operations in each run is reduced.

In this work, we analyze and capture the probability distribution induced by the quantum algorithm of Ekerå and Håstad. This analysis allows us to tightly estimate the probability of observing a good output, and to show that it is significantly higher than the lower bound of 1/8 in [5] guarantees.

More importantly, the fact that we can capture the probability distribution induced by the quantum algorithm of Ekerå and Håstad for known d enables us to classically simulate the quantum algorithm by sampling the distribution. This simulator is in itself a key contribution. Inspired by our improved understanding of the distribution, we design an improved and considerably more efficient classical post-processing algorithm. We then use the simulator to heuristically estimate the number of runs required to solve for d for different tradeoff factors, and to practically verify the heuristic estimates by post-processing simulated outputs.

With our new post-processing and analysis, Ekerå–Håstad’s algorithm achieves an advantage over Shor’s algorithms, not only in each individual run, but also overall, when targeting cryptographically relevant instances of RSA and Diffie–Hellman with short exponents. When making tradeoffs, Ekerå–Håstad’s algorithm with our new post-processing furthermore outperforms Seifert’s algorithm in practice when factoring RSA integers. To the best of our knowledge, this makes Ekerå–Håstad’s algorithm the currently most efficient polynomial time quantum algorithm for breaking the RSA cryptosystem [12], and DLP-based cryptosystems such as Diffie–Hellman [3] when instantiated with safe-prime groups with short exponents, as in TLS [7], IKE [8] and standards such as NIST SP 800-56A [2].

For example, compared to Shor’s original algorithms, Ekerå–Håstad can achieve a reduction in the number of group operations that are evaluated quantumly in each run, of a factor 6.1 or 14.2 for FF-DH-2048 in safe-prime groups with 224-bit exponents, and of a factor 1.35 or 3.6 for RSA-2048, depending on whether tradeoffs are made. When not making tradeoffs, a single run of Ekerå–Håstad in general suffices. For other options and technical details, please see Appendix A.

Both RSA and Diffie–Hellman are widely deployed cryptosystems. It is important to understand the complexity of attacking these systems quantumly to project the remaining lifetime of confidential information protected by these systems, and to inform migration strategies. This paper helps further this understanding.

1.4 Overview

The remainder of this paper is structured as follows:

We recall the quantum algorithm of Ekerå and Håstad in Sect. 2, and we analyze the probability distribution it induces in Sects. 3–4. In particular, we derive a closed form expression for the probability of observing a given output.

In Sect. 5 we use the closed form expression to generate a high resolution histogram for the probability distribution, and describe how it may be sampled to classically simulate the quantum algorithm, for known d. In Sect. 6 we introduce our improved post-processing algorithm. We furthermore use the simulator to heuristically estimate and experimentally verify the number of runs required to solve for d. We summarize and conclude the paper in Sect. 7. Concrete cost estimates for RSA and Diffie–Hellman are given in Appendix A.

1.5 Notation

Before proceeding, we introduce notation used throughout this paper:

  • \(u \text { mod } n\) denotes u reduced modulo n constrained to \(0 \le u \text { mod } n < n\).

  • \(\{ u \}_n\) denotes u reduced modulo n constrained to \(-n/2 \le \{ u \}_n < n/2\).

  • \(\left\lceil u \right\rceil \), \(\left\lfloor u \right\rfloor \) and \(\left\lceil u \right\rfloor \) denote u rounded up, down and to the closest integer, respectively.

  • \(\left| a + ib\right| = \sqrt{a^2 + b^2}\) where \(a, b \in {\mathbb {R}}\) denotes the Euclidean norm of \(a + ib\).

  • \(\left| \, {\mathbf {u}} \, \right| \) denotes the Euclidean norm of the vector \({\mathbf {u}} = \left( u_0, \, \ldots , \, u_{n-1} \right) \in {\mathbb {R}}^n\).

  • \(u \lll v\) is used to denote that u is less than v by some order of magnitude.

  • \(u \sim v\) is used to denote that u and v are approximately of similar size.
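
For concreteness, the two reduction operators may be realized in Python as follows; a minimal sketch matching the definitions above:

```python
def mod(u, n):
    # u mod n, constrained to 0 <= u mod n < n. Python's % operator already
    # behaves this way for positive n, also when u is negative.
    return u % n

def braces(u, n):
    # {u}_n: u reduced modulo n, constrained to -n/2 <= {u}_n < n/2. Here n
    # is assumed even, as it is always a power of two in this paper.
    return (u + n // 2) % n - n // 2
```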

2 The quantum algorithm

Recall that given a generator g of a finite cyclic group of order r and \(x = [d]g\), where d is such that \(0< d < 2^m \lll r\), the quantum algorithm of Ekerå and Håstad in [5] computes the logarithm \(d = \log _g x\) by inducing the system

$$\begin{aligned} \frac{1}{2^{2\ell +m}} \sum _{a, j = 0}^{2^{\ell +m}-1} \sum _{b, k = 0}^{2^{\ell }-1} \text {exp} \left[ \frac{2 \pi i}{2^{\ell +m}} \; (a j + 2^m b k) \right] \left| \, j, k, [e = a - b d]g \, \right\rangle \end{aligned}$$
(1)

where \(\ell \sim m / s\) is an integer, for \(s \ge 1\) the tradeoff factor. Observing (1) yields a pair \((j, k)\), for j and k integers on the intervals \(0 \le j < 2^{\ell + m}\) and \(0 \le k < 2^\ell \), respectively, and some group element \(y = [e] g \in {\mathbb {G}}\).

3 The probability distribution

The quantum system in (1) collapses to \(y = [e] g\) and \((j, k)\) with probability

$$\begin{aligned} \frac{1}{2^{2(2\ell +m)}} \left| \sum _{a} \, \sum _{b} \,\text {exp} \left[ \frac{2 \pi i}{2^{\ell +m}} \, ( a j + 2^m b k) \right] \right| ^2 \end{aligned}$$

where the sums are over all a and b such that \(e \equiv a - bd \;(\text {mod } r)\). Assuming that \(r \ge 2^{\ell + m} + (2^\ell - 1) d\), this simplifies to \(e = a - bd\). Summing over all e, we obtain the probability P of observing the pair \((j, k)\) over all e as

$$\begin{aligned} P&= \frac{1}{2^{2(2\ell +m)}} \sum _e \, \left| \sum _{b = b_0(e)}^{b_1(e) - 1} \text {exp} \left[ \frac{2 \pi i}{2^{\ell +m}}((e + bd) j +2^m b k ) \, \right] \, \right| ^2 \\&= \frac{1}{2^{2(2\ell +m)}} \sum _e \, \left| \, \sum _{b=b_0(e)}^{b_1(e) - 1} \text {exp} \left[ \frac{2 \pi i}{2^{\ell +m}} \, b \{ dj + 2^m k \}_{2^{\ell + m}}\right] \right| ^2 \end{aligned}$$

where we have introduced the functions \(b_0(e)\) and \(b_1(e)\) that determine the length of the summation interval in b. For more information, see Sect. 3.1.

The above analysis implies that the probability P of observing a given pair \((j, k)\) is determined by its argument \(\alpha \) or, equivalently, by its angle \(\theta \), where

$$\begin{aligned} \alpha (j,k) = \{ dj + 2^m k \}_{2^{\ell +m}} \quad \text {and}\quad \theta (\alpha ) = \frac{2 \pi \alpha }{2^{\ell +m}} \end{aligned}$$

and the probability

$$\begin{aligned} P(\theta ) = \frac{1}{2^{2(2\ell + m)}} \, \sum _e \, \, \left| \, \sum _{b \, = \, b_0(e)}^{b_1(e) - 1} \, \text {e}^{i\theta b} \, \right| ^2 = \frac{1}{2^{2(2\ell + m)}} \, \sum _e \, \underbrace{\, \left| \, \sum _{b \, = \, 0}^{\#b(e) - 1} \, \text {e}^{i\theta b} \, \right| ^2}_{\zeta (\theta , \, \#b(e))} \end{aligned}$$

where \(\#b(e) = b_1(e) - b_0(e)\). If \(\theta = 0\), then

$$\begin{aligned} \zeta (0, \# b(e)) = \left| \, \sum _{b \, = \, 0}^{\#b(e) - 1} \, \text {e}^{i\theta b} \, \right| ^2&= \left| \, \sum _{b \, = \, 0}^{\#b(e) - 1} \, 1 \, \right| ^2 = (\#b(e))^2. \end{aligned}$$

Otherwise, if \(\theta \ne 0\) then

$$\begin{aligned} \zeta (\theta , \# b(e))&= \left| \, \sum _{b \, = \, 0}^{\#b(e) - 1} \, \text {e}^{i\theta b} \, \right| ^2 = \left| \, \frac{\text {e}^{i\theta \, \#b(e)} - 1}{\text {e}^{i\theta } - 1} \, \right| ^2 = \frac{1-\cos {(\theta \, \#b(e))}}{1 - \cos {\theta }}. \end{aligned}$$
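
The closed-form expressions for \(\zeta \) are easily verified numerically against the defining sum; a small Python sanity check:

```python
import cmath
import math

def zeta_sum(theta, n):
    # The defining sum: | sum_{b=0}^{n-1} exp(i theta b) |^2.
    return abs(sum(cmath.exp(1j * theta * b) for b in range(n))) ** 2

def zeta(theta, n):
    # The closed forms: n^2 if theta = 0, and otherwise
    # (1 - cos(n theta)) / (1 - cos(theta)).
    return float(n * n) if theta == 0 else \
        (1 - math.cos(theta * n)) / (1 - math.cos(theta))

for theta in [0, 0.1, 1.3, -2.0]:
    for n in [1, 2, 7, 64]:
        assert abs(zeta_sum(theta, n) - zeta(theta, n)) < 1e-6
```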

3.1 The summation interval for a given e

Recall that \(e = a - bd\) where \(0< d < 2^m\), \(0 \le a < 2^{\ell +m}\) and \(0 \le b < 2^\ell \). This implies that e is an integer on the interval

$$\begin{aligned} -(2^\ell -1)d \le e = a - bd < 2^{\ell +m}. \end{aligned}$$
(2)

Divide the interval for e into three regions, and denote these regions A, B and C, respectively. Define the middle region B to be the region in e where \(\#b(e) = 2^\ell \) or equivalently \(0 = b_0(e) \le b < b_1(e) = 2^\ell \). Then by (2) region B spans the interval \(0 \le e < 2^{\ell +m} - (2^\ell -1)d\). This is easy to see, as for all b on the interval \(0 \le b < 2^{\ell }\) there must exist an a on \(0 \le a < 2^{\ell + m}\) such that \(e = a - bd\).

It follows that region A to the left of B spans \(-(2^\ell -1)d \le e < 0\), and that region C to the right of B spans \(2^{\ell +m} - (2^\ell -1)d \le e < 2^{\ell +m}\). Regions A and C are hence both of length \((2^\ell -1)d\). Regions A and C are furthermore both divided into \(2^\ell -1\) plateaus of length d, as there is, for each multiple of d that is subtracted from e in these regions, one fewer value of b for which there exists an a such that \(e = a - bd\). The situation that arises is depicted in Fig. 1.

Fig. 1
figure 1

The interval length \(\#b(e) = b_1(e) - b_0(e)\) for \(m = 4\), \(\ell = 2\) and \(d = 13\). Similar staircase-like plots arise irrespective of how the parameters are selected
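
The staircase in Fig. 1 may be reproduced by exhaustive counting; a Python sketch for the parameters in the figure:

```python
m, ell, d = 4, 2, 13    # the parameters used in Fig. 1
E = 2 ** (ell + m)

def count_b(e):
    # #b(e): the number of b on 0 <= b < 2^ell for which there exists an a
    # on 0 <= a < 2^(ell+m) such that e = a - b d.
    return sum(1 for b in range(2 ** ell) if 0 <= e + b * d < E)

# Region B: #b(e) = 2^ell for 0 <= e < 2^(ell+m) - (2^ell - 1) d.
assert all(count_b(e) == 2 ** ell for e in range(E - (2 ** ell - 1) * d))

# Regions A and C: 2^ell - 1 plateaus of length d on either side of B.
for i in range(1, 2 ** ell):
    assert all(count_b(e) == 2 ** ell - i
               for e in range(-i * d, -(i - 1) * d))         # region A
    assert all(count_b(e) == i
               for e in range(E - i * d, E - (i - 1) * d))   # region C
```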

3.2 The probability of observing \((j, k)\) summed over all e

We may now sum \(\zeta (\theta , \#b(e))\) over all e to express

$$\begin{aligned} P(\theta ) = \frac{1}{2^{2(2\ell +m)}} \sum _{e \, = \, -(2^\ell - 1) d}^{2^{\ell +m} - 1} \, \zeta (\theta , \#b(e)) \end{aligned}$$

in closed form by first using that, if the sum is split into partial sums over the regions A, B and C, respectively, it follows from the previous two sections that the sums over regions A and C must yield the same contribution.

Furthermore, region B is rectangular, and each plateau in regions A and C is rectangular. Closed form expressions for the contribution from these rectangular regions may be trivially derived. This implies that

$$\begin{aligned} 2^{2(2\ell +m)} \, P(\theta )&= \underbrace{\sum _{e \, = \, -(2^\ell - 1) d}^{-1} \, \zeta (\theta , \#b(e))}_{\text {region A}} \,\, + \underbrace{\sum _{e \, = \, 0}^{2^{\ell +m}-(2^\ell -1)d-1} \, \zeta (\theta , \#b(e))}_{\text {region B}} \,\, \\&\quad +\underbrace{\sum _{e \, = \, 2^{\ell +m}-(2^\ell -1)d}^{2^{\ell +m}-1} \, \zeta (\theta , \#b(e))}_{\text {region C}} \,\, \\&=\underbrace{(2^{\ell +m}-(2^\ell -1)d) \cdot \zeta (\theta , 2^\ell )}_{\text {region B}} \,\, + \,\, \underbrace{2d \cdot \sum _{\#b \, = 1\, }^{2^{\ell }-1} \, \zeta (\theta , \#b)}_{\text {regions A and C}}. \end{aligned}$$

If \(\theta = 0\) then

$$\begin{aligned} \sum _{\#b \, = \, 1}^{2^{\ell }-1} \, \zeta (0, \#b)&= \sum _{\#b \, = \, 1}^{2^{\ell }-1} \, (\#b)^2 = \frac{2^{\ell }}{6} \, (2^\ell - 1) (2^{\ell +1} - 1) \end{aligned}$$

so the probability of observing the pair \((j, k)\) over all e is

$$\begin{aligned} P(0) = \,&\frac{1}{2^{2(2\ell +m)}} \cdot \Bigg ( (2^{\ell +m}-(2^\ell -1)d) \cdot 2^{2\ell } + \frac{2^{\ell } d}{3} \, (2^\ell - 1) (2^{\ell +1} - 1) \Bigg ). \end{aligned}$$

Otherwise, if \(\theta \ne 0\) then

$$\begin{aligned} \sum _{\#b \, = \, 1}^{2^{\ell }-1} \, \zeta (\theta , \#b)&= \sum _{\#b \, = \, 1}^{2^{\ell }-1} \, \frac{1 - \cos {(\theta \cdot \#b)}}{1 - \cos {\theta }} \\&= \frac{1}{1 - \cos \theta } \left[ \, (2^\ell - 1) - \, \frac{1}{2} \left( \frac{\cos ((2^\ell - 1) \theta ) - \cos 2^\ell \theta }{1 - \cos \theta } - 1 \right) \, \right] \end{aligned}$$

so the probability of observing the pair \((j, k)\) over all e is

$$\begin{aligned} P(\theta )&= \, \frac{1}{2^{2(2\ell +m)}} \cdot \frac{1}{1 - \cos \theta } \cdot \Bigg ( (2^{\ell +m}-(2^\ell -1)d) \cdot (1 - \cos {2^\ell \theta }) \,\, \\&\quad +2d \cdot \left[ \, (2^\ell - 1) - \, \frac{1}{2} \left( \frac{\cos ((2^\ell - 1) \theta ) - \cos 2^\ell \theta }{1 - \cos \theta } - 1 \right) \, \right] \Bigg ). \end{aligned}$$
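
Both closed-form expressions may be checked against the explicit sum over e; a Python sketch for small toy parameters:

```python
import math

m, ell, d = 4, 2, 13
E, n = 2 ** (ell + m), 2 ** ell
c = 2.0 ** (2 * (2 * ell + m))     # the normalization 2^(2(2l+m))

def count_b(e):                    # #b(e) as in Sect. 3.1
    return sum(1 for b in range(n) if 0 <= e + b * d < E)

def zeta(theta, nb):               # the closed forms for zeta
    return float(nb * nb) if theta == 0 else \
        (1 - math.cos(theta * nb)) / (1 - math.cos(theta))

def P_sum(theta):
    # P(theta) as the explicit sum of zeta(theta, #b(e)) over all e.
    return sum(zeta(theta, count_b(e)) for e in range(-(n - 1) * d, E)) / c

def P_closed(theta):
    # P(theta) in closed form, as derived above.
    if theta == 0:
        return ((E - (n - 1) * d) * n * n
                + n * d / 3 * (n - 1) * (2 * n - 1)) / c
    ac = (n - 1) - ((math.cos((n - 1) * theta) - math.cos(n * theta))
                    / (1 - math.cos(theta)) - 1) / 2
    return ((E - (n - 1) * d) * (1 - math.cos(n * theta)) + 2 * d * ac) \
        / ((1 - math.cos(theta)) * c)

for theta in [0.0, 0.05, 1.0, 2.5]:
    assert abs(P_sum(theta) - P_closed(theta)) < 1e-9
```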

3.3 The distribution of pairs \((j, k)\) with argument \(\alpha \)

We now proceed to analyze the distribution and multiplicity of angles \(\theta \), or equivalently of arguments \(\alpha \), to complete our analysis.

Definition 1

Let \(\kappa \) denote the greatest integer such that \(2^{\kappa }\) divides d.

Definition 2

An argument \(\alpha \) is said to be admissible if there exists \((j, k) \in {\mathbb {Z}}^2\) where \(0 \le j < 2^{\ell + m}\) and \(0 \le k < 2^{\ell }\) such that \(\alpha = \{ dj + 2^{m} k \}_{2^{\ell + m}}\).

Claim 1

All admissible arguments \(\alpha = \{ dj + 2^m k \}_{2^{\ell + m}}\) are multiples of \(2^{\kappa }\).

Proof

As \(2^{\kappa } \mid d < 2^{m}\) and the modulus is a power of two, the claim follows. \(\square \)

Lemma 1

There are \(2^{\ell +m-\kappa }\) admissible \(\alpha \) that each occurs with multiplicity \(2^{\ell +\kappa }\).

Proof

Let \(d = 2^\kappa d'\). Fix some \(\alpha '\) on \(0 \le \alpha ' < 2^{\ell +m-\kappa }\). For any k with \(0 \le k < 2^\ell \) the equivalence \(\alpha ' \equiv d' j' + 2^{m-\kappa } k \,\, (\text {mod } 2^{\ell +m-\kappa })\) has a unique solution in \(j'\) with \(0 \le j' < 2^{\ell +m-\kappa }\). For any such \(j'\), there are \(2^\kappa \) values of j on \(0 \le j < 2^{\ell +m}\) with \(j \equiv j' \,\, (\text {mod } 2^{\ell +m-\kappa })\). Hence there are \(2^{\ell +\kappa }\) pairs \((j, k)\) with \(\alpha = \{ d j + 2^m k \}_{2^{\ell +m}}\). By Claim 1, these are the only admissible \(\alpha \), and so the lemma follows. \(\square \)
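
Claim 1 and Lemma 1 are easily verified exhaustively for toy parameters; a Python sketch with an even d so that \(\kappa > 0\):

```python
from collections import Counter

m, ell, d = 4, 2, 12               # d = 2^2 * 3, so kappa = 2
kappa = (d & -d).bit_length() - 1  # the greatest kappa with 2^kappa | d
M = 2 ** (ell + m)

def braces(u, n):                  # {u}_n as in Sect. 1.5
    return (u + n // 2) % n - n // 2

# Tally the argument alpha over all 2^(ell+m) * 2^ell pairs (j, k).
counts = Counter(braces(d * j + 2 ** m * k, M)
                 for j in range(M) for k in range(2 ** ell))

assert all(a % 2 ** kappa == 0 for a in counts)               # Claim 1
assert len(counts) == 2 ** (ell + m - kappa)                  # Lemma 1
assert all(c == 2 ** (ell + kappa) for c in counts.values())  # Lemma 1
```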

3.4 The probability of observing a pair \((j, k)\) with argument \(\alpha \)

It follows from Lemma 1 that the probability \({\Phi }(\alpha )\) of observing one of the pairs \((j, k)\) with argument \(\alpha \) is \({\Phi }(\alpha ) = N(\alpha ) \cdot P \left( \theta (\alpha ) \right) \) where the multiplicity

$$\begin{aligned} N(\alpha ) = \left\{ \begin{array}{ll} 2^{\ell + \kappa } &{} \text {if } \alpha \text { is admissible} \\ 0 &{} \text {otherwise.} \end{array} \right. \end{aligned}$$

4 Understanding the probability distribution

In this section, we introduce the notion of t-good pairs, prove upper bounds on the probability of obtaining a t-good pair, and show how the pairs are distributed as a function of t to build an understanding of the distribution.

Definition 3

A pair \((j, k)\) is said to be t-good, for t an integer, if

$$\begin{aligned} \left\{ \begin{array}{lcl} 2^{t-1} \le | \, \alpha (j, k) \,|< 2^t &{} \text { when } &{} 0< t < \ell + m \\ \alpha (j, k) = 0 &{} \text { when } &{} t = 0. \end{array} \right. \end{aligned}$$

Definition 4

Let \(\rho (t)\) denote the probability of observing a t-good pair.

Lemma 2

For \(0< t < \ell + m\), the probability \(\rho (t) \le 2^{1-|m-t|}\).

Proof

For \(0< t < \ell + m\), the probability

$$\begin{aligned} \rho (t) = \sum _{ \begin{array}{c} (j, \, k) \, : \\ 2^{t-1} \le |\, \alpha (j, \, k) \,|< 2^{t} \end{array} } \quad {\Phi }(\alpha (j, k)) = \frac{1}{2^{2(2\ell +m)}} \sum _{2^{t-1} \le |\, \alpha \,| < 2^{t}} \;\;\;\; N(\alpha ) \sum _e \zeta (\theta (\alpha ), \# b(e)) \end{aligned}$$
(3)

where the first sum is over all pairs \((j, k)\) such that \(0 \le j < 2^{\ell + m}\), \(0 \le k < 2^\ell \) and \(2^{t-1} \le |\, \alpha (j, \, k) \,| < 2^{t}\). The second sum is over at most \(2^{t - \kappa }\) admissible arguments \(\alpha \) that each occurs with multiplicity \(N(\alpha ) = 2^{\ell + \kappa }\). The third sum is over at most \(2^{\ell + m + 1}\) values of e. It therefore follows from (3) that

$$\begin{aligned} \rho (t) \le \frac{2^{t - \kappa } \cdot 2^{\ell + \kappa } \cdot 2^{\ell + m + 1}}{2^{2(2\ell +m)}} \cdot \max _{\alpha , \, e} \, \zeta (\theta (\alpha ), \, \# b(e)) = 2^{1 - 2\ell - m + t} \cdot \max _{\alpha , \, e} \, \zeta (\theta (\alpha ), \, \# b(e)) \end{aligned}$$
(4)

where the maximum is taken over the admissible \(\alpha \) on \(2^{t-1} \le |\, \alpha \,| < 2^{t}\) and over all e.

As \(\zeta (\theta , \# b(e)) \le 2^{2\ell }\), it follows immediately from (4) that

$$\begin{aligned} \rho (t)&\le 2^{1-2\ell -m+t} \cdot 2^{2\ell } = 2^{1 - (m - t)}. \end{aligned}$$
(5)

Furthermore, as \(1 - \cos {(\theta \cdot \#b(e))} \le 2\), and as \(1 - \cos \theta \ge 2 (\theta / \pi )^2\) for \(| \, \theta \, | \le \pi \),

$$\begin{aligned} \zeta (\theta , \# b(e))&= \frac{1 - \cos {(\theta \cdot \#b(e))}}{1 - \cos {\theta }} \le \frac{\pi ^2}{\theta ^2}. \end{aligned}$$
(6)

As \(|\, \theta (\alpha ) \,| = 2 \pi \, |\, \alpha \,| \, / 2^{\ell + m} \ge 2^t \pi / 2^{\ell + m}\), it follows from (4) and (6) that

$$\begin{aligned} \rho (t)&\le 2^{1-2\ell -m+t} \cdot \frac{\pi ^2}{\theta ^2} \le 2^{1-2\ell -m+t} \cdot \frac{\pi ^2}{(2^t \pi / 2^{\ell + m})^2} = 2^{1 + (m - t)} \end{aligned}$$
(7)

and so, by combining (5) and (7), the lemma follows. \(\square \)

Corollary 1

The probability of observing a t-good \((j, k)\) for t such that \(|\, m - t \,| \le \varDelta \), and \(\varDelta \) some positive integer, is lower-bounded by \(1 - 2^{2-\varDelta } - 2^{1-m+\kappa }\).

Proof

The probability of observing a t-good \((j, k)\) for \(t > m + \varDelta \) is

$$\begin{aligned} \sum _{t \, = \, \varDelta +m+1}^{\ell +m-1} \rho (t) \le \sum _{t \, = \, \varDelta +m+1}^{\ell +m-1} 2^{1+m-t} = \sum _{t \, = \, \varDelta +1}^{\ell -1} 2^{1-t} \le \sum _{t \, = \, \varDelta +1}^{\infty } 2^{1-t} = 2^{1-\varDelta }, \end{aligned}$$

while the probability of observing a t-good \((j, k)\) for \(0< t < m - \varDelta \) is

$$\begin{aligned} \sum _{t \, = \, 1}^{m-\varDelta -1} \rho (t) \le \sum _{t \, = \, 1}^{m-\varDelta -1} 2^{1+t-m} \le \sum _{t \, = \, -\infty }^{m-\varDelta -1} 2^{1+t-m} = \sum _{t \, = \, -\infty }^{-\varDelta -1} 2^{1+t} = 2^{1-\varDelta }, \end{aligned}$$

where we have used Lemma 2, and that the sum is a geometric series. Finally, the probability of observing a t-good \((j, k)\) such that \(t = 0\) is

$$\begin{aligned} \rho (0) = \frac{1}{2^{2(2\ell +m)}} \, N(0) \, \sum _e \zeta (0, \# b(e)) \le \frac{2^{\ell +\kappa } \cdot 2^{\ell +m+1} \cdot 2^{{2\ell }}}{2^{2(2\ell +m)}} = 2^{1-m+\kappa }, \end{aligned}$$

where we have used that \(N(0) = 2^{\ell + \kappa }\), that the sum is over at most \(2^{\ell +m+1}\) values of e, and that \(\zeta (0, \# b(e)) \le 2^{2\ell }\), as in the proof of Lemma 2. Hence, the probability of observing a t-good \((j, k)\) such that \(|\, m - t \,| \le \varDelta \) is at least \(1 - 2 \cdot 2^{1-\varDelta } - 2^{1-m+\kappa } = 1 - 2^{2-\varDelta } - 2^{1-m+\kappa }\), where we have used that \(\rho (t)\) sums to one, and so the corollary follows. \(\square \)

Fig. 2
figure 2

Histograms for \(\rho (t)\) for various d. The histograms were computed for \(m = \ell = 256\). Virtually identical histograms arise for any choice of sufficiently large m and \(\ell \)

Recall that \(\rho (t)\) is a probability distribution, and hence normalized, whilst Lemma 2 states that \(\rho (t)\) is exponentially suppressed as \(|\, m-t \,|\) grows large for \(0< t < \ell + m\). The probability mass is hence concentrated on the t-good pairs for \(t \sim m\), except for artificially large \(\kappa \), as \(t = 0\) then attracts a non-negligible fraction of the probability mass. This is formally stated in Corollary 1.

This implies that each pair \((j, k)\) observed yields \(\sim \ell \) bits of information on d, as we expect \(|\, \alpha (j, k) \,| = |\, \{dj + 2^m k\}_{2^{\ell +m}} \,| \sim 2^m\), whilst for random \((j, k)\) we expect \(|\, \alpha (j, k) \,| \sim 2^{\ell +m}\). The situation that arises for different d is visualized in Fig. 2.

In the region denoted A in Fig. 2, d is odd and hence \(\kappa \) is zero. As d decreases from \(2^m - 1\) to \(2^{m-1} + 1\), the probability mass is redistributed towards slightly smaller values of t. This reflects the fact that it becomes easier to solve for d as d decreases in size in relation to \(2^m\). The redistributive effect weakens if d is further decreased: The histogram in region B where \(d = 2^{m-5} - 1\) is roughly representative of the distribution that arises when d is smaller than \(2^m\) by a few orders of magnitude. Further decreasing d has no significant effect.

In the region denoted C in Fig. 2, d is selected to be divisible by a great power of two to illustrate the redistribution that occurs when \(\kappa \) is close to maximal. As the admissible arguments are multiples of \(2^\kappa \), it follows that \(\rho (t) = 0\) for \(0< t < \kappa \), so the corresponding histogram entries are forced to zero and the probability mass redistributed. This effect is only visible for artificially large \(\kappa \). It does not arise in cryptographic applications where d may be presumed to be random.
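
The concentration of the probability mass may also be verified numerically for small parameters; a Python sketch using the per-region expressions from Sect. 3.2, with toy parameters for which \(\kappa = 0\):

```python
import math

m, ell, d = 8, 6, 255              # toy parameters; d odd, so kappa = 0
M, n = 2 ** (ell + m), 2 ** ell
c = 2.0 ** (2 * (2 * ell + m))

def zeta(theta, nb):
    return float(nb * nb) if theta == 0 else \
        (1 - math.cos(theta * nb)) / (1 - math.cos(theta))

def P(theta):
    # P(theta) via the per-region contributions from Sect. 3.2.
    return ((M - (n - 1) * d) * zeta(theta, n)
            + 2 * d * sum(zeta(theta, nb) for nb in range(1, n))) / c

def Phi(a):
    # Phi(alpha) = N(alpha) P(theta(alpha)); N(alpha) = 2^ell as kappa = 0.
    return n * P(2 * math.pi * a / M)

masses = {a: Phi(a) for a in range(-M // 2, M // 2)}

# Phi is a probability distribution over the admissible arguments, and its
# mass concentrates on t-good pairs with |m - t| <= 4, in accordance with
# Corollary 1, which for Delta = 4 guarantees a mass of at least ~0.74.
assert abs(sum(masses.values()) - 1) < 1e-6
assert sum(p for a, p in masses.items()
           if 2 ** (m - 5) <= abs(a) < 2 ** (m + 4)) > 0.7
```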

Recall that [5] introduces the notion of a “good” pair \((j, k)\), which in our vocabulary is a pair \((j, k)\) such that \(|\, \alpha (j,k) \,| \le 2^{m-2}\), and that it establishes a lower bound of 1/8 on the probability of observing a good pair. The quantum algorithm is therefore run 8s times and all subsets of s outputs post-processed.

Compared to the original analysis in [5], our new analysis is much more precise: It indicates that the probability of observing a good pair is significantly higher than the 1/8 bound guarantees. Empirically, it is at least 3/10 for random d. This immediately gives rise to an efficiency gain when exhausting subsets.

More importantly, as is stated formally in Corollary 1, our new analysis shows that essentially all observed pairs \((j, k)\) are “somewhat good”, in that they yield \(\sim \ell \) bits of information on d. This fact enables us to altogether forego exhausting subsets, giving rise to the improved post-processing algorithm in Sect. 6.

5 Simulating the quantum algorithm

In this section, we combine results from the previous sections to construct a high-resolution histogram for the probability distribution for a given d. We furthermore describe how the histogram may be sampled to simulate the quantum algorithm.

5.1 Constructing the histogram

To construct the high-resolution histogram, we divide the argument axis into regions and subregions and integrate \({\Phi }(\alpha )\) numerically in each subregion.

First, we subdivide the negative and positive sides of the argument axis into \(30+\mu \) regions where \(\mu = \min (\ell -2, 11)\). Each region thus formed may be uniquely identified by an integer \(\eta \) by requiring that for all \(\alpha \) in the region

$$\begin{aligned} 2^{|\eta |} \le | \, \alpha \, | \le 2^{|\eta | + 1} \quad \text { and } \quad \mathrm {sgn}(\alpha ) = \mathrm {sgn}(\eta ) \end{aligned}$$

where \(m-30 \le |\, \eta \,| < m+\mu -1\). Then, we subdivide each region into subregions identified by an integer \(\xi \) by requiring that for all \(\alpha \) in the subregion

$$\begin{aligned} 2^{|\eta | + \xi /2^{\nu }} \le | \, \alpha \,| \le 2^{|\eta | + (\xi + 1)/2^{\nu }} \end{aligned}$$

for \(\xi \) an integer on \(0 \le \xi < 2^{\nu }\) and \(\nu \) a resolution parameter. We fix \(\nu = 11\) to obtain a high degree of accuracy in the tail of the histogram.

For each subregion, we compute the probability mass contained within the subregion by applying Simpson’s rule, followed by Richardson extrapolation to cancel the linear error term. Simpson’s rule is hence applied \(2^{\nu }(1 + 2)\) times in each region: once over each of the \(2^{\nu }\) subregions, and once over each of its two halves for the extrapolation. Each application requires \({\Phi }(\alpha )\) to be evaluated in up to three points (the two endpoints and the midpoint). When evaluating \({\Phi }(\alpha )\), we divide the result by \(2^{\kappa }\) to account for the density of admissible pairs.

Note that as the distribution is symmetric around zero, we need only compute one side in practice. Note furthermore that the above approach to constructing the histogram is only valid provided \(\kappa \) is not artificially large in relation to m, as we must otherwise account for which \(\alpha \) are admissible. This is not a concern in cryptographic applications, as d may then be presumed random.
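
A minimal Python sketch of the integration step; the extrapolation below is the standard Richardson step for Simpson’s rule, and the details of our actual implementation may differ:

```python
def simpson(f, x0, x1):
    # Simpson's rule over a single interval [x0, x1].
    return (x1 - x0) / 6 * (f(x0) + 4 * f((x0 + x1) / 2) + f(x1))

def subregion_mass(f, x0, x1):
    # One application of Simpson's rule over the subregion [x0, x1], and one
    # over each of its two halves -- the 2^nu (1 + 2) applications per
    # region -- followed by Richardson extrapolation over the two estimates.
    xm = (x0 + x1) / 2
    coarse = simpson(f, x0, x1)
    fine = simpson(f, x0, xm) + simpson(f, xm, x1)
    return fine + (fine - coarse) / 15

# Here f would be the map alpha -> Phi(alpha) / 2^kappa, with Phi as in
# Sect. 3.4 and the division accounting for the density of admissible alpha.
```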

5.2 Sampling the distribution

To sample an argument \(\alpha \) from the histogram for the distribution, we first sample a subregion and then sample \(\alpha \) uniformly at random from the set of all admissible arguments in this subregion. To sample the subregion, we first order all subregions in the histogram by probability, and compute the cumulative probability up to and including each subregion in the resulting ordered sequence. Then, we sample a pivot uniformly at random from \([0, 1) \subset {\mathbb {R}}\), and return the first subregion in the ordered sequence for which the cumulative probability is greater than or equal to the pivot. The sampling operation fails if the pivot is greater than the total cumulative probability.
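
A minimal Python sketch of the subregion sampling; `subregions` and `masses` are assumed to hold the ordered subregions and their probability masses from the histogram construction in Sect. 5.1:

```python
import random
from bisect import bisect_left
from itertools import accumulate

def sample_subregion(subregions, masses):
    # Inverse transform sampling: return the first subregion for which the
    # cumulative probability is >= the pivot, or None if the sampling fails
    # because the pivot exceeds the total cumulative probability.
    cumulative = list(accumulate(masses))
    pivot = random.random()
    i = bisect_left(cumulative, pivot)
    return subregions[i] if i < len(cumulative) else None
```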

To sample a pair \((j, k)\) from the distribution, we first sample an admissible argument \(\alpha \) as described above, and then sample \((j, k)\) uniformly at random from the set of all integer pairs \((j, k)\) yielding \(\alpha \) using Lemma 3 below.

Lemma 3

The set of integer pairs \((j, k)\) on \(0 \le j < 2^{\ell + m}\) and \(0 \le k < 2^\ell \) that yield the admissible argument \(\alpha \) is given by

$$\begin{aligned} j = \left( \frac{\alpha - 2^m k}{2^{\kappa }} \left( \frac{d}{2^{\kappa }} \right) ^{-1} + 2^{\ell + m - {\kappa }} \, t \right) \,\, \mathrm { mod } \,\, 2^{\ell + m} \end{aligned}$$

as t runs through all integers on \(0 \le t < 2^{\kappa }\) and k through all integers on \(0 \le k < 2^{\ell }\).

Proof

As \(\alpha \equiv dj + 2^m k \,\, (\text {mod } 2^{\ell + m})\), it follows by solving for j as in Lemma 1 that

$$\begin{aligned} j = ((\alpha - 2^m k) d^{-1} + 2^{\ell + m - \kappa } \, t) \,\, \mathrm {mod } \,\, 2^{\ell + m}, \end{aligned}$$

as k and t run through \(0 \le k < 2^{\ell }\) and \(0 \le t < 2^{\kappa }\), enumerates the \(2^{\ell +\kappa }\) distinct pairs \((j, k)\) that yield the admissible argument \(\alpha \). By dividing both \(\alpha - 2^m k\) and d by \(2^\kappa \), so as to show that the required inverse exists, the lemma follows. \(\square \)

More specifically, we sample integers t and k uniformly at random from \(0 \le t < 2^\kappa \) and \(0 \le k < 2^\ell \), and then compute j from \(\alpha \), t and k as in Lemma 3.
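
In Python, the sampling of \((j, k)\) given an admissible \(\alpha \) may be sketched as follows; the call at the end uses toy values for illustration:

```python
import random

def sample_pair(alpha, d, m, ell):
    # Sample (j, k) uniformly from the 2^(ell+kappa) pairs that yield the
    # admissible argument alpha, as in Lemma 3. Requires Python >= 3.8 for
    # the modular inverse via pow(x, -1, n).
    kappa = (d & -d).bit_length() - 1
    M = 2 ** (ell + m)
    t = random.randrange(2 ** kappa)
    k = random.randrange(2 ** ell)
    d_inv = pow(d >> kappa, -1, M >> kappa)  # (d / 2^kappa)^(-1)
    j = (((alpha - 2 ** m * k) >> kappa) * d_inv + (M >> kappa) * t) % M
    return j, k

j, k = sample_pair(alpha=-20, d=12, m=4, ell=2)  # toy values; kappa = 2
assert (12 * j + 2 ** 4 * k + 20) % 2 ** 6 == 0  # (j, k) yields alpha = -20
```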

6 Our improved post-processing algorithm

We are now ready to introduce our improved post-processing algorithm.

In analogy with the post-processing algorithm in [5], it recovers d from a set \(\{ (j_1, k_1), \, \dots , \, (j_n, k_n) \}\) of n pairs produced by executing the quantum algorithm n times. However, where the original post-processing sets \(n = s\) and requires subsets of s pairs from a larger set of about 8s pairs to be exhaustively solved for d, we allow n to assume larger values and simultaneously solve all n pairs for d.

Note that we may admit larger n because of the analysis in Sect. 4 which shows that even if a pair is not good by the definition in [5], then it is close to being good with overwhelming probability, in the sense that all pairs \((j_i, k_i)\) have associated arguments \(\alpha _i = \alpha (j_i, k_i) = \{ dj_i + 2^m k_i \}_{2^{\ell + m}}\) of size \(|\, \alpha _i \,| \sim 2^m\). This implies that virtually all pairs \((j_i, k_i)\) yield \(\sim \ell \) bits of information on d.

In analogy with the original post-processing algorithm in [5], the set of n pairs is used to form a vector \(\mathbf {v} = ( \,\{ -2^m k_1 \}_{2^{\ell +m}}, \, \dots , \, \,\{ -2^m k_n \}_{2^{\ell +m}}, \, 0 ) \in {\mathbb {Z}}^{D}\) and a D-dimensional integer lattice L with basis matrix

$$\begin{aligned} \left[ \begin{array}{ccccc} j_1 &{} j_2 &{} \cdots &{} j_n &{} 1 \\ 2^{\ell +m} &{} 0 &{} \cdots &{} 0 &{} 0 \\ 0 &{} 2^{\ell +m} &{} \cdots &{} 0 &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} \vdots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} 2^{\ell +m} &{} 0 \end{array} \right] \end{aligned}$$

where \(D = n + 1\). For some constants \(m_1, \, \dots , \, m_n \in {\mathbb {Z}}\), the vector

$$\begin{aligned} {\mathbf {u}} = (\{ dj_1 \}_{2^{\ell +m}} + m_1 2^{\ell +m}, \, \dots , \, \{ dj_n \}_{2^{\ell +m}} + m_n 2^{\ell +m}, \, d ) \in L \end{aligned}$$

is such that the distance

$$\begin{aligned} R = |\, {\mathbf {u}} - \mathbf {v} \, |&= \sqrt{ \, \sum _{i=1}^n \big ( \{ dj_i \}_{2^{\ell +m}} + m_i 2^{\ell +m} - \{ -2^m k_i \}_{2^{\ell +m}} \big )^2 + d^2 \,} \\&= \sqrt{ \, \sum _{i=1}^n \, \underbrace{\{ dj_i + 2^m k_i \}_{2^{\ell +m}}^2}_{\alpha _i^2} + \, d^2 \,} = \sqrt{ \, \sum _{i=1}^n \alpha _i^2 + d^2 \,}. \end{aligned}$$

This implies that \({\mathbf {u}}\) and hence d may be found by enumerating all vectors in L within a D-dimensional hypersphere of radius R centered on \(\mathbf {v}\). The volume of such a hypersphere is

$$\begin{aligned} V_D(R) = \frac{\pi ^{D/2}}{\varGamma (\frac{D}{2}+1)} R^D. \end{aligned}$$
(8)

For comparison, the fundamental parallelepiped in L contains a single lattice vector and is of volume \(\det L = 2^{(\ell +m)n}\). Heuristically, we therefore expect the hypersphere to contain approximately \(v = V_D(R) \, / \det L\) lattice vectors. The exact number depends on the placement of the hypersphere in \({\mathbb {Z}}^D\) and on the shape of the fundamental parallelepiped in L.

Assuming v to be sufficiently small for it to be computationally feasible to enumerate all lattice vectors in the hypersphere in practice, the above algorithm may be used to recover d. As the volume quotient v decreases in n, the number of vectors that need to be enumerated may be reduced by running the quantum algorithm more times and including the resulting pairs in L. However, there are limits to the number of pairs that may be included in L, as a reduced basis must be computed to enumerate L, and the complexity of computing such a basis grows fairly rapidly in the dimension of L.
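
For illustration, the post-processing may be sketched in Python using the fpylll bindings to fpLLL; exact CVP over the LLL-reduced basis here stands in for enumeration or a nearest-plane step, and the sketch assumes that the heuristic of this section holds so that \({\mathbf {u}}\) is the vector recovered:

```python
from fpylll import CVP, IntegerMatrix, LLL

def recover_d(pairs, m, ell):
    # pairs: [(j_1, k_1), ..., (j_n, k_n)] from n runs of the quantum
    # algorithm. Builds the basis for L and the vector v as above,
    # LLL-reduces the basis, and solves the closest vector problem.
    n, M = len(pairs), 2 ** (ell + m)

    rows = [[j for j, _ in pairs] + [1]]           # (j_1, ..., j_n, 1)
    for i in range(n):                             # 2^(ell+m) on the diagonal
        rows.append([M if c == i else 0 for c in range(n)] + [0])
    B = IntegerMatrix.from_matrix(rows)
    LLL.reduction(B)

    def braces(u):                                 # {u}_(2^(ell+m))
        return (u + M // 2) % M - M // 2

    v = tuple(braces(-(2 ** m) * k) for _, k in pairs) + (0,)
    u = CVP.closest_vector(B, v)
    return u[-1]                                   # the last coordinate is d
```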

In what follows, we show how to heuristically estimate the minimum number of runs n required to solve a specific known problem instance for d with minimum success probability q, for a given tradeoff factor s, and for a given bound on the number of vectors v that we at most accept to enumerate in L.

6.1 Estimating the minimum n required to solve for d

The radius R depends on the pairs \((j_i, k_i)\) via the arguments \(\alpha _{i}\). For fixed n and fixed probability q, we may estimate the minimum radius \({{\widetilde{R}}}\) such that

$$\begin{aligned} \text {Pr} \left[ \, R = \sqrt{\sum _{i \, = \, 1}^n \alpha _{i}^2 + d^2} \le {{\widetilde{R}}} \, \right] \ge q \end{aligned}$$

by sampling \(\alpha _{i}\) from the probability distribution as in Sect. 6.2. Then

$$\begin{aligned} \text {Pr} \left[ \, v = \frac{V_D(R)}{\det L} \le \frac{V_D({{\widetilde{R}}})}{2^{(\ell + m)n}} \, \right] \ge q, \end{aligned}$$
(9)

providing a heuristic bound, holding with probability at least q, on the number of lattice vectors v that at most have to be enumerated.

Given an upper limit on the number of lattice vectors that we accept to enumerate, equation (9) may be used as a heuristic to estimate the minimum value of n such that v is below this limit with probability at least q.

To compute the estimate in practice, we use the heuristic to compute an upper bound on v for \(n = 1, \, 2, \, \dots \) and return the minimum n for which the bound is below the limit on the number of vectors that we accept to enumerate.

As the volume quotient v decreases by approximately a constant factor for every increment in n, the minimum n may be found efficiently via interpolation once the heuristic bound on v has been computed for a few values of n.

6.2 Estimating \({{\widetilde{R}}}\)

To estimate \({{\widetilde{R}}}\) for given m, s and n, explicitly known d, and a given target success probability q, we sample N sets of n arguments \(\{ \alpha _{1} , \, \dots , \, \alpha _{n} \}\) from the probability distribution as described in Sect. 5.2. For each set, we compute R, sort the resulting list of values in ascending order, and select the value at index \(\left\lceil (N-1) \, q \right\rfloor \) to arrive at our estimate for \({{\widetilde{R}}}\). Note that the constant N controls the accuracy of the estimate. For N sufficiently large in relation to q, and to the variance of the arguments, we expect to obtain sufficiently good estimates. Note furthermore that the sampling of arguments may fail. This occurs when the argument being sampled is not in the regions of the argument axis covered by the histogram.

If the sampling of at least one argument in a set fails, we err on the side of caution and over-estimate \({{\widetilde{R}}}\) by letting \(R = \infty \) for the set. The entries for the failed sets will then all be sorted to the end of the list. If the value of \({{\widetilde{R}}}\) selected from the sorted list is \(\infty \), no estimate is produced.
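
The estimation procedure may be sketched in Python as follows; `sample_alpha` is assumed to sample an argument from the histogram as in Sect. 5.2, returning None on failure. Squared radii are kept as integers, and the volume quotient in logarithms, since the quantities involved may exceed the floating point range:

```python
import math

def estimate_R_squared(sample_alpha, n, d, q=0.99, N=10 ** 6):
    # Estimate R~^2 such that R^2 <= R~^2 with probability >= q.
    radii_sq = []
    for _ in range(N):
        alphas = [sample_alpha() for _ in range(n)]
        if any(a is None for a in alphas):
            radii_sq.append(math.inf)  # failed set: err on the side of caution
        else:
            radii_sq.append(sum(a * a for a in alphas) + d * d)
    radii_sq.sort()
    R_sq = radii_sq[round((N - 1) * q)]
    return None if R_sq == math.inf else R_sq  # None: no estimate produced

def log_volume_quotient(R_sq, n, m, ell):
    # log v = log(V_D(R~) / det L) for D = n + 1; compare against the
    # logarithm of the enumeration limit when searching for the minimum n.
    D = n + 1
    return (D / 2 * math.log(math.pi) - math.lgamma(D / 2 + 1)
            + D / 2 * math.log(R_sq) - (ell + m) * n * math.log(2))
```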

6.3 Results and analysis

To illustrate the efficiency of our new post-processing algorithm, we heuristically estimate, in this section, n as a function of m and various s for the hard case \(d = 2^m - 1\), and verify the estimates in practical simulations:

To this end, we let \(\ell = \left\lceil m / s \right\rceil \) and fix \(q = 0.99\). We fix \(N = 10^6\) when estimating \({{\widetilde{R}}}\), and record the smallest n for which the volume quotient \(v < 2\). Note that the latter requirement ensures that the lattice need not be enumerated. If \(v < 2\) it heuristically suffices to reduce a single basis of dimension \(D = n+1\) and to then apply Babai’s nearest plane algorithm [1] to recover \({\mathbf {u}}\) and hence d from \(\mathbf {v}\). We verify the estimate by sampling \(M = 10^3\) sets of n pairs \(\{ (j_1, k_1), \, \dots , \, (j_n, k_n ) \}\) and solving each set for d with our improved post-processing algorithm. If d is thus recovered, the verification succeeds, otherwise it fails. We record the smallest n such that at most \(M (1 - q) = 10\) verifications fail.

Table 1 The estimated (A) and simulated (B) number of pairs n required for \({\mathbf {u}}\) to be the closest vector to \(\mathbf {v}\) in L, allowing d to be recovered via Babai’s algorithm. If \(\mathrm {A} = \mathrm {B}\), we only print \(\mathrm {A}\); otherwise we print \(\mathrm {B} / \mathrm {A}\). A dash indicates that no information is available

Table 1 was produced by executing the procedure described above. As may be seen in the table, the heuristically estimated values of n are in general verified by the simulations, except when v is close to two, in which case the values may differ slightly. Note that for large tradeoff factors s in relation to m, increasing or decreasing n typically has a small effect on v. This may lead to slight instabilities in the estimates, as v may be close to two for several values of n. Note furthermore that the difference in the sample sizes N and M may of course also give rise to slight discrepancies when verifying the estimates.

Compared to the post-processing algorithm originally proposed in [5], which required 8s runs to be performed and all subsets of s outputs from the resulting 8s outputs to be exhaustively solved for d, the new post-processing algorithm is considerably more efficient and achieves considerably better tradeoffs. It furthermore requires considerably fewer quantum algorithm runs. Asymptotically, the number of runs n required tends to \(s+1\) as m tends to infinity for fixed s, when we require a solution to be found without enumerating the lattice, as may be seen in Table 1. If we accept to enumerate a limited number of vectors in the lattice, this may potentially be slightly improved. In particular, a single run then suffices for \(s = 1\).

To reduce the lattice bases, we used the LLL [9] and BKZ [13, 14] algorithms, as implemented in fpLLL 5.2, with default parameters and for BKZ a block size of \(\min (10, n+1)\) for all combinations of m, s and n. For these parameter choices, a basis in general takes at most minutes to reduce and solve for d in a single thread, except when using BKZ for the largest combinations of m and s.

6.4 Further improvements

Above, we conservatively fixed a high minimum success probability \(q=0.99\) and considered the hard case \(d = 2^m - 1\). Furthermore, we required that mapping \(\mathbf {v}\) to the closest vector in L should yield \({\mathbf {u}}\) without enumerating L.

In practice, some of these choices may be relaxed: Instead of requiring \({\mathbf {u}}\) to be the closest vector to \(\mathbf {v}\) in L, we may enumerate all vectors in a hypersphere of limited radius centered on \(\mathbf {v}\). In cryptographic applications, the logarithm d may in general be assumed to be randomly selected. If not, the logarithm may be randomized: solve \(x \odot [c] \, g\) for \(d+c\) with respect to the basis g, for some random c.

7 Summary and conclusion

We have introduced a new efficient post-processing algorithm that is practical for greater tradeoff factors, and requires considerably fewer quantum algorithm runs, than the original post-processing algorithm in [5]. To develop the new post-processing algorithm, and to estimate the number of runs it requires, we have analyzed the probability distribution induced by the quantum algorithm, and developed a method for simulating it when d is known.

With our new analysis and post-processing, Ekerå–Håstad’s algorithm achieves an advantage over Shor’s algorithms, not only in each individual run, but also overall, when targeting cryptographically relevant instances of RSA and Diffie–Hellman with short exponents. When making tradeoffs, Ekerå–Håstad’s algorithm with our new post-processing furthermore outperforms Seifert’s algorithm when factoring RSA integers. To the best of our knowledge, this makes Ekerå–Håstad’s algorithm the currently most efficient polynomial time quantum algorithm for breaking the RSA cryptosystem [12], and DLP-based cryptosystems such as Diffie–Hellman [3] when instantiated with safe-prime groups with short exponents, as in TLS [7], IKE [8] and standards such as NIST SP 800-56A [2]. The reader is referred to Appendix A for a detailed analysis supporting these claims.