Sparse Multi-Reference Alignment: Phase Retrieval, Uniform Uncertainty Principles and the Beltway Problem

Foundations of Computational Mathematics

Abstract

Motivated by cutting-edge applications like cryo-electron microscopy (cryo-EM), the Multi-Reference Alignment (MRA) model entails the learning of an unknown signal from repeated measurements of its images under the latent action of a group of isometries and additive noise of magnitude \(\sigma \). Despite significant interest, a clear picture for understanding rates of estimation in this model has emerged only recently, particularly in the high-noise regime \(\sigma \gg 1\) that is highly relevant in applications. Recent investigations have revealed a remarkable asymptotic sample complexity of order \(\sigma ^6\) for certain signals whose Fourier transforms have full support, in stark contrast to the traditional \(\sigma ^2\) that arises in regular models. Because a sample complexity of this order is often prohibitively large in practice, these results have prompted the investigation of variations around the MRA model where better sample complexity may be achieved. In this paper, we show that sparse signals exhibit an intermediate \(\sigma ^4\) sample complexity even in the classical MRA model. Further, we characterize the dependence of the estimation rate on the support size s as \(O_p(1)\) and \(O_p(s^{3.5})\) in the dilute and moderate regimes of sparsity respectively. Our techniques have implications for the problem of crystallographic phase retrieval, indicating a certain local uniqueness for the recovery of sparse signals from their power spectrum. Our results explore and exploit connections of the MRA estimation problem with two classical topics in applied mathematics: the beltway problem from combinatorial optimization, and uniform uncertainty principles from harmonic analysis. Our techniques include a certain enhanced form of the probabilistic method, which might be of general interest in its own right.

Acknowledgements

SG was supported in part by the MOE grants R-146-000-250-133, R-146-000-312-114 and MOE-T2EP20121-0013. PR was supported by the NSF awards DMS-1712596, IIS-1838071, DMS-2022448, and DMS-210637. The authors would like to thank Victor-Emmanuel Brunel for stimulating discussions that shaped the direction of this project, Tamir Bendory for bringing to their attention recent literature on phase retrieval, and Michel Goemans for pointing them to the partial digest problem. The authors are grateful to the anonymous referees for their meticulous reading of the manuscript and their prescient suggestions towards its improvement, and especially for pointing out important connections of the present work to the problem of crystallographic phase retrieval.

Author information

Corresponding author

Correspondence to Subhroshekhar Ghosh.

Additional information

Communicated by Rachel Ward.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix: Additional Notations

Definition 37

Let \(\{X_n\}_{n \ge 1}\) be a sequence of non-negative random variables, and let \(\{a_n\}_{n \ge 1}\) be a sequence of positive numbers (deterministic or random). Then:

  • By the statement \(X_n=O_p(a_n)\) we mean that, for every \(\varepsilon >0\), there exists \(0<C(\varepsilon )<\infty \) such that

    $$\begin{aligned} \liminf _{n \rightarrow \infty }\mathbb {P}\left[ X_n/a_n \le C(\varepsilon ) \right] \ge 1-\varepsilon . \end{aligned}$$
  • By the statement \(X_n=\Omega _p(a_n)\) we mean that, for every \(\varepsilon >0\), there exists \(0<c(\varepsilon )<\infty \) such that

    $$\begin{aligned} \liminf _{n \rightarrow \infty }\mathbb {P}\left[ X_n/a_n \ge c(\varepsilon ) \right] \ge 1-\varepsilon . \end{aligned}$$
  • By the statement \(X_n=\Theta _p(a_n)\) we mean that for every \(\varepsilon >0\), there exist \(0<c(\varepsilon )<C(\varepsilon )<\infty \) such that

    $$\begin{aligned} \liminf _{n \rightarrow \infty }\mathbb {P}\left[ c(\varepsilon ) \le X_n/a_n \le C(\varepsilon ) \right] \ge 1-\varepsilon . \end{aligned}$$

Further, \(\Vert \cdot \Vert _F\) will denote the Frobenius norm of a matrix, and the expectation \({\mathbb {E}}_G\) will be taken with respect to G chosen uniformly from the group of isometries \({\mathcal {G}}\).

For any positive integer m, by the symbol [m] we denote the set \(\{1,\ldots ,m\}\).

For two sequences of positive numbers \((a_k)_{k>0}\) and \((b_k)_{k>0}\), we write \(a_k \ll b_k\) when we have \(b_k/a_k \rightarrow \infty \) as \(k \rightarrow \infty \).

A sequence of events \(\{E_m\}_{m \ge 1}\), defined with respect to probability measures \(\mathbb {P}_m\), is said to occur with high probability if \(\mathbb {P}_m[E_m] \rightarrow 1\) as \(m \rightarrow \infty \).

For any \(\theta = (\theta _1,\ldots ,\theta _L) \in {\mathbb {R}}^L\), we denote \({\overline{\theta }}=\frac{1}{L}\sum _{i=1}^L \theta _i\).

Appendix: Bernoulli–Gaussian Distributions

We define the notion of the Bernoulli–Gaussian distribution, and the symmetric version thereof. For that, we first define the notion of a Gaussian distribution indexed by a subset of \({\mathbb {Z}}_L\).

Definition 38

(Subset-indexed Gaussian distributions) Let \(A \subset {\mathbb {Z}}_L\), let \(\mu :{\mathbb {Z}}_L \rightarrow {\mathbb {R}}\) be a function supported on A, and let \(\Sigma \) be a positive definite \(|A|\times |A|\) matrix. Then the Gaussian distribution indexed by A with mean \(\mu \) and covariance \(\Sigma \), denoted \(N_A(\mu ,\Sigma )\), is the distribution of the random vector \((\eta _k)_{k \in {\mathbb {Z}}_L}\) with \(\eta _k=0\) for \(k \in A^\complement \), where \((\eta _k)_{k \in A}\) is the |A|-dimensional Gaussian random vector with mean \(\mu \) and covariance \(\Sigma \).

This allows us to define the Bernoulli–Gaussian distribution, a key property of which is that the support is chosen at random according to a Bernoulli sampling scheme.

Definition 39

(Bernoulli–Gaussian distribution) Let \(s \in [L]\) and let \(\Xi \subset {\mathbb {Z}}_L\) be a random subset obtained by selecting each member of \({\mathbb {Z}}_L\) independently with probability s/L. The Bernoulli–Gaussian distribution on \({\mathbb {Z}}_L\) with variance \(\zeta ^2\) and sparsity s is then defined as the Gaussian distribution indexed by \(\Xi \) with mean \({\mathbf {0}}\) and covariance \(\zeta ^2 I\); in other words, the distribution \(N_\Xi ({\mathbf {0}},\zeta ^2 I)\), with the Gaussian entries being statistically independent of the support \(\Xi \).
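As an illustration of Definition 39, the following is a minimal sampling sketch, not code from the paper. It assumes that \({\mathbb {Z}}_L\) is indexed by the residues 0, ..., L-1, and the function name `sample_bernoulli_gaussian` is hypothetical.

```python
import numpy as np

def sample_bernoulli_gaussian(L, s, zeta, rng=None):
    """Draw one Bernoulli-Gaussian signal on Z_L (indexed 0..L-1, an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    # Each index joins the support independently with probability s/L.
    support = rng.random(L) < s / L
    # Gaussian values are drawn independently of the support.
    values = rng.normal(0.0, zeta, size=L)
    # Entries outside the support are set to zero.
    theta = np.where(support, values, 0.0)
    return theta, support
```

Under this sampling scheme the support size is Binomial(L, s/L), so it has mean s and fluctuates around that value.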

Next, we introduce the concept of a standard symmetric Gaussian random variable indexed by a subset of \({\mathbb {Z}}_L\). To introduce the notion of a symmetric signal, we first recall the notion of the standard parametrization of \({\mathbb {Z}}_L\) (1.7).

We are now ready to define

Definition 40

(Symmetric subset-indexed Gaussian distributions) Let \({\mathbb {Z}}_L\) be in the standard enumeration (1.7), let \(A \subset {\mathbb {Z}}_L\) be symmetric, i.e. \(A = -A\), and let \(\zeta >0\). Let \(A_+:=\{0,\ldots ,\lfloor (L-1)/2 \rfloor \} \cap A\), and let \((X_k)_{k \in {\mathbb {Z}}_L}\) denote a random vector with distribution \(N_{A_+}(0,\zeta ^2 I)\). Then the symmetric Gaussian distribution indexed by A with mean 0 and variance \(\zeta ^2\), denoted \(N_A^{\mathrm {symm}}(0,\zeta ^2 I)\), is the distribution of the random vector \((\eta _k)_{k \in {\mathbb {Z}}_L}\) with \(\eta _k=X_{|k|}\).

Finally, all of the above taken together allows us to define

Definition 41

(Symmetric Bernoulli–Gaussian distribution) Let \({\mathbb {Z}}_L\) be in the standard enumeration (1.7). Let \(\Xi _0 \subset {\mathbb {Z}}_L^+=\{0,\ldots ,\lfloor (L-1)/2\rfloor \}\) be a random subset obtained by selecting each member of \({\mathbb {Z}}_L^+\) independently with probability s/L, and consider the symmetric subset \(\Xi :=\Xi _0 \cup (-\Xi _0)\). Then the symmetric Bernoulli–Gaussian distribution with mean zero, variance \(\zeta ^2\) and sparsity parameter s is the distribution \(N^{\mathrm {symm}}_\Xi (0,\zeta ^2 I)\), with the Gaussian entries being statistically independent of the support \(\Xi \).

Heuristically, the symmetric Bernoulli–Gaussian distribution is obtained by taking a Bernoulli–Gaussian random variable on the positive part of \({\mathbb {Z}}_L\) and extending it to all of \({\mathbb {Z}}_L\) by making it symmetric about the origin.
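The heuristic description above can be sketched in code as follows. This is an illustrative sketch under stated assumptions: the standard enumeration (1.7) is not reproduced here, so the code represents \({\mathbb {Z}}_L\) by the residues 0, ..., L-1 and identifies \(-k\) with \((L-k) \bmod L\). The function name is hypothetical.

```python
import numpy as np

def sample_symmetric_bernoulli_gaussian(L, s, zeta, rng=None):
    """Draw one symmetric Bernoulli-Gaussian signal on Z_L (residues 0..L-1)."""
    rng = np.random.default_rng() if rng is None else rng
    half = (L - 1) // 2
    plus = np.arange(half + 1)  # Z_L^+ = {0, ..., floor((L-1)/2)}
    # Xi_0: each member of Z_L^+ is selected independently with probability s/L.
    chosen = rng.random(half + 1) < s / L
    # Independent Gaussian values on Xi_0, zero elsewhere on Z_L^+.
    x = np.where(chosen, rng.normal(0.0, zeta, size=half + 1), 0.0)
    theta = np.zeros(L)
    theta[plus] = x
    # Mirror about the origin: theta[-k] = theta[k], with -k taken mod L.
    theta[(-plus) % L] = x
    return theta
```

With this indexing and even L, the residue L/2 lies outside both \({\mathbb {Z}}_L^+\) and its reflection, so that entry is always zero in the sketch.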

Appendix: Generic Sparse Signals

We introduce the notions of signal support sets that are typically s-sparse and \(\Gamma \)-cosine generic.

Definition 42

Let \(\alpha ,\beta >0\) be fixed numbers and \(s \in [L]\) be a parameter that possibly depends on L. A probability distribution over subsets \(\Xi \subset {\mathbb {Z}}_L\) is said to be typically s-sparse with sparsity constants \((\alpha ,\beta )\) if we have \(\alpha \cdot s \le |\Xi | \le \beta \cdot s\) with probability \(1-o_L(1)\).

To introduce the concept of cosine-genericity of a set, we first define the cosine functional of a set \(\Xi \subset {\mathbb {Z}}_L\) for an element \(a \in {\mathbb {Z}}_L\).

For \(\Xi \subset {\mathbb {Z}}_L\) and \(a \in {\mathbb {Z}}_L\), define

$$\begin{aligned} {\mathcal {V}}(\Xi ,a)=\mathbbm {1}_{\{0 \in \Xi \}} + 2 \sum _{k \in \Xi \setminus \{0\}} \cos ^2(2\pi a k/L), \end{aligned} \tag{C.1}$$

where \(\mathbbm {1}_A\) denotes the indicator function of the event A.

Then we are ready to introduce

Definition 43

Let \(\Gamma >0\) be a parameter, possibly depending on L. A probability distribution over subsets \(\Xi \subset {\mathbb {Z}}_L\) is said to be \(\Gamma \)-cosine generic if, with probability \(1-o_L(1)\), we have \(\min _{a \in {\mathbb {Z}}_L} {\mathcal {V}}(\Xi ,a) \ge \Gamma (1-o_L(1))\).

Equivalently, we say that the random variable \(\Xi \) is cosine generic with parameter \(\Gamma \). Cosine genericity of a (random) set is a condition that aims to ensure that, with high probability, the set under consideration is sufficiently generic, in the sense that there are no specialized algebraic or arithmetic relations satisfied by the elements of the set which would make \(\min _{a \in {\mathbb {Z}}_L} {\mathcal {V}}(\Xi ,a)\) small.
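To make the cosine functional (C.1) concrete, here is a small sketch that evaluates \({\mathcal {V}}(\Xi ,a)\) and its minimum over \(a\) for a given deterministic set. The helper names are hypothetical, and elements of \({\mathbb {Z}}_L\) are represented by residues modulo L as an assumption.

```python
import numpy as np

def cosine_functional(support, a, L):
    """Evaluate V(Xi, a) from (C.1) for a set of residues mod L."""
    residues = {int(k) % L for k in support}
    nonzero = np.array(sorted(residues - {0}), dtype=float)
    # Indicator term: contributes 1 exactly when 0 belongs to the set.
    indicator = 1.0 if 0 in residues else 0.0
    cos_sq = np.cos(2.0 * np.pi * a * nonzero / L) ** 2
    return indicator + 2.0 * float(np.sum(cos_sq))

def min_cosine_functional(support, L):
    """Minimum of V(Xi, a) over all a in Z_L, by exhaustive search."""
    return min(cosine_functional(support, a, L) for a in range(L))
```

For instance, with L = 8, the set {0, 1, 3} and a = 2, both cosines vanish and only the indicator term survives. This is the kind of arithmetic coincidence that cosine genericity is meant to exclude.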

Putting all of the above together, we may introduce the generic s-sparse symmetric signals.

Definition 44

Let \(s \in [L]\) be a parameter, possibly depending on L, and \(\alpha ,\beta ,\zeta ,\tau >0\) be fixed. We call a random signal \(\theta :{\mathbb {Z}}_L \rightarrow {\mathbb {R}}\) a generic s-sparse symmetric signal with dispersion \(\zeta ^2\), sparsity constants \(\alpha ,\beta \) and index \(\tau \) if the following hold:

  • The support \(\Xi \) of \(\theta \) is typically s-sparse with sparsity constants \((\alpha ,\beta )\) and \(s^\tau \)-cosine generic.

  • \(\theta \sim N_{\Xi }^{\mathrm {symm}}(0,\zeta ^2 I)\), with the non-zero entries of \(\theta \) being statistically independent of \(\Xi \).

Appendix: On the Size of Collision Free Sets

In this section, we provide detailed arguments for the assertions that the size of a collision-free subset \(A \subset {\mathbb {Z}}_L\) is maximally \(O(L^{1/2})\) and typically \(O(L^{1/3})\).

To this end, we let \(1 \le k \le L\), and we consider a subset \(B \subset {\mathbb {Z}}_L\) of size \(|B|=k\). If B is collision-free, then B entails \(k(k-1)\) distinct differences between its points; we call this set of differences D. For \(x \in {\mathbb {Z}}_L \setminus B\), we want to understand size restrictions on |B| that enable \(B \cup \{x\}\) to be a collision-free set. If \(B \cup \{x\}\) is to be collision-free, we note that for any fixed \(u \in B\), the difference \(x-u\) must not belong to D. This rules out \(k(k-1)\) choices for x. Thus, such a point x can be found only if \(k(k-1) < L - k\), which gives us an upper bound of \(k=O(L^{1/2})\), as desired.
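The counting argument above can be checked on small examples. The sketch below assumes, as the paragraph suggests, that a set is collision-free when its \(k(k-1)\) ordered differences modulo L are pairwise distinct; both function names are hypothetical.

```python
from itertools import permutations

def is_collision_free(points, L):
    """True if all ordered differences u - v (u != v) are distinct mod L."""
    pts = sorted({p % L for p in points})
    diffs = [(u - v) % L for u, v in permutations(pts, 2)]
    return len(diffs) == len(set(diffs))

def valid_extensions(B, L):
    """All x outside B for which B together with x remains collision-free."""
    base = {b % L for b in B}
    return [x for x in range(L)
            if x not in base and is_collision_free(base | {x}, L)]
```

For \(B=\{0,1,3\}\) in \({\mathbb {Z}}_{13}\), the bound allows at most \(13-3-3\cdot 2=4\) extensions, while the search finds only \(x=9\); the bound is therefore not tight in general.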

We note in passing that the probability that a randomly selected x in the above setting yields a collision-free subset \(B \cup \{x\}\) is bounded above by \((L-k-k(k-1))/L\), for any set B.

Now we examine the largest value of m for which a random subset of size m drawn from \({\mathbb {Z}}_L\) is collision-free with positive probability. For concreteness, we consider m samples without replacement from \({\mathbb {Z}}_L\).

For \(1\le k \le m\), we denote by \({\mathfrak {S}}_k\) the set of first k random samples without replacement. Then we may write

$$\begin{aligned}&\mathbb {P}[{\mathfrak {S}}_m ~\text {is collision-free}] \\&\quad = \mathbb {P}[{\mathfrak {S}}_m ~\text {is collision-free} ~| ~{\mathfrak {S}}_{m-1} ~\text {is collision-free}] \cdot \mathbb {P}[{\mathfrak {S}}_{m-1} ~\text {is collision-free}] \\&\quad = \prod _{k=1}^{m-1} \mathbb {P}[{\mathfrak {S}}_{k+1} ~\text {is collision-free} ~| ~{\mathfrak {S}}_{k} ~\text {is collision-free}] \\&\quad = \prod _{k=1}^{m-1} \mathbb {P}_{x \sim \text {Unif}({\mathbb {Z}}_L \setminus {\mathfrak {S}}_{k})} \big [{\mathfrak {S}}_{k} \cup \{x\} ~\text {is collision-free} ~| ~{\mathfrak {S}}_{k} ~\text {is collision-free}\big ] \\&\quad \le \prod _{k=1}^{m-1} \frac{L-k-k(k-1)}{L} \quad \text {[using the analysis for the set}\ B\ \text {above]} \\&\quad = \prod _{k=1}^{m-1} \left( 1 - \frac{k^2}{L} \right) ~\le \prod _{k=1}^{m-1} \exp (-\frac{k^2}{L}) ~\le \exp (-c m^3/L). \end{aligned}$$
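The final inequality in the display can be made explicit. As a short check (the value \(c=1/8\) is our own illustrative choice, not a constant stated in the text), the sum of squares has the closed form

$$\begin{aligned} \sum _{k=1}^{m-1} k^2 = \frac{(m-1)m(2m-1)}{6} \ge \frac{m^3}{8} \quad \text {for all } m \ge 2, \end{aligned}$$

where the inequality follows from \((m-1)(2m-1) \ge \tfrac{3}{4}m^2\) for \(m \ge 2\). Hence \(\prod _{k=1}^{m-1} \exp (-k^2/L) \le \exp (-m^3/(8L))\).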

Thus, if \(m^3/L \rightarrow \infty \), then \(\mathbb {P}[{\mathfrak {S}}_m ~\text {is collision-free}] \rightarrow 0\). Therefore, for a random subset of size m to be collision-free with positive probability, we must have \(m=O(L^{1/3})\), and to have the same property with high probability, we must have \(m=o(L^{1/3})\).
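The \(L^{1/3}\) threshold can also be explored numerically. The following Monte Carlo sketch is an illustration rather than part of the argument. It again assumes that collision-free means all ordered differences modulo L are distinct, and the function names, trial count and seed are arbitrary choices.

```python
import random
from itertools import permutations

def is_collision_free(points, L):
    """True if all ordered differences u - v (u != v) are distinct mod L."""
    diffs = [(u - v) % L for u, v in permutations(points, 2)]
    return len(diffs) == len(set(diffs))

def estimate_collision_free_probability(L, m, trials=2000, seed=0):
    """Fraction of uniformly random m-subsets of Z_L that are collision-free."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # m samples without replacement from Z_L, as in the text.
        subset = rng.sample(range(L), m)
        hits += is_collision_free(subset, L)
    return hits / trials

if __name__ == "__main__":
    L = 1000  # here L ** (1/3) is 10
    for m in (3, 10, 30):
        print(m, estimate_collision_free_probability(L, m))
```

Running the loop for values of m below, near and above \(L^{1/3}\) gives a rough empirical picture of how quickly the collision-free probability decays.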

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ghosh, S., Rigollet, P. Sparse Multi-Reference Alignment: Phase Retrieval, Uniform Uncertainty Principles and the Beltway Problem. Found Comput Math 23, 1851–1898 (2023). https://doi.org/10.1007/s10208-022-09584-6

  • DOI: https://doi.org/10.1007/s10208-022-09584-6
