NMR assignment through linear programming

Journal of Global Optimization

Abstract

Nuclear Magnetic Resonance (NMR) Spectroscopy is the second most used technique (after X-ray crystallography) for structural determination of proteins. A computational challenge in this technique involves solving a discrete optimization problem that assigns the resonance frequency to each atom in the protein. This paper introduces LIAN (LInear programming Assignment for NMR), a novel linear programming formulation of the problem which yields state-of-the-art results in simulated and experimental datasets.

References

  1. Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows: Theory, Algorithms, and Applications. Prentice-Hall Inc, Upper Saddle River (1993)

  2. Alipanahi, B., Gao, X., Karakoc, E., Li, S.C., Balbach, F., Feng, G., Donaldson, L., Li, M.: Error tolerant NMR backbone resonance assignment and automated structure generation. J. Bioinform. Comput. Biol. 9(1), 15–41 (2011)

  3. Allain, F., Mareuil, F., Ménager, H., Nilges, M., Bardiaux, B.: ARIAweb: a server for automated NMR structure calculation. Nucleic Acids Res. 48(W1), W41–W47 (2020). https://doi.org/10.1093/nar/gkaa362

  4. Bahrami, A., Assadi, A.H., Markley, J.L., Eghbalnia, H.R.: Probabilistic interaction network of evidence algorithm and its application to complete labeling of peak lists from protein NMR spectroscopy. PLOS Comput. Biol. 5(3), 1–15 (2009). https://doi.org/10.1371/journal.pcbi.1000307

  5. Bailey-Kellogg, C., Chainraj, S., Pandurangan, G.: A random graph approach to NMR sequential assignment. J. Comput. Biol. 12(6), 569–583 (2005)

  6. Bang-Jensen, J., Gutin, G.Z.: Digraphs: Theory, Algorithms and Applications. Springer, London (2008)

  7. Baran, M.C., Huang, Y.J., Moseley, H.N.B., Montelione, G.T.: Automated analysis of protein NMR assignments and structures. Chem. Rev. 104(8), 3541–3556 (2004). https://doi.org/10.1021/cr030408p. PMID: 15303826

  8. Bartels, C., Güntert, P., Billeter, M., Wüthrich, K.: Garant-a general algorithm for resonance assignment of multidimensional nuclear magnetic resonance spectra. J. Comput. Chem. 18(1), 139–149 (1997)

  9. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E.: The protein data bank. Nucleic Acids Res. 28(1), 235–242 (2000)

  10. Bodenhausen, G., Ruben, J.D.: Natural abundance nitrogen-15 NMR by enhanced heteronuclear spectroscopy. Chem. Phys. Lett. 69, 185–189 (1980)

  11. Bromiley, P.: Products and convolutions of Gaussian probability density functions. Tina-Vision Memo 3(4), 1 (2003)

  12. Cavanagh, J., Fairbrother, W.J., Palmer, A.G., Rance, M., Skelton, N.J.: Protein NMR Spectroscopy, 1st edn. Academic Press Limited, London (1996)

  13. Coggins, B.E., Zhou, P.: PACES: Protein sequential assignment by computer-assisted exhaustive search. J. Biomol. NMR 26(2), 93–111 (2003)

  14. Donald, B.R.: Algorithms in Structural Molecular Biology. The MIT Press, Cambridge (2011)

  15. Donald, B.R., Martin, J.: Automated NMR assignment and protein structure determination using sparse dipolar coupling constraints. Prog. Nuclear Magn. Resonance Spectrosc. 55(2), 101–127 (2009). https://doi.org/10.1016/j.pnmrs.2008.12.001

  16. Ferreira, J.F.S.B., Khoo, Y., Singer, A.: Semidefinite programming approach for the quadratic assignment problem with a sparse graph. Comput. Optim. Appl. 69(3), 677–712 (2018). https://doi.org/10.1007/s10589-017-9968-8

  17. Grzesiek, S., Bax, A.: Correlating backbone amide and side chain resonances in larger proteins by multiple relayed triple resonance NMR. J. Am. Chem. Soc. 114(16), 6291–6293 (1992). https://doi.org/10.1021/ja00042a003

  18. Grzesiek, S., Bax, A.: An efficient experiment for sequential backbone assignment of medium-sized isotopically enriched proteins. J. Magn. Resonance 99(1), 201–207 (1992). https://doi.org/10.1016/0022-2364(92)90169-8

  19. Grzesiek, S., Bax, A.: Amino acid type determination in the sequential assignment procedure of uniformly 13C/15N-enriched proteins. J. Biomol. NMR 3(2), 185–204 (1993)

  20. Guerry, P., Herrmann, T.: Comprehensive Automation for NMR Structure Determination of Proteins, pp. 429–451. Humana Press, Totowa (2012). https://doi.org/10.1007/978-1-61779-480-3_22

  21. Güntert, P., Buchner, L.: Combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR 62(4), 453–471 (2015). https://doi.org/10.1007/s10858-015-9924-9

  22. Güntert, P., Salzmann, M., Braun, D., Wüthrich, K.: Sequence-specific NMR assignment of proteins by global fragment mapping with the program mapper. J. Biomol. NMR 18(2), 129–137 (2000). https://doi.org/10.1023/A:1008318805889

  23. Gurobi Optimization, LLC: Gurobi optimizer reference manual (2020). http://www.gurobi.com

  24. Hitchens, T.K., Lukin, J.A., Zhan, Y., McCallum, S.A., Rule, G.S.: MONTE: An automated Monte Carlo based approach to nuclear magnetic resonance assignment of proteins. J. Biomol. NMR 25(1), 1–9 (2003)

  25. Jung, Y.S., Zweckstetter, M.: Mars - robust automatic backbone assignment of proteins. J. Biomol. NMR 30(1), 11–23 (2004). https://doi.org/10.1023/B:JNMR.0000042954.99056.ad

  26. Karjalainen, M., Tossavainen, H., Hellman, M., Permi, P.: HACANCOi: a new H-detected experiment for backbone resonance assignment of intrinsically disordered proteins. J. Biomol. NMR 74, 741 (2020)

  27. Leutner, M., Gschwind, R.M., Liermann, J., Schwarz, C., Gemmecker, G., Kessler, H.: Automated backbone assignment of labeled proteins using the threshold accepting algorithm. J. Biomol. NMR 11(1), 31–43 (1998)

  28. Lian, L.Y., Barsukov, I.L.: Resonance Assignments, chap. 3, pp. 55–82. Wiley-Blackwell, Hoboken (2011). https://doi.org/10.1002/9781119972006.ch3

  29. Schmidt, E., Güntert, P.: A new algorithm for reliable and general NMR resonance assignment. J. Am. Chem. Soc. 134(30), 12817–12829 (2012). https://doi.org/10.1021/ja305091n. PMID: 22794163

  30. Ulrich, E.L., Akutsu, H., Doreleijers, J.F., Harano, Y., Ioannidis, Y.E., Lin, J., Livny, M., Mading, S., Maziuk, D., Miller, Z., Nakatani, E., Schulte, C.F., Tolmie, D.E., Kent Wenger, R., Yao, H., Markley, J.L.: BioMagResBank. Nucleic Acids Res. 36(suppl 1), D402–D408 (2008). https://doi.org/10.1093/nar/gkm957

  31. Volk, J., Herrmann, T., Wüthrich, K.: Automated sequence-specific protein NMR assignment using the memetic algorithm MATCH. J. Biomol. NMR 41(3), 127–138 (2008)

  32. Wan, X., Lin, G.: CISA: Combined NMR resonance connectivity information determination and sequential assignment. IEEE/ACM Trans. Comput. Biol. Bioinform. 4(3), 336–348 (2007). https://doi.org/10.1109/tcbb.2007.1047

  33. Yang, Y., Fritzsching, K.J., Hong, M.: Resonance assignment of the NMR spectra of disordered proteins using a multi-objective non-dominated sorting genetic algorithm. J. Biomol. NMR 57(3), 281–296 (2013)

  34. Zeng, J., Zhou, P., Donald, B.R.: HASH: a program to accurately predict protein H\(\alpha \) shifts from neighboring backbone shifts. J. Biomol. NMR 55(1), 105–118 (2013)

  35. Zimmerman, D.E., Kulikowski, C.A., Huang, Y., Feng, W., Tashiro, M., Shimotakahara, S., Ya Chien, C., Powers, R., Montelione, G.T.: Automated analysis of protein NMR assignments using methods from artificial intelligence. J. Mol. Biol. 269(4), 592–610 (1997). https://doi.org/10.1006/jmbi.1997.1052

Acknowledgements

A.S. was partially supported by NSF BIGDATA award IIS-1837992, NIH/NIGMS award 1R01GM136780-01, award FA9550-17-1-0291 from AFOSR, the Simons Foundation Math+X Investigator Award, and the Moore Foundation Data-Driven Discovery Investigator Award. D.C. was supported by NIH GM-117212.

Author information

Corresponding author

Correspondence to José F. S. Bravo-Ferreira.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Data and code availability

Data and preliminary (non-production) code used in simulations and tests are available in the author’s repository at https://github.com/fsbravo/lipras.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A: Grouping peaks

As mentioned in Sect. 3.1.1, grouping consistent peaks together is a crucial step in constructing the graph \({\mathcal {G}}=({\mathcal {V}},{\mathcal {E}})\). Ideally, the enumeration of valid assignments should be as thorough as possible. We can efficiently enumerate peak groupings to construct nodes in \({\mathcal {G}}\) by matching measured and expected peaks in a self-consistent way. In particular, we expect a specific set of peaks due to the N–H\({}^{N}\) pair of residue k (see Fig. 10 for a standard example with three experiments), where the values of these peaks in \({\mathbb {R}}^3\) along certain dimensions are consistent. If there are n residues, we should have n sets of such expected peaks. Therefore, each layer in \({\mathcal {G}}=({\mathcal {V}},{\mathcal {E}})\) should in principle have n nodes, although in practice there are more nodes due to ambiguities.

Fig. 10. With three NMR experiments (often HSQC, HNCACB, and HN(CO)CACB) we generally expect 7 distinct peaks for each base N–H\({}^{N}\) pair in a residue, k. These peaks must be consistent; that is, the frequencies assigned to the same atom by two different peaks must be approximately the same up to some experimental tolerance. In principle, there should be n sets of such 7 peaks, one for each residue.

The notion of consistency significantly simplifies the enumeration process, which would otherwise result in an exponential number of nodes. To enumerate consistent peak groupings efficiently, we proceed as follows. Let \(\mathcal {S}_1, \ldots , \mathcal {S}_{L}\) be the measured peak lists corresponding to L different heteronuclear experiments, with union \(\cup _{l=1}^L\mathcal {S}_l:=[p_1, \ldots , p_{m_2}]\). In the case of Fig. 10, \(L=3\), as we have peaks from three experiments. From these \(m_2\) experimental peaks we form all combinations of seven peaks, each consisting of one peak from \({\mathcal {S}}_1\), two peaks from \({\mathcal {S}}_2\), and four peaks from \({\mathcal {S}}_3\), that satisfy the following criteria.

  • For any pair of \(p_u, p_v\) in a combination of seven peaks,

    $$\begin{aligned} \vert p_u(1)-p_v(1)\vert&\le \delta _1 \\ \vert p_u(2)-p_v(2)\vert&\le \delta _2. \end{aligned}$$

    This means that the frequencies of the seven peaks in the two N–H\({}^{N}\) dimensions have to coincide up to tolerances \(\delta _1\) and \(\delta _2\), respectively.

  • Furthermore, for a combination of seven peaks, let \(p_u, p_v\) be the two peaks in \({\mathcal {S}}_2\). These peaks should coincide with two of the peaks in \({\mathcal {S}}_3\) (denoted \(p_i,p_j\)) up to tolerance \(\delta _3\), i.e.

    $$\begin{aligned} \vert p_u(3)-p_i(3)\vert&\le \delta _3 \\ \vert p_v(3)-p_j(3)\vert&\le \delta _3 \end{aligned}$$

    along the \(\text {C}\) dimension.
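The two criteria above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the tuple layout (indices 0 and 1 for the two N–H\({}^{N}\) dimensions, index 2 for the carbon dimension), and the requirement that the two matched peaks \(p_i, p_j\) be distinct are all assumptions made for the example.

```python
from itertools import combinations, permutations


def pairwise_consistent(peaks, delta1, delta2):
    """First criterion: every pair of peaks agrees in the two N-H dimensions."""
    return all(
        abs(p[0] - q[0]) <= delta1 and abs(p[1] - q[1]) <= delta2
        for p, q in combinations(peaks, 2)
    )


def carbons_match(pair, quad, delta3):
    """Second criterion: both S_2 peaks coincide, along the carbon dimension,
    with two (assumed distinct) peaks of the S_3 quadruple."""
    u, v = pair
    return any(
        abs(u[2] - pi[2]) <= delta3 and abs(v[2] - pj[2]) <= delta3
        for pi, pj in permutations(quad, 2)
    )


def enumerate_groupings(s1, s2, s3, delta1, delta2, delta3):
    """Return all 7-peak groupings: one peak from s1, two from s2, four from s3.

    s1 holds 2D peaks (dim1, dim2); s2 and s3 hold 3D peaks (dim1, dim2, carbon).
    """
    groupings = []
    for root in s1:
        # Pruning step: a peak far from the root can never be pairwise consistent.
        def near(p):
            return abs(p[0] - root[0]) <= delta1 and abs(p[1] - root[1]) <= delta2

        cand2 = [p for p in s2 if near(p)]
        cand3 = [p for p in s3 if near(p)]
        for pair in combinations(cand2, 2):
            for quad in combinations(cand3, 4):
                group = (root, *pair, *quad)
                if pairwise_consistent(group, delta1, delta2) and carbons_match(
                    pair, quad, delta3
                ):
                    groupings.append(group)
    return groupings
```

Restricting the candidates to peaks near each \({\mathcal {S}}_1\) peak before forming combinations is what keeps the enumeration from growing exponentially; the full pairwise check is still applied afterwards, so the pruning does not change the result.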

B: Atom cost

Recall that we defined the cost of an atom, a, under a given set of assigned observations, \(\{x_l^a\}_{l=1}^{o_a}\), as follows.

Definition 3

(Atom cost) The cost associated with atom a, with a normally distributed prior \(\mathcal {N}(\mu _a, \sigma _a)\), and \(o_a\) observations \(\{x_l^a\}_{l=1}^{o_a}\) defined by the peak grouping, also assumed to be normally distributed around the true frequency, \(\mu \), according to \(\mathcal {N}(\mu , \sigma _l)\) is defined as

$$\begin{aligned} \text {cost}\left( a, \{x_l^a\}_{l=1}^{o_a}\right) \triangleq -\log {\mathbb {E}}_{\mu \sim {\mathcal {N}}(\mu _a, \sigma _a)}\left[ \prod _{l=1}^{o_a}f(x_l^a\mid \mu , \sigma _l)\right] . \end{aligned}$$
(18)

where \(f(\cdot \mid u, v)\) is the Gaussian density with mean u and standard deviation v.

This is Definition 1 in the main text. Note that the term inside the expectation is a product of \(o_a\) univariate Gaussian probability density functions. Furthermore, expanding the expectation, we note that

$$\begin{aligned} {\mathbb {E}}_\mu \left[ \prod _{l=1}^{o_a}f(x_l^a\mid \mu , \sigma _l)\right]&= \int _{-\infty }^{+\infty }f(\mu \mid \mu _a, \sigma _a)\prod _{l=1}^{o_a}f(x_l^a\mid \mu , \sigma _l)d\mu \end{aligned}$$
(19)
$$\begin{aligned}&=\int _{-\infty }^{+\infty }f(\mu \mid \mu _a, \sigma _a)\prod _{l=1}^{o_a}f(\mu \mid x_l^a, \sigma _l)d\mu \end{aligned}$$
(20)

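by the symmetry of the Gaussian density in its argument and its mean, i.e. \(f(x\mid \mu , \sigma )=f(\mu \mid x, \sigma )\). Using a standard result regarding the product of univariate Gaussian PDFs (see, e.g., [11]), we can write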

$$\begin{aligned} {\mathbb {E}}_\mu \left[ \prod _{l=1}^{o_a}f(x_l^a\mid \mu , \sigma _l)\right]&=\int _{-\infty }^{+\infty }f(\mu \mid \mu _a, \sigma _a)\prod _{l=1}^{o_a}f(\mu \mid x_l^a, \sigma _l)d\mu \end{aligned}$$
(21)
$$\begin{aligned}&=\int _{-\infty }^{+\infty }Z_af(\mu \mid M_a, \Sigma _a)d\mu \end{aligned}$$
(22)
$$\begin{aligned}&=Z_a \end{aligned}$$
(23)

where

$$\begin{aligned} \Sigma _a&= \left( \frac{1}{\sigma _a^2}+\sum _{l=1}^{o_a} \frac{1}{\sigma _l^2}\right) ^{-1/2} \end{aligned}$$
(24)
$$\begin{aligned} M_a&= \left( \frac{\mu _a}{\sigma _a^2}+\sum _{l=1}^{o_a}\frac{x_l^a}{\sigma _l^2}\right) \Sigma ^2_{a} \end{aligned}$$
(25)
$$\begin{aligned} Z_a&=\frac{1}{(2\pi )^{o_a/2}}\sqrt{\frac{\Sigma _a^2}{\sigma _a^2\prod _{l=1}^{o_a}\sigma _l^2}}\exp \left[ -\frac{1}{2}\left( \frac{\mu _a^2}{\sigma _a^2}+\sum _{l=1}^{o_a}\frac{(x_l^a)^2}{\sigma _l^2}-\frac{M_a^2}{\Sigma _a^2}\right) \right] . \end{aligned}$$
(26)
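
As a sanity check on Eqs. (24)–(26), consider the special case of a single observation, \(o_a=1\). This case is worked out here for illustration only and does not appear in the main text. Substituting \(o_a=1\) gives \(\Sigma _a^2=\sigma _a^2\sigma _1^2/(\sigma _a^2+\sigma _1^2)\), and the terms in the exponent of Eq. (26) combine into a single square, so that

$$\begin{aligned} Z_a&=\frac{1}{\sqrt{2\pi (\sigma _a^2+\sigma _1^2)}}\exp \left[ -\frac{(x_1^a-\mu _a)^2}{2(\sigma _a^2+\sigma _1^2)}\right] ,&\text {cost}\left( a, \{x_1^a\}\right)&=\frac{1}{2}\log \left( 2\pi (\sigma _a^2+\sigma _1^2)\right) +\frac{(x_1^a-\mu _a)^2}{2(\sigma _a^2+\sigma _1^2)}. \end{aligned}$$

That is, \(Z_a\) is the density of \(x_1^a\) under a Gaussian centered at the prior mean whose variance is the sum of the prior and experimental variances, as expected for the marginal of a Gaussian observation with a Gaussian prior on its mean.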

This choice of cost function is therefore computationally advantageous, as the desired expectation is a closed-form function of the observations, \(\{x_l^a\}_{l=1}^{o_a}\), of the distributional parameters of the prior, \((\mu _a, \sigma _a)\), and of the experimental standard deviations, \(\{\sigma _l\}_{l=1}^{o_a}\). That said, it is certainly not the only cost function that one could use. For example, we could instead solve a maximum likelihood problem for each peak grouping, assigning the highest-likelihood frequency to each atom given the prior and the observations. The exploration of alternative cost functions is left for future work.
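
To make the closed form concrete, the atom cost \(-\log Z_a\) can be evaluated directly from Eqs. (24)–(26). The sketch below is an illustration only: the function name `atom_cost` and its argument layout are hypothetical and are not taken from the authors' repository.

```python
import math


def atom_cost(mu_a, sigma_a, xs, sigmas):
    """Return -log Z_a for prior N(mu_a, sigma_a) and observations xs.

    xs[l] is the l-th observed frequency and sigmas[l] its experimental
    standard deviation (hypothetical argument layout).
    """
    if len(xs) != len(sigmas):
        raise ValueError("xs and sigmas must have the same length")

    # Eq. (24): Sigma_a^2 is the inverse of the summed precisions.
    precision = 1.0 / sigma_a**2 + sum(1.0 / s**2 for s in sigmas)
    Sigma2 = 1.0 / precision

    # Eq. (25): precision-weighted mean of the prior mean and the observations.
    M = (mu_a / sigma_a**2 + sum(x / s**2 for x, s in zip(xs, sigmas))) * Sigma2

    # Quadratic term in the exponent of Eq. (26).
    quad = (
        mu_a**2 / sigma_a**2
        + sum(x**2 / s**2 for x, s in zip(xs, sigmas))
        - M**2 / Sigma2
    )

    # log Z_a, evaluated in log space to avoid underflow for unlikely groupings.
    log_Z = (
        -0.5 * len(xs) * math.log(2.0 * math.pi)
        + 0.5 * (
            math.log(Sigma2)
            - 2.0 * math.log(sigma_a)
            - 2.0 * sum(math.log(s) for s in sigmas)
        )
        - 0.5 * quad
    )
    return -log_Z
```

Working with \(\log Z_a\) rather than \(Z_a\) itself is a design choice of this sketch: for observations far from the prior mean, \(Z_a\) can underflow to zero in floating point, whereas its logarithm remains finite.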

C: Statistical Typing

Statistical typing takes place during both the node and edge creation steps. Its purpose is to avoid creating nodes and edges that are too unlikely to constitute a valid assignment. To this end, we define a cost threshold above which (equivalently, a likelihood threshold below which) we would rather have a null assignment than the assignment induced by the relevant nodes. This threshold also determines the cost of the edges to (and from) the dummy nodes, which are therefore the highest-cost edges in the graph.

For all simulations in this paper, we use the following definition:

Definition 4

(Atom cost threshold) The maximum allowable cost associated with atom a, with an expected frequency, \(\mu \), distributed according to the normally distributed prior \(\mathcal {N}(\mu _a, \sigma _a)\), and a total of \(o_a\) expected observations is given by:

$$\begin{aligned} \text {threshold}\left( a\right) \triangleq \text {cost}(a, \{w_l^a\}_{l=1}^{o_a}) \end{aligned}$$
(27)

where

$$\begin{aligned} w_l^a = \mu _a +\delta \sigma _a + (-1)^{l+1}\delta \sigma _l. \end{aligned}$$
(28)

That is, we define the maximum allowable cost for atom a by setting \(\{x^a_l\}_{l=1}^{o_a}\) in Definition 1 to \(\{w^a_l\}_{l=1}^{o_a}\), which constitute an adversarial realization of the observations. In this realization, the mean of the observations is approximately \(\delta \) prior standard deviations (\(\sigma _a\)) away from the prior mean, and the observations are split into two clusters, \(2\delta \) experimental standard deviations (\(\sigma _l\)) apart.
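
Definition 4 can be sketched in a few lines of Python. The block below is a hedged illustration, not the authors' code: all function names are hypothetical, `atom_cost` is a compact restatement of Eqs. (24)–(26), and the acceptance rule (keep an assignment when its cost does not exceed the threshold) is our reading of the text above. The value of \(\delta \) is left as a parameter, since it is not specified in this appendix.

```python
import math


def atom_cost(mu_a, sigma_a, xs, sigmas):
    """-log Z_a from Eqs. (24)-(26) (compact restatement, hypothetical name)."""
    Sigma2 = 1.0 / (1.0 / sigma_a**2 + sum(1.0 / s**2 for s in sigmas))
    M = (mu_a / sigma_a**2 + sum(x / s**2 for x, s in zip(xs, sigmas))) * Sigma2
    quad = (
        mu_a**2 / sigma_a**2
        + sum(x**2 / s**2 for x, s in zip(xs, sigmas))
        - M**2 / Sigma2
    )
    log_norm = 0.5 * (
        math.log(Sigma2)
        - 2.0 * math.log(sigma_a)
        - 2.0 * sum(math.log(s) for s in sigmas)
        - len(xs) * math.log(2.0 * math.pi)
    )
    return -(log_norm - 0.5 * quad)


def adversarial_observations(mu_a, sigma_a, sigmas, delta):
    """Eq. (28): w_l = mu_a + delta*sigma_a + (-1)^(l+1) * delta*sigma_l, l = 1..o_a."""
    return [
        mu_a + delta * sigma_a + (-1) ** (l + 1) * delta * s
        for l, s in enumerate(sigmas, start=1)
    ]


def atom_threshold(mu_a, sigma_a, sigmas, delta):
    """Eq. (27): cost of the adversarial realization of the observations."""
    ws = adversarial_observations(mu_a, sigma_a, sigmas, delta)
    return atom_cost(mu_a, sigma_a, ws, sigmas)


def passes_typing(mu_a, sigma_a, xs, sigmas, delta):
    """Assumed acceptance rule: keep the assignment if its cost is within the threshold."""
    return atom_cost(mu_a, sigma_a, xs, sigmas) <= atom_threshold(
        mu_a, sigma_a, sigmas, delta
    )
```

With this reading, observations that sit at the prior mean always pass, while observations much further from the prior mean than the adversarial realization are rejected in favor of a null assignment.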

Cite this article

Bravo-Ferreira, J.F.S., Cowburn, D., Khoo, Y. et al. NMR assignment through linear programming. J Glob Optim 83, 3–28 (2022). https://doi.org/10.1007/s10898-021-01004-3

  • DOI: https://doi.org/10.1007/s10898-021-01004-3
