NMR assignment through linear programming

Bravo-Ferreira, José F. S.; Cowburn, David; Khoo, Yuehaw; Singer, Amit

doi:10.1007/s10898-021-01004-3

Published: 11 March 2021

Volume 83, pages 3–28 (2022)

Journal of Global Optimization

José F. S. Bravo-Ferreira (ORCID: orcid.org/0000-0003-3713-7759)¹, David Cowburn²,³, Yuehaw Khoo⁴ & Amit Singer⁵

Abstract

Nuclear Magnetic Resonance (NMR) Spectroscopy is the second most widely used technique (after X-ray crystallography) for structural determination of proteins. A computational challenge in this technique involves solving a discrete optimization problem that assigns the resonance frequency to each atom in the protein. This paper introduces LIAN (LInear programming Assignment for NMR), a novel linear programming formulation of the problem which yields state-of-the-art results on simulated and experimental datasets.

References

[1] Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows: Theory, Algorithms, and Applications. Prentice-Hall Inc, Upper Saddle River (1993)
[2] Alipanahi, B., Gao, X., Karakoc, E., Li, S.C., Balbach, F., Feng, G., Donaldson, L., Li, M.: Error tolerant NMR backbone resonance assignment and automated structure generation. J. Bioinform. Comput. Biol. 9(1), 15–41 (2011)
[3] Allain, F., Mareuil, F., Ménager, H., Nilges, M., Bardiaux, B.: ARIAweb: a server for automated NMR structure calculation. Nucleic Acids Res. 48(W1), W41–W47 (2020). https://doi.org/10.1093/nar/gkaa362
[4] Bahrami, A., Assadi, A.H., Markley, J.L., Eghbalnia, H.R.: Probabilistic interaction network of evidence algorithm and its application to complete labeling of peak lists from protein NMR spectroscopy. PLOS Comput. Biol. 5(3), 1–15 (2009). https://doi.org/10.1371/journal.pcbi.1000307
[5] Bailey-Kellogg, C., Chainraj, S., Pandurangan, G.: A random graph approach to NMR sequential assignment. J. Comput. Biol. 12(6), 569–583 (2005)
[6] Bang-Jensen, J., Gutin, G.Z.: Digraphs: Theory, Algorithms and Applications. Springer, London (2008)
[7] Baran, M.C., Huang, Y.J., Moseley, H.N.B., Montelione, G.T.: Automated analysis of protein NMR assignments and structures. Chem. Rev. 104(8), 3541–3556 (2004). https://doi.org/10.1021/cr030408p. PMID: 15303826
[8] Bartels, C., Güntert, P., Billeter, M., Wüthrich, K.: Garant-a general algorithm for resonance assignment of multidimensional nuclear magnetic resonance spectra. J. Comput. Chem. 18(1), 139–149 (1997)
[9] Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E.: The protein data bank. Nucleic Acids Res. 28(1), 235–242 (2000)
[10] Bodenhausen, G., Ruben, J.D.: Natural abundance nitrogen-15 NMR by enhanced heteronuclear spectroscopy. Chem. Phys. Lett. 69, 185–189 (1980)
[11] Bromiley, P.: Products and convolutions of Gaussian probability density functions. Tina-Vision Memo 3(4), 1 (2003)
[12] Cavanagh, J., Fairbrother, W.J., Palmer, A.G., Rance, M., Skelton, N.J.: Protein NMR Spectroscopy, 1st edn. Academic Press Limited, London (1996)
[13] Coggins, B.E., Zhou, P.: PACES: Protein sequential assignment by computer-assisted exhaustive search. J. Biomol. NMR 26(2), 93–111 (2003)
[14] Donald, B.R.: Algorithms in Structural Molecular Biology. The MIT Press, Cambridge (2011)
[15] Donald, B.R., Martin, J.: Automated NMR assignment and protein structure determination using sparse dipolar coupling constraints. Prog. Nuclear Magn. Resonance Spectrosc. 55(2), 101–127 (2009). https://doi.org/10.1016/j.pnmrs.2008.12.001
[16] Ferreira, J.F.S.B., Khoo, Y., Singer, A.: Semidefinite programming approach for the quadratic assignment problem with a sparse graph. Comput. Optim. Appl. 69(3), 677–712 (2018). https://doi.org/10.1007/s10589-017-9968-8
[17] Grzesiek, S., Bax, A.: Correlating backbone amide and side chain resonances in larger proteins by multiple relayed triple resonance NMR. J. Am. Chem. Soc. 114(16), 6291–6293 (1992). https://doi.org/10.1021/ja00042a003
[18] Grzesiek, S., Bax, A.: An efficient experiment for sequential backbone assignment of medium-sized isotopically enriched proteins. J. Magn. Resonance 99(1), 201–207 (1992). https://doi.org/10.1016/0022-2364(92)90169-8
[19] Grzesiek, S., Bax, A.: Amino acid type determination in the sequential assignment procedure of uniformly 13C/15N-enriched proteins. J. Biomol. NMR 3(2), 185–204 (1993)
[20] Guerry, P., Herrmann, T.: Comprehensive Automation for NMR Structure Determination of Proteins, pp. 429–451. Humana Press, Totowa (2012). https://doi.org/10.1007/978-1-61779-480-3_22
[21] Güntert, P., Buchner, L.: Combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR 62(4), 453–471 (2015). https://doi.org/10.1007/s10858-015-9924-9
[22] Güntert, P., Salzmann, M., Braun, D., Wüthrich, K.: Sequence-specific NMR assignment of proteins by global fragment mapping with the program MAPPER. J. Biomol. NMR 18(2), 129–137 (2000). https://doi.org/10.1023/A:1008318805889
[23] Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2020). http://www.gurobi.com
[24] Hitchens, T.K., Lukin, J.A., Zhan, Y., McCallum, S.A., Rule, G.S.: MONTE: An automated Monte Carlo based approach to nuclear magnetic resonance assignment of proteins. J. Biomol. NMR 25(1), 1–9 (2003)
[25] Jung, Y.S., Zweckstetter, M.: Mars—robust automatic backbone assignment of proteins. J. Biomol. NMR 30(1), 11–23 (2004). https://doi.org/10.1023/B:JNMR.0000042954.99056.ad
[26] Karjalainen, M., Tossavainen, H., Hellman, M., Permi, P.: HACANCOi: a new H-detected experiment for backbone resonance assignment of intrinsically disordered proteins. J. Biomol. NMR 74, 741 (2020)
[27] Leutner, M., Gschwind, R.M., Liermann, J., Schwarz, C., Gemmecker, G., Kessler, H.: Automated backbone assignment of labeled proteins using the threshold accepting algorithm. J. Biomol. NMR 11(1), 31–43 (1998)
[28] Lian, L.Y., Barsukov, I.L.: Resonance Assignments, chap. 3, pp. 55–82. Wiley-Blackwell, Hoboken (2011). https://doi.org/10.1002/9781119972006.ch3
[29] Schmidt, E., Güntert, P.: A new algorithm for reliable and general NMR resonance assignment. J. Am. Chem. Soc. 134(30), 12817–12829 (2012). https://doi.org/10.1021/ja305091n. PMID: 22794163
[30] Ulrich, E.L., Akutsu, H., Doreleijers, J.F., Harano, Y., Ioannidis, Y.E., Lin, J., Livny, M., Mading, S., Maziuk, D., Miller, Z., Nakatani, E., Schulte, C.F., Tolmie, D.E., Kent Wenger, R., Yao, H., Markley, J.L.: BioMagResBank. Nucleic Acids Res. 36(suppl 1), D402–D408 (2008). https://doi.org/10.1093/nar/gkm957
[31] Volk, J., Herrmann, T., Wüthrich, K.: Automated sequence-specific protein NMR assignment using the memetic algorithm MATCH. J. Biomol. NMR 41(3), 127–138 (2008)
[32] Wan, X., Lin, G.: CISA: Combined NMR resonance connectivity information determination and sequential assignment. IEEE/ACM Trans. Comput. Biol. Bioinform. 4(3), 336–348 (2007). https://doi.org/10.1109/tcbb.2007.1047
[33] Yang, Y., Fritzsching, K.J., Hong, M.: Resonance assignment of the NMR spectra of disordered proteins using a multi-objective non-dominated sorting genetic algorithm. J. Biomol. NMR 57(3), 281–296 (2013)
[34] Zeng, J., Zhou, P., Donald, B.R.: HASH: a program to accurately predict protein H$\alpha $ shifts from neighboring backbone shifts. J. Biomol. NMR 55(1), 105–118 (2013)
[35] Zimmerman, D.E., Kulikowski, C.A., Huang, Y., Feng, W., Tashiro, M., Shimotakahara, S., Ya Chien, C., Powers, R., Montelione, G.T.: Automated analysis of protein NMR assignments using methods from artificial intelligence. J. Mol. Biol. 269(4), 592–610 (1997). https://doi.org/10.1006/jmbi.1997.1052

Acknowledgements

A.S. was partially supported by NSF BIGDATA award IIS-1837992, NIH/NIGMS award 1R01GM136780-01, award FA9550-17-1-0291 from AFOSR, the Simons Foundation Math+X Investigator Award, and the Moore Foundation Data-Driven Discovery Investigator Award. D.C. was supported by NIH GM-117212.

Author information

Authors and Affiliations

PACM, Princeton University, Princeton, NJ, 08540, USA
José F. S. Bravo-Ferreira
Department of Biochemistry, Albert Einstein College of Medicine, New York, NY, 10461, USA
David Cowburn
Departments of Physiology and Biophysics, Albert Einstein College of Medicine, New York, NY, 10461, USA
David Cowburn
Department of Statistics, University of Chicago, Chicago, IL, 60637, USA
Yuehaw Khoo
Department of Mathematics and PACM, Princeton University, Princeton, NJ, 08540, USA
Amit Singer

Corresponding author

Correspondence to José F. S. Bravo-Ferreira.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Data and code availability

Data and preliminary (non-production) code used in the simulations and tests are available in the author’s repository at https://github.com/fsbravo/lipras.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A: Grouping peaks

As we mentioned in Sect. 3.1.1, grouping consistent peaks together is a crucial step in constructing the graph ${\mathcal {G}}=({\mathcal {V}},{\mathcal {E}})$. Ideally, the enumeration of valid assignments should be as thorough as possible. We enumerate peak groupings to construct the nodes of ${\mathcal {G}}$ by matching measured and expected peaks in a self-consistent way. In particular, we expect a specific set of peaks arising from the N–H${}^{N}$ pair of residue k (see Fig. 10 for a standard example with three experiments), whose coordinates in ${\mathbb {R}}^3$ agree along certain dimensions. If there are n residues, we should observe n such sets of expected peaks. Therefore, each layer of ${\mathcal {G}}$ should in principle have n nodes, although in practice there are more because of ambiguities.

The notion of consistency can significantly simplify the enumeration process, which would otherwise produce an exponential number of nodes. To enumerate consistent peak groupings efficiently, we proceed as follows. Let $\mathcal {S}_1, \ldots , \mathcal {S}_{L}$ be the measured peak lists corresponding to the different heteronuclear experiments, i.e. $\cup _{l=1}^L\mathcal {S}_l:=[p_1, \ldots , p_{m_2}]$. In the case of Fig. 10, $L=3$, as we have peaks from three experiments. From these $m_2$ experimental peaks we form all combinations of seven peaks, each consisting of one peak from ${\mathcal {S}}_1$, two peaks from ${\mathcal {S}}_2$, and four peaks from ${\mathcal {S}}_3$, that satisfy the following criteria.

For any pair of peaks $p_u, p_v$ in a combination of seven peaks,
$$\begin{aligned} \vert p_u(1)-p_v(1)\vert&\le \delta _1 \\ \vert p_u(2)-p_v(2)\vert&\le \delta _2. \end{aligned}$$
That is, the coordinates of all seven peaks along the N–H${}^{N}$ dimensions (coordinates 1 and 2) must coincide up to the tolerances $\delta _1$ and $\delta _2$, respectively.
Furthermore, let $p_u, p_v$ be the two peaks from ${\mathcal {S}}_2$ in a combination. Along the $\text {C}$ dimension, these peaks must coincide with two of the peaks from ${\mathcal {S}}_3$ (denoted $p_i,p_j$) up to tolerance $\delta _3$, i.e.
$$\begin{aligned} \vert p_u(3)-p_i(3)\vert&\le \delta _3 \\ \vert p_v(3)-p_j(3)\vert&\le \delta _3. \end{aligned}$$
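The enumeration above can be sketched as a brute-force search. This is a minimal illustration, not the authors' implementation; it assumes each peak is stored as a hypothetical `(N, H^N, C)` tuple, with coordinates 1, 2, 3 of the text mapped to indices 0, 1, 2, and the toy peak values are made up. Candidates are first pruned against the ${\mathcal {S}}_1$ peak, since any peak that fails the tolerance test against it cannot belong to a valid combination.

```python
from itertools import combinations, permutations
from typing import List, Sequence, Tuple

# Hypothetical peak layout: (N, H^N, C) coordinates in ppm.
Peak = Tuple[float, float, float]


def near(p: Peak, q: Peak, d1: float, d2: float) -> bool:
    """p and q agree within d1 in coordinate 1 and within d2 in coordinate 2."""
    return abs(p[0] - q[0]) <= d1 and abs(p[1] - q[1]) <= d2


def nh_consistent(peaks: Sequence[Peak], d1: float, d2: float) -> bool:
    """First criterion: every pair of peaks in the combination is within tolerance."""
    return all(near(p, q, d1, d2) for p, q in combinations(peaks, 2))


def c_consistent(s2_pair: Sequence[Peak], s3_quad: Sequence[Peak], d3: float) -> bool:
    """Second criterion: the two S2 peaks match two distinct S3 peaks along C within d3."""
    p_u, p_v = s2_pair
    return any(
        abs(p_u[2] - p_i[2]) <= d3 and abs(p_v[2] - p_j[2]) <= d3
        for p_i, p_j in permutations(s3_quad, 2)
    )


def enumerate_groupings(S1, S2, S3, d1, d2, d3) -> List[Tuple[Peak, ...]]:
    """All 1 + 2 + 4 peak combinations from S1, S2, S3 satisfying both criteria."""
    groupings = []
    for anchor in S1:
        # Pruning: a peak outside the tolerance of the anchor can never pass
        # the pairwise test, so drop it before forming combinations.
        cand2 = [p for p in S2 if near(p, anchor, d1, d2)]
        cand3 = [p for p in S3 if near(p, anchor, d1, d2)]
        for pair in combinations(cand2, 2):
            for quad in combinations(cand3, 4):
                group = (anchor, *pair, *quad)
                if nh_consistent(group, d1, d2) and c_consistent(pair, quad, d3):
                    groupings.append(group)
    return groupings


# Toy data: one residue plus one distractor peak (illustrative values only).
S1 = [(120.00, 8.00, 0.0)]  # coordinate 3 is ignored for S1 in this sketch
S2 = [(120.05, 8.01, 55.00), (120.02, 7.99, 57.00)]
S3 = [(120.01, 8.00, 55.02), (119.98, 8.02, 57.01),
      (120.00, 8.00, 30.00), (120.03, 7.98, 175.00),
      (110.00, 7.00, 55.00)]  # distractor, far away in N and H^N

groups = enumerate_groupings(S1, S2, S3, d1=0.2, d2=0.05, d3=0.1)
print(len(groups))  # prints 1
```

The pruning step keeps the combinatorial search local to each ${\mathcal {S}}_1$ peak; for large peak lists, a spatial index over the N–H${}^{N}$ plane would likely be preferable to the linear scans used here.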

B: Atom cost

Recall that we defined the cost of an atom $a$, given a set of assigned observations $\{x_l^a\}_{l=1}^{o_a}$, as follows.

Definition 3

(Atom cost) The cost associated with atom a, with a normally distributed prior $\mathcal {N}(\mu _a, \sigma _a)$ and $o_a$ observations $\{x_l^a\}_{l=1}^{o_a}$ defined by the peak grouping, which are also assumed to be normally distributed around the true frequency $\mu $ according to $\mathcal {N}(\mu , \sigma _l)$, is defined as

$$\begin{aligned} \text {cost}\left( a, \{x_l^a\}_{l=1}^{o_a}\right) \triangleq -\log {\mathbb {E}}_{\mu \sim {\mathcal {N}}(\mu _a, \sigma _a)}\left[ \prod _{l=1}^{o_a}f(x_l^a\mid \mu , \sigma _l)\right] . \end{aligned}$$

(18)

where $f(\cdot \mid u, v)$ is the Gaussian density with mean u and standard deviation v.
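For intuition, Definition 3 can be evaluated directly by numerical quadrature, before any closed form is derived. The sketch below (the name `atom_cost_quadrature` is hypothetical) applies the trapezoidal rule to the integral over $\mu $, truncated to $\mu _a \pm 10\sigma _a$; it assumes the grid is fine enough to resolve the product of densities and that no observation lies far outside the truncated range. For a single observation, the exact value is $-\log f(x_1^a\mid \mu _a, \sqrt{\sigma _a^2+\sigma _1^2})$, which serves as a check.

```python
import math
from typing import Sequence


def gauss_pdf(x: float, mean: float, sd: float) -> float:
    """Univariate Gaussian density f(x | mean, sd)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def atom_cost_quadrature(mu_a: float, sigma_a: float,
                         xs: Sequence[float], sigmas: Sequence[float],
                         n: int = 20001, width: float = 10.0) -> float:
    """-log E_{mu ~ N(mu_a, sigma_a)}[prod_l f(x_l | mu, sigma_l)] via the trapezoidal rule.

    The integral over mu is truncated to mu_a +/- width * sigma_a.
    """
    lo = mu_a - width * sigma_a
    h = 2.0 * width * sigma_a / (n - 1)  # grid spacing
    total = 0.0
    for k in range(n):
        mu = lo + k * h
        integrand = gauss_pdf(mu, mu_a, sigma_a)  # prior density at mu
        for x, s in zip(xs, sigmas):
            integrand *= gauss_pdf(x, mu, s)      # likelihood of each observation
        total += (0.5 if k in (0, n - 1) else 1.0) * integrand  # trapezoid weights
    return -math.log(total * h)


# One observation: compare with the exact convolution result.
approx = atom_cost_quadrature(120.0, 2.0, [121.0], [0.5])
exact = -math.log(gauss_pdf(121.0, 120.0, math.sqrt(2.0**2 + 0.5**2)))
print(approx, exact)
```

Quadrature costs one density evaluation per grid point and observation, which is why a closed form is preferable when many candidate groupings must be scored.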

This is Definition 1 in the main text. Note that the term inside the expectation is a product of $o_a$ univariate Gaussian probability density functions. Furthermore, expanding the expectation, we note that

$$\begin{aligned} {\mathbb {E}}_\mu \left[ \prod _{l=1}^{o_a}f(x_l^a\mid \mu , \sigma _l)\right]&= \int _{-\infty }^{+\infty }f(\mu \mid \mu _a, \sigma _a)\prod _{l=1}^{o_a}f(x_l^a\mid \mu , \sigma _l)\,d\mu \end{aligned}$$

(19)

$$\begin{aligned}&=\int _{-\infty }^{+\infty }f(\mu \mid \mu _a, \sigma _a)\prod _{l=1}^{o_a}f(\mu \mid x_l^a, \sigma _l)\,d\mu \end{aligned}$$

(20)

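where the second equality follows from the symmetry $f(x\mid \mu , \sigma )=f(\mu \mid x, \sigma )$ of the Gaussian density. Using a standard result on the product of univariate Gaussian PDFs (see, e.g., [11]), we can write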

$$\begin{aligned} {\mathbb {E}}_\mu \left[ \prod _{l=1}^{o_a}f(x_l^a\mid \mu , \sigma _l)\right]&=\int _{-\infty }^{+\infty }f(\mu \mid \mu _a, \sigma _a)\prod _{l=1}^{o_a}f(\mu \mid x_l^a, \sigma _l)\,d\mu \end{aligned}$$

(21)

$$\begin{aligned}&=\int _{-\infty }^{+\infty }Z_af(\mu \mid M_a, \Sigma _a)d\mu \end{aligned}$$

(22)

$$\begin{aligned}&=Z_a \end{aligned}$$

(23)

where

$$\begin{aligned} \Sigma _a&= \left( \frac{1}{\sigma _a^2}+\sum _{l=1}^{o_a} \frac{1}{\sigma _l^2}\right) ^{-1/2} \end{aligned}$$

(24)

$$\begin{aligned} M_a&= \left( \frac{\mu _a}{\sigma _a^2}+\sum _{l=1}^{o_a}\frac{x_l^a}{\sigma _l^2}\right) \Sigma ^2_{a} \end{aligned}$$

(25)

$$\begin{aligned} Z_a&=\frac{1}{(2\pi )^{o_a/2}}\sqrt{\frac{\Sigma _a^2}{\sigma _a^2\prod _{l=1}^{o_a}\sigma _l^2}}\exp \left[ -\frac{1}{2}\left( \frac{\mu _a^2}{\sigma _a^2}+\sum _{l=1}^{o_a}\frac{(x_l^a)^2}{\sigma _l^2}-\frac{M_a^2}{\Sigma _a^2}\right) \right] . \end{aligned}$$

(26)
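Taking the negative logarithm of Eq. (26) gives the cost explicitly. The second form below follows because, by (24) and (25), $\frac{\mu _a}{\sigma _a^2}+\sum _l \frac{x_l^a}{\sigma _l^2} = \frac{M_a}{\Sigma _a^2}$ and $\frac{1}{\sigma _a^2}+\sum _l \frac{1}{\sigma _l^2} = \frac{1}{\Sigma _a^2}$, so expanding the squares recovers the bracketed term of (26):

$$\begin{aligned} \text {cost}\left( a, \{x_l^a\}_{l=1}^{o_a}\right) = -\log Z_a&= \frac{o_a}{2}\log (2\pi ) + \frac{1}{2}\log \frac{\sigma _a^2\prod _{l=1}^{o_a}\sigma _l^2}{\Sigma _a^2} + \frac{1}{2}\left( \frac{\mu _a^2}{\sigma _a^2}+\sum _{l=1}^{o_a}\frac{(x_l^a)^2}{\sigma _l^2}-\frac{M_a^2}{\Sigma _a^2}\right) \\&= \frac{o_a}{2}\log (2\pi ) + \frac{1}{2}\log \frac{\sigma _a^2\prod _{l=1}^{o_a}\sigma _l^2}{\Sigma _a^2} + \frac{1}{2}\left( \frac{(\mu _a-M_a)^2}{\sigma _a^2}+\sum _{l=1}^{o_a}\frac{(x_l^a-M_a)^2}{\sigma _l^2}\right) . \end{aligned}$$

As a check, for a single observation ($o_a=1$) this reduces to $\frac{1}{2}\log \left( 2\pi (\sigma _a^2+\sigma _1^2)\right) + \frac{(x_1^a-\mu _a)^2}{2(\sigma _a^2+\sigma _1^2)}$, i.e. the negative log-density of $x_1^a$ under $\mathcal {N}(\mu _a, \sqrt{\sigma _a^2+\sigma _1^2})$.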

This choice of cost function is therefore computationally advantageous: the desired expectation is a simple closed-form function of the observations $\{x_l^a\}_{l=1}^{o_a}$, of the distributional parameters of the prior, $(\mu _a, \sigma _a)$, and of those of the experiments, $\{\sigma _l\}_{l=1}^{o_a}$. That said, it is certainly not the only cost function one could use. For example, we could instead solve a maximum likelihood problem for each peak grouping, assigning to each atom the most likely frequency given the prior and the observations. We leave the exploration of alternative cost functions to future work.
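A minimal Python sketch of the closed-form cost, assuming scalar inputs (the function name `atom_cost` is hypothetical). It evaluates the exponent of Eq. (26) in the algebraically equivalent form $\sum _i (m_i - M_a)^2/s_i^2$, where $m_i$ ranges over $\mu _a, x_1^a, \ldots , x_{o_a}^a$ and $s_i$ over $\sigma _a, \sigma _1, \ldots , \sigma _{o_a}$; this avoids subtracting large quantities such as $\mu _a^2/\sigma _a^2$ when chemical shifts are large relative to their uncertainties.

```python
import math
from typing import Sequence


def atom_cost(mu_a: float, sigma_a: float,
              xs: Sequence[float], sigmas: Sequence[float]) -> float:
    """Closed-form cost -log Z_a built from Eqs. (24)-(26)."""
    if len(xs) != len(sigmas):
        raise ValueError("need one standard deviation per observation")
    means = [mu_a, *xs]        # prior mean followed by the observations
    sds = [sigma_a, *sigmas]   # matching standard deviations
    Sigma2 = 1.0 / sum(1.0 / s**2 for s in sds)              # Sigma_a^2, Eq. (24)
    M = Sigma2 * sum(m / s**2 for m, s in zip(means, sds))   # M_a, Eq. (25)
    # Stable form of the bracketed exponent term in Eq. (26).
    quad = sum((m - M) ** 2 / s**2 for m, s in zip(means, sds))
    # log(sigma_a^2 * prod sigma_l^2 / Sigma_a^2)
    log_ratio = sum(math.log(s**2) for s in sds) - math.log(Sigma2)
    return 0.5 * (len(xs) * math.log(2.0 * math.pi) + log_ratio + quad)


# Illustrative numbers only: prior N(56.0, 2.0), two observations with sd 0.2.
print(atom_cost(56.0, 2.0, [56.3, 56.1], [0.2, 0.2]))
```

Each evaluation costs $O(o_a)$ arithmetic operations, so scoring a large number of candidate peak groupings remains cheap.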

C: Statistical Typing

Statistical typing takes place during both the node and edge creation steps. In particular, we want to avoid creating nodes and edges that are too unlikely to constitute a valid assignment. We do so by defining a cost threshold above which we would rather have a null assignment than the assignment induced by the relevant nodes. This threshold also determines the cost of the edges to (and from) the dummy nodes, which are therefore the highest-cost edges in the graph.

For all simulations in this paper, we use the following definition:

Definition 4

(Atom cost threshold) The maximum allowable cost associated with atom a, with an expected frequency, $\mu $, distributed according to the normally distributed prior $\mathcal {N}(\mu _a, \sigma _a)$, and a total of $o_a$ expected observations is given by:

$$\begin{aligned} \text {threshold}\left( a\right) \triangleq \text {cost}(a, \{w_l^a\}_{l=1}^{o_a}) \end{aligned}$$

(27)

where

$$\begin{aligned} w_l^a = \mu _a +\delta \sigma _a + (-1)^{l+1}\delta \sigma _l. \end{aligned}$$

(28)

That is, we define the maximum allowable cost for atom a by replacing the observations $\{x^a_l\}_{l=1}^{o_a}$ in Definition 1 with $\{w^a_l\}_{l=1}^{o_a}$, which constitute an adversarial realization of the observations. In this realization, the mean of the observations is $\approx \delta $ prior standard deviations away from the prior mean, and the observations are split into two clusters, $2\delta $ experimental standard deviations apart.
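Definition 4 can be sketched in the same spirit. The code below is a minimal illustration under stated assumptions: it repeats the closed-form cost so that the block stands alone, builds the adversarial observations of Eq. (28), and adds a hypothetical `passes_typing` filter that keeps a candidate atom assignment only if its cost does not exceed the threshold. The value $\delta = 3$ used below is purely illustrative and is not taken from the paper.

```python
import math
from typing import List, Sequence


def atom_cost(mu_a: float, sigma_a: float,
              xs: Sequence[float], sigmas: Sequence[float]) -> float:
    """Closed-form -log Z_a (Eqs. (24)-(26)), exponent in numerically stable form."""
    means, sds = [mu_a, *xs], [sigma_a, *sigmas]
    Sigma2 = 1.0 / sum(1.0 / s**2 for s in sds)
    M = Sigma2 * sum(m / s**2 for m, s in zip(means, sds))
    quad = sum((m - M) ** 2 / s**2 for m, s in zip(means, sds))
    log_ratio = sum(math.log(s**2) for s in sds) - math.log(Sigma2)
    return 0.5 * (len(xs) * math.log(2.0 * math.pi) + log_ratio + quad)


def adversarial_observations(mu_a: float, sigma_a: float,
                             sigmas: Sequence[float], delta: float) -> List[float]:
    """Eq. (28): w_l = mu_a + delta*sigma_a + (-1)^(l+1) * delta*sigma_l, l = 1..o_a."""
    return [mu_a + delta * sigma_a + (1 if l % 2 == 1 else -1) * delta * s
            for l, s in enumerate(sigmas, start=1)]


def atom_threshold(mu_a: float, sigma_a: float,
                   sigmas: Sequence[float], delta: float) -> float:
    """Eq. (27): the atom cost evaluated at the adversarial observations."""
    ws = adversarial_observations(mu_a, sigma_a, sigmas, delta)
    return atom_cost(mu_a, sigma_a, ws, sigmas)


def passes_typing(mu_a: float, sigma_a: float, xs: Sequence[float],
                  sigmas: Sequence[float], delta: float) -> bool:
    """Hypothetical filter: keep a candidate only if its cost is within the threshold."""
    return atom_cost(mu_a, sigma_a, xs, sigmas) <= atom_threshold(mu_a, sigma_a, sigmas, delta)


print(atom_threshold(56.0, 2.0, [0.2, 0.2], delta=3.0))
print(passes_typing(56.0, 2.0, [56.0, 56.0], [0.2, 0.2], delta=3.0))  # True
print(passes_typing(56.0, 2.0, [80.0, 80.0], [0.2, 0.2], delta=3.0))  # False
```

Following the description of statistical typing above, the same threshold value could also serve as the cost of the edges to and from the dummy nodes.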

About this article

Cite this article

Bravo-Ferreira, J.F.S., Cowburn, D., Khoo, Y. et al. NMR assignment through linear programming. J Glob Optim 83, 3–28 (2022). https://doi.org/10.1007/s10898-021-01004-3

Received: 07 August 2020
Accepted: 20 February 2021
Published: 11 March 2021
Issue Date: May 2022
DOI: https://doi.org/10.1007/s10898-021-01004-3
