Abstract
This work builds on the manifold-embedding approach to studying biological molecules that exhibit continuous conformational changes. Previous work established a method capable of reconstructing 3D movies and accompanying energetics of atomic-level structures from single-particle cryo-EM images of macromolecules displaying multiple conformational degrees of freedom. Here, we introduce an unsupervised geometric machine learning approach that is informed by detailed heuristic analysis of manifolds formed by simulated heterogeneous cryo-EM datasets generated from an atomic structure. These simulated data were generated with increasing complexity to account for multiple conformational motions, state occupancies and typical microscope parameters over a wide range of signal-to-noise ratios. Using these datasets as ground truth, we provide a detailed exposition of our findings using several conformational motions while exploring the available parameter space. Guided by these insights, we build a framework that leverages the high-dimensional geometric information obtained to reconstitute a quasi-continuum of conformational states in the form of a free-energy landscape, together with the respective 3D density maps for all states therein. As shown by a direct comparison of results, this framework offers substantial improvements relative to the previous work.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
* (Electronic mail: pschwan@uwm.edu, jf2192@cumc.columbia.edu)
We have refined our nomenclature to improve clarity and to connect several fields of research more effectively. The name of our algorithm has also changed from DMSA to ESPER. Most significantly, we have added 10 pages of analysis to the Supplementary Material, deriving an explicit expression for the PD-manifold eigenfunctions that were previously observed only empirically. Following this SM section, we investigate different boundary conditions and their effects on the spectral geometry. Finally, we have broadened our scope to include data types from other experimental techniques, and reframed much of our analysis in the context of the broader machine learning literature.
† A tabulated description of symbols and abbreviations used throughout this document is available in the Appendix.
† The nomenclature used here varies widely between fields and, in some instances, between works by the same author. The following terms are interchangeable: conformational motions; conformational coordinates; reaction coordinates; collective motion coordinates.
‡ All supplementary material sections are referenced throughout this document in the form SM-{Roman numeral}. The sections in SM are ordered to form a cohesive narrative, independent of the order in which each section is introduced in our main text.