Selecting XFEL single-particle snapshots by geometric machine learning
Structural Dynamics (IF 2.3), Pub Date: 2021-02-18, DOI: 10.1063/4.0000060
Eduardo R Cruz-Chú 1, Ahmad Hosseinizadeh 1, Ghoncheh Mashayekhi 1, Russell Fung 1, Abbas Ourmazd 1, Peter Schwander 1

A promising new route for structural biology is single-particle imaging with an X-ray Free-Electron Laser (XFEL). The method has the advantage that samples do not require crystallization and can be examined at room temperature. However, high-resolution structures can only be obtained from a sufficiently large number of diffraction patterns of individual molecules, so-called single particles. Here, we present a method that efficiently identifies single particles in very large XFEL datasets, operates at low signal levels, and is tolerant to background. The method uses supervised Geometric Machine Learning (GML) to extract low-dimensional feature vectors from a training dataset, fuse test datasets into the feature space of the training datasets, and separate the data into binary distributions of “single particles” and “non-single particles.” As a proof of principle, we tested simulated and experimental datasets of the Coliphage PR772 virus. We created a training dataset and classified three types of test datasets. The first was a noise-free simulated test dataset, which gave near-perfect separation. The second comprised simulated test datasets modified to reflect different levels of photon counts and background noise; these were used to quantify the predictive limits of our approach. The third was an experimental dataset collected at the Stanford Linear Accelerator Center. Comparing the single-particle identification for this experimental dataset with previously published results showed that GML covers a wide photon-count range, outperforming other single-particle identification methods. Moreover, a major advantage of GML is its ability to retrieve single particles in the presence of structural variability.
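
The three steps named in the abstract (extract low-dimensional features from training data, fuse test data into that feature space, separate into two classes) can be illustrated with a minimal sketch. The abstract does not say which embedding or classifier GML uses, so everything below is an assumption chosen for illustration: a diffusion-map embedding, a Nyström out-of-sample extension, and a nearest-centroid rule. All function names, the kernel width `eps`, and the synthetic 16-dimensional "snapshots" are hypothetical and are not taken from the paper.

```python
import numpy as np

def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between the rows of a and the rows of b."""
    aa = np.sum(a * a, axis=1)[:, None]
    bb = np.sum(b * b, axis=1)[None, :]
    return np.maximum(aa + bb - 2.0 * (a @ b.T), 0.0)

def fit_embedding(train, eps, n_components=2):
    """Diffusion-map embedding of the training set (assumed technique)."""
    k = np.exp(-pairwise_sq_dists(train, train) / eps)
    d = k.sum(axis=1)
    # S = D^-1/2 K D^-1/2 is symmetric and shares its eigenvalues
    # with the Markov matrix P = D^-1 K.
    s = k / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(s)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of P are psi = D^-1/2 v; drop the trivial first one.
    psi = vecs / np.sqrt(d)[:, None]
    return {
        "train": train,
        "eps": eps,
        "lam": vals[1:n_components + 1],
        "psi": psi[:, 1:n_components + 1],
    }

def embed_new(model, test):
    """Nystrom extension: psi(x) = (1 / lam) * sum_j p(x, x_j) * psi_j."""
    k = np.exp(-pairwise_sq_dists(test, model["train"]) / model["eps"])
    p = k / k.sum(axis=1, keepdims=True)
    return (p @ model["psi"]) / model["lam"]

def classify(model, labels, test):
    """Assign each test point to the nearer class centroid in feature space."""
    feats = embed_new(model, test)
    c1 = model["psi"][labels == 1].mean(axis=0)
    c0 = model["psi"][labels == 0].mean(axis=0)
    d1 = np.linalg.norm(feats - c1, axis=1)
    d0 = np.linalg.norm(feats - c0, axis=1)
    return (d1 < d0).astype(int)

# Synthetic stand-in data: two Gaussian clusters, NOT real diffraction patterns.
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0.0, 0.3, (40, 16)),   # label 1: "single"
                   rng.normal(2.0, 0.3, (40, 16))])  # label 0: "non-single"
labels = np.array([1] * 40 + [0] * 40)
test = np.vstack([rng.normal(0.0, 0.3, (10, 16)),
                  rng.normal(2.0, 0.3, (10, 16))])

model = fit_embedding(train, eps=10.0)
pred = classify(model, labels, test)
```

The Nyström step is what lets new snapshots be placed in the existing feature space without recomputing the eigendecomposition, which matters for very large datasets. Applied to the training points themselves, it reproduces their embedding coordinates, which is a convenient sanity check.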

Updated: 2021-03-01