CATBOSS: Cluster Analysis of Trajectories Based on Segment Splitting
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2021-10-05, DOI: 10.1021/acs.jcim.1c00598
Jovan Damjanovic 1 , James M Murphy 2 , Yu-Shan Lin 1

Molecular dynamics (MD) simulations are an exceedingly and increasingly potent tool for molecular behavior prediction and analysis. However, the enormous wealth of data generated by these simulations can be difficult to process and render in a human-readable fashion. Cluster analysis is a commonly used way to partition data into structurally distinct states. We present a method that improves on the state of the art by taking advantage of the temporal information of MD trajectories to enable more accurate clustering at a lower memory cost. To date, cluster analysis of MD simulations has generally treated simulation snapshots as a mere collection of independent data points and attempted to separate them into different clusters based on structural similarity. This new method, cluster analysis of trajectories based on segment splitting (CATBOSS), applies density-peak-based clustering to classify trajectory segments learned by change detection. Applying the method to a synthetic toy model as well as four real-life data sets–trajectories of MD simulations of alanine dipeptide and valine dipeptide as well as two fast-folding proteins–we find CATBOSS to be robust and highly performant, yielding natural-looking cluster boundaries and greatly improving clustering resolution. As the classification of points into segments emphasizes density gaps in the data by grouping them close to the state means, CATBOSS applied to the valine dipeptide system is even able to account for a degree of freedom deliberately omitted from the input data set. We also demonstrate the potential utility of CATBOSS in distinguishing metastable states from transition segments as well as promising application to cases where there is little or no advance knowledge of intrinsic coordinates, making for a highly versatile analysis tool.
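The two-stage idea in the abstract (split the trajectory into segments by change detection, then cluster the segments with a density-peak method) can be illustrated with a minimal sketch on a one-dimensional toy trajectory. This is not the authors' implementation. The abstract does not specify the change detector, the distance between segments, or any parameters, so everything below is an assumption made for illustration: the sliding-window detector `split_segments`, the use of segment means as features, the Gaussian density kernel, and the values of `window`, `threshold`, and `bandwidth` are all hypothetical.

```python
import numpy as np

def split_segments(x, window=5, threshold=1.0):
    """Hypothetical change detector (not the one used in the paper).

    Scores each frame by the difference between the mean of the `window`
    frames after it and the `window` frames before it, and places a boundary
    at every local maximum of that score that exceeds `threshold`.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    score = np.zeros(n)
    for t in range(window, n - window + 1):
        score[t] = abs(x[t:t + window].mean() - x[t - window:t].mean())
    bounds = [0]
    for t in range(window, n - window + 1):
        neighborhood = score[max(0, t - window):t + window + 1]
        if score[t] > threshold and score[t] == neighborhood.max():
            bounds.append(t)
    bounds.append(n)
    return list(zip(bounds[:-1], bounds[1:]))

def density_peaks(points, n_clusters, bandwidth=1.0):
    """Density-peak clustering sketch with a Gaussian density estimate."""
    pts = np.asarray(points, dtype=float).reshape(len(points), -1)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    rho = np.exp(-(dist / bandwidth) ** 2).sum(axis=1)  # local density
    order = np.argsort(-rho, kind="stable")             # densest first
    # delta: distance to the nearest point of higher density;
    # the densest point gets the largest distance in the data set.
    delta = np.full(len(pts), dist.max())
    parent = np.full(len(pts), -1)
    for rank, i in enumerate(order[1:], start=1):
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i], parent[i] = dist[i, j], j
    # Cluster centers: points with both high density and high delta.
    centers = np.argsort(-(rho * delta), kind="stable")[:n_clusters]
    labels = np.full(len(pts), -1)
    labels[centers] = np.arange(n_clusters)
    # Every other point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

# Toy trajectory: two states (0 and 5) visited alternately, 60 frames each.
rng = np.random.default_rng(0)
states = [0.0, 5.0, 0.0, 5.0]
traj = np.concatenate([s + 0.1 * rng.standard_normal(60) for s in states])

segments = split_segments(traj)
means = [traj[a:b].mean() for a, b in segments]  # assumed segment feature
labels = density_peaks(means, n_clusters=2)
print(segments, labels)
```

In this sketch the clustering step operates on four segments rather than 240 frames, which is the kind of reduction the abstract credits for the lower memory cost; summarizing each segment by its mean is a simplification chosen here, not a detail taken from the paper.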

Updated: 2021-10-25