Unsupervised selection of optimal single-molecule time series idealization criterion
Biophysical Journal (IF 3.2), published 2021-09-04, DOI: 10.1016/j.bpj.2021.08.045
Argha Bandyopadhyay, Marcel P Goldschen-Ohm

Single-molecule (SM) approaches have provided valuable mechanistic information on many biophysical systems. As technological advances produce ever-larger data sets, tools for rapidly analyzing them and identifying molecules that exhibit the behavior of interest are increasingly important. In many cases the underlying mechanism is unknown, making unsupervised techniques desirable. The divisive segmentation and clustering (DISC) algorithm is one such unsupervised method: it idealizes noisy SM time series (that is, estimates the underlying noise-free, piecewise-constant signal) much faster than computationally intensive approaches, without sacrificing accuracy. However, DISC relies on a user-selected objective criterion (OC) to guide its estimation of the ideal time series. Here, we explore how different OCs affect DISC’s performance for data typical of SM fluorescence imaging experiments. We find that OCs differing in their penalty for model complexity each optimize DISC’s performance for time series with different properties, such as signal-to-noise ratio (SNR) and number of sample points. Using a machine learning approach, we generate a decision boundary that allows unsupervised selection of an OC based on the input time series, maximizing performance across different types of data. This is particularly relevant for SM fluorescence data sets, which often have SNR near the derived decision boundary and include time series of nonuniform length because of stochastic photobleaching. Our approach, AutoDISC, allows unsupervised per-molecule optimization of DISC, which will substantially assist the rapid analysis of high-throughput SM data sets with noisy samples and nonuniform time windows.
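To make the role of the objective criterion concrete, here is a minimal Python sketch; it is not the DISC or AutoDISC implementation. It segments a noisy step-like trace by greedy binary segmentation (a simplified stand-in for DISC's divisive step, with no clustering step) and then scores the resulting nested sequence of fits with two familiar information criteria that differ only in their complexity penalty: AIC (2 per parameter) and BIC (ln N per parameter). Using AIC and BIC as the two OCs, the Gaussian shared-noise likelihood, the parameter count (segment means, change points, one variance), the synthetic trace, and the function names `binary_segmentation`, `criterion_score`, and `idealize` are all assumptions made for illustration.

```python
import math
import random


def segment_rss(prefix, prefix_sq, start, end):
    """Residual sum of squares of y[start:end] about its own mean (O(1) via prefix sums)."""
    n = end - start
    s = prefix[end] - prefix[start]
    return (prefix_sq[end] - prefix_sq[start]) - s * s / n


def binary_segmentation(y, max_segments):
    """Greedy divisive segmentation of a piecewise-constant signal (hypothetical helper).

    Repeatedly splits whichever segment's best split most reduces the total RSS.
    Returns a nested list of (change_points, rss), one entry per K = 1, 2, ... segments.
    """
    prefix, prefix_sq = [0.0], [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    segments = [(0, len(y))]
    history = [([], segment_rss(prefix, prefix_sq, 0, len(y)))]
    while len(segments) < max_segments:
        best = None  # (rss_reduction, segment_index, split_point)
        for idx, (a, b) in enumerate(segments):
            whole = segment_rss(prefix, prefix_sq, a, b)
            for t in range(a + 1, b):
                gain = whole - segment_rss(prefix, prefix_sq, a, t) - segment_rss(prefix, prefix_sq, t, b)
                if best is None or gain > best[0]:
                    best = (gain, idx, t)
        if best is None:  # every segment has length 1; nothing left to split
            break
        _, idx, t = best
        a, b = segments.pop(idx)
        segments[idx:idx] = [(a, t), (t, b)]  # keep segments ordered in time
        change_points = [s for s, _ in segments[1:]]
        rss = sum(segment_rss(prefix, prefix_sq, s, e) for s, e in segments)
        history.append((change_points, rss))
    return history


def criterion_score(rss, n, k, criterion):
    """AIC or BIC for a K-segment fit (lower is better).

    Assumes Gaussian noise with one shared variance, sigma^2 = RSS / n, and counts
    k means + (k - 1) change points + 1 variance as free parameters.
    """
    sigma2 = max(rss / n, 1e-12)  # guard against log(0) on a perfect fit
    log_lik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    n_params = k + (k - 1) + 1
    penalty = {"AIC": 2.0, "BIC": math.log(n)}[criterion]
    return penalty * n_params - 2 * log_lik


def idealize(y, criterion, max_segments=10):
    """Return the change points of the candidate model the chosen criterion prefers."""
    history = binary_segmentation(y, max_segments)
    scores = [criterion_score(rss, len(y), k, criterion)
              for k, (_, rss) in enumerate(history, start=1)]
    best = min(range(len(scores)), key=scores.__getitem__)
    return history[best][0]


# Hypothetical test trace: one step of height 1.0 at sample 200, Gaussian noise sigma = 0.2.
rng = random.Random(0)
trace = [(0.0 if i < 200 else 1.0) + rng.gauss(0.0, 0.2) for i in range(400)]

cps_aic = idealize(trace, "AIC")
cps_bic = idealize(trace, "BIC")
print("AIC change points:", cps_aic)
print("BIC change points:", cps_bic)
```

Both criteria rank the same nested candidate models, so the only difference is the per-parameter penalty. Because ln N exceeds 2 whenever N ≥ 8, BIC can never select more change points than AIC from the same sequence, and the gap between the two penalties widens as the number of sample points grows; how much that matters for a given trace also depends on its SNR. AutoDISC's learned decision boundary for choosing an OC per molecule is not reproduced here.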

Updated: 2021-10-19