CoGAPS 3: Bayesian non-negative matrix factorization for single-cell analysis with asynchronous updates and sparse data structures
BMC Bioinformatics (IF 3), Pub Date: 2020-10-14, DOI: 10.1186/s12859-020-03796-9
Thomas D Sherman 1, Tiger Gao 2, Elana J Fertig 1, 3, 4

Bayesian factorization methods, including Coordinated Gene Activity in Pattern Sets (CoGAPS), are emerging as powerful analysis tools for single-cell data. However, these methods have greater computational costs than their gradient-based counterparts, and these costs are often prohibitive for the analysis of large single-cell datasets. Many such methods can be run in parallel, which allows this limitation to be overcome by running on more powerful hardware. In CoGAPS, however, the constraints imposed by the prior distributions limit the applicability of parallelization methods for improving computational efficiency in single-cell analysis. We developed a new software framework for parallel matrix factorization in Version 3 of the CoGAPS R/Bioconductor package to overcome the computational limitations of Bayesian matrix factorization for single-cell data analysis. This parallelization framework provides asynchronous updates for the sequential updating steps of the algorithm to enhance computational efficiency. These algorithmic advances were coupled with a new software architecture and sparse data structures to reduce the memory overhead for single-cell data. Altogether, our new software enhances the efficiency of the CoGAPS Bayesian matrix factorization algorithm so that it can analyze 1000 times more cells, enabling factorization of large single-cell datasets.
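
The memory argument for sparse data structures can be illustrated with a small sketch. Single-cell count matrices are typically dominated by zeros, so storing only the nonzero entries of each column takes far less space than a dense array. The layout below is a toy assumption for illustration, not the data structure CoGAPS itself uses; the function names and the example matrix are hypothetical.

```python
# Illustrative sketch only: a toy column-wise sparse layout.
# This is NOT the CoGAPS implementation; all names here are hypothetical.

def to_sparse_columns(dense):
    """Keep only the nonzero entries of each column as (row_index, value) pairs."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    columns = []
    for j in range(n_cols):
        columns.append(
            [(i, dense[i][j]) for i in range(n_rows) if dense[i][j] != 0]
        )
    return columns


def stored_entries(columns):
    """Count how many values the sparse layout actually stores."""
    return sum(len(col) for col in columns)


def sparse_column_dot(column, vector):
    """Dot product of one sparse column with a dense vector, skipping zeros."""
    return sum(value * vector[i] for i, value in column)


# Hypothetical genes-by-cells count matrix (5 genes, 4 cells), mostly zeros.
counts = [
    [0, 3, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
    [0, 5, 0, 0],
]

sparse = to_sparse_columns(counts)
dense_size = len(counts) * len(counts[0])
print(stored_entries(sparse), "of", dense_size, "entries stored")
```

In this toy layout, arithmetic such as `sparse_column_dot` touches only the stored entries, so both memory use and per-column work scale with the number of nonzero values rather than with the full matrix size.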

Updated: 2020-10-14