Multi-assignment clustering: Machine learning from a biological perspective
Journal of Biotechnology (IF 4.1), Pub Date: 2020-12-04, DOI: 10.1016/j.jbiotec.2020.12.002
Benjamin Ulfenborg¹, Alexander Karlsson², Maria Riveiro³, Christian X Andersson⁴, Peter Sartipy¹, Jane Synnergren¹

A common approach for analyzing large-scale molecular data is to cluster objects that share similar characteristics. This approach assumes that genes with highly similar expression profiles are likely to participate in a common molecular process. Biological systems are extremely complex and challenging to understand: proteins can have multiple functions, some of which must be activated or expressed in a time-dependent manner. The strategies used to cluster these molecules into groups are therefore of key importance for translating data into biologically interpretable findings. Here we implemented a multi-assignment clustering (MAsC) approach that allows molecules to be assigned to multiple clusters, rather than to a single cluster as in commonly used clustering techniques. When applied to high-throughput transcriptomics data, MAsC increased the power of downstream pathway analysis and allowed the identification of pathways with high biological relevance to the experimental setting and the biological systems studied. Multi-assignment clustering also reduced noise in the clustering partition by excluding genes with a low correlation to all of the resulting clusters. Together, these findings suggest that our methodology facilitates the translation of large-scale molecular data into biological knowledge. The method is made available as an R package on GitLab (https://gitlab.com/wolftower/masc).
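The abstract does not spell out MAsC's assignment rule. To make the two ideas concrete (a gene may join several clusters, and genes that fit no cluster are dropped as noise), here is a minimal Python sketch of one plausible rule. It is not the MAsC algorithm and not the API of the R package. It makes two assumptions. First, cluster centroid profiles are already available, for example from a single-assignment method such as k-means. Second, a gene joins every cluster whose centroid it correlates with at or above a hypothetical Pearson threshold. The names `multi_assign` and `threshold` are illustrative only.

```python
import numpy as np


def _standardize(profiles):
    """Center each row and scale it to unit length, so a dot product of two rows is their Pearson r."""
    centered = profiles - profiles.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    # Constant rows have zero norm; keep them as all-zero vectors (r = 0) instead of dividing by zero
    norms = np.where(norms == 0, 1.0, norms)
    return centered / norms


def multi_assign(expr, centroids, threshold=0.8):
    """Hypothetical multi-assignment rule (illustration only, not MAsC's published algorithm).

    expr:      genes x samples expression matrix
    centroids: clusters x samples matrix of cluster profiles (assumed to be given)
    threshold: assumed minimum Pearson correlation for cluster membership
    Returns (assignments, excluded): a list of cluster indices per gene, and a
    boolean array marking genes that reached no cluster (treated as noise).
    """
    expr = np.asarray(expr, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Pearson correlation of every gene with every centroid: shape (n_genes, n_clusters)
    r = _standardize(expr) @ _standardize(centroids).T
    members = r >= threshold
    # Unlike single-assignment clustering, a gene may belong to several clusters
    assignments = [np.flatnonzero(row).tolist() for row in members]
    # Genes with low correlation to all clusters are left out of the partition
    excluded = ~members.any(axis=1)
    return assignments, excluded


if __name__ == "__main__":
    centroids = [[0, 1, 2, 3], [1, 0, 3, 2]]  # toy cluster profiles, r = 0.6 with each other
    expr = [
        [0, 1, 2, 3],  # identical to cluster 0
        [1, 1, 5, 5],  # r of about 0.89 with both clusters
        [1, 0, 3, 2],  # identical to cluster 1
        [3, 2, 1, 0],  # anti-correlated with both clusters
        [2, 2, 2, 2],  # constant profile
    ]
    assignments, excluded = multi_assign(expr, centroids, threshold=0.85)
    print(assignments)        # -> [[0], [0, 1], [1], [], []]
    print(excluded.tolist())  # -> [False, False, False, True, True]
```

In this toy run, the second gene is assigned to both clusters, and the last two genes are excluded. Raising `threshold` makes membership stricter and moves more genes into the excluded set. The default of 0.8 is an arbitrary assumption. The MAsC R package is the authority on how the method actually assigns genes and sets its parameters.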




Updated: 2020-12-16