Optimal tuning of weighted kNN- and diffusion-based methods for denoising single cell genomics data
PLOS Computational Biology (IF 3.8), Pub Date: 2021-01-07, DOI: 10.1371/journal.pcbi.1008569
Andreas Tjärnberg 1, 2, 3 , Omar Mahmood 4 , Christopher A Jackson 2, 3 , Giuseppe-Antonio Saldi 2 , Kyunghyun Cho 5, 6 , Lionel A Christiaen 1, 3 , Richard A Bonneau 2, 3, 4, 5, 6

The analysis of single-cell genomics data presents several statistical challenges, and extensive efforts have been made to produce methods for the analysis of this data that impute missing values, address sampling issues and quantify and correct for noise. In spite of such efforts, no consensus on best practices has been established and all current approaches vary substantially based on the available data and empirical tests. The k-Nearest Neighbor Graph (kNN-G) is often used to infer the identities of, and relationships between, cells and is the basis of many widely used dimensionality-reduction and projection methods. The kNN-G has also been the basis for imputation methods using, e.g., neighbor averaging and graph diffusion. However, due to the lack of an agreed-upon optimal objective function for choosing hyperparameters, these methods tend to oversmooth data, thereby resulting in a loss of information with regard to cell identity and the specific gene-to-gene patterns underlying regulatory mechanisms. In this paper, we investigate the tuning of kNN- and diffusion-based denoising methods with a novel non-stochastic method for optimally preserving biologically relevant informative variance in single-cell data. The framework, Denoising Expression data with a Weighted Affinity Kernel and Self-Supervision (DEWÄKSS), uses a self-supervised technique to tune its parameters. We demonstrate that denoising with optimal parameters selected by our objective function (i) is robust to preprocessing methods using data from established benchmarks, (ii) disentangles cellular identity and maintains robust clusters over dimension-reduction methods, (iii) maintains variance along several expression dimensions, unlike previous heuristic-based methods that tend to oversmooth data variance, and (iv) rarely involves diffusion but rather uses a fixed weighted kNN graph for denoising. Together, these findings provide a new understanding of kNN- and diffusion-based denoising methods. 
Code and example data for DEWÄKSS are available at https://gitlab.com/Xparx/dewakss/-/tree/Tjarnberg2020branch.
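
As a rough illustration of the kind of self-supervised tuning the abstract describes, the Python sketch below chooses the number of neighbors k for a weighted kNN denoiser. Each cell is predicted from its neighbors only, never from itself, and the k with the lowest mean squared error against the observed data is selected. This is not the DEWÄKSS implementation. The function names (`knn_weights`, `denoise`, `self_supervised_mse`, `tune_k`), the Gaussian-style kernel, the candidate values of k, and the toy data are all assumptions made for illustration. For simplicity, the graph is also built directly on the noisy values rather than on a reduced representation.

```python
import math
import random


def knn_weights(X, k):
    """Return, per cell, a list of (neighbor_index, weight) pairs.

    Weights sum to 1 for each cell, and a cell is never its own neighbor.
    """
    n = len(X)
    graph = []
    for i in range(n):
        # Squared Euclidean distance from cell i to every other cell,
        # keeping only the k closest.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n)
            if j != i
        )[:k]
        # Gaussian-style kernel scaled by the mean neighbor distance
        # (an assumed kernel form, chosen only for this sketch).
        scale = sum(d for d, _ in dists) / k + 1e-12
        raw = [(j, math.exp(-d / scale)) for d, j in dists]
        total = sum(w for _, w in raw)
        graph.append([(j, w / total) for j, w in raw])
    return graph


def denoise(X, graph):
    """Replace each cell by the weighted average of its neighbors."""
    n_genes = len(X[0])
    return [
        [sum(w * X[j][g] for j, w in nbrs) for g in range(n_genes)]
        for nbrs in graph
    ]


def self_supervised_mse(X, k):
    """MSE between the observed data and its neighbor-only prediction."""
    Y = denoise(X, knn_weights(X, k))
    n_vals = len(X) * len(X[0])
    return sum(
        (x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry)
    ) / n_vals


def tune_k(X, candidates):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: self_supervised_mse(X, k) for k in candidates}
    return min(scores, key=scores.get), scores


# Toy data: two groups of 30 "cells" with 5 "genes" each, plus Gaussian noise.
random.seed(0)
X = [
    [(4.0 if i % 2 else 0.0) + random.gauss(0.0, 1.0) for _ in range(5)]
    for i in range(60)
]

best_k, scores = tune_k(X, [1, 2, 5, 10, 20])
X_denoised = denoise(X, knn_weights(X, best_k))
print(best_k, round(scores[best_k], 3))
```

Excluding each cell from its own neighborhood is what makes the error a usable objective: if a cell could contribute to its own prediction, the error would be lowest with no smoothing at all, and the score would carry no information about how much to denoise.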




Updated: 2021-01-07